Coupled climate simulations that span several hundred years cannot be run at a high-enough spatial resolution to resolve mesoscale ocean dynamics. Recently, several studies have considered Deep Learning to parameterize subgrid forcing within macroscale ocean equations using data from ocean-only simulations with idealized geometry. We present a stochastic Deep Learning parameterization that is trained on data generated by CM2.6, a high-resolution state-of-the-art coupled climate model. We train a Convolutional Neural Network for the subgrid momentum forcing using macroscale surface velocities from a few selected subdomains with different dynamical regimes. At each location of the coarse grid, rather than predicting a single number for the subgrid momentum forcing, we predict both the mean and standard deviation of a Gaussian probability distribution. This approach requires training our neural network to minimize a negative log-likelihood loss function rather than the Mean Square Error, which has been the standard in applications of Deep Learning to the problem of parameterizations. Each estimate of the conditional mean subgrid forcing is thus associated with an uncertainty estimate–the standard deviation—which will form the basis for a stochastic subgrid parameterization. Offline tests show that our parameterization generalizes well to the global oceans and a climate with increased urn:x-wiley:19422466:media:jame21414:jame21414-math-0001 levels without further training. We then implement our learned stochastic parameterization in an eddy-permitting idealized shallow water model. The implementation is stable and improves some statistics of the flow. Our work demonstrates the potential of combining Deep Learning tools with a probabilistic approach in parameterizing unresolved ocean dynamics.