The Nonhydrostatic Unified Model of the Atmosphere (NUMA)
​
Welcome to the NUMA page.  Here, you will find information describing the NUMA model.

What's new with NUMA

A new version of NUMA is being developed. This new model (x-NUMA) is similar to NUMA except that: 1) it can exploit GPUs using OpenACC, and 2) it can generate an arbitrary number of additional domains that NUMA can launch in order to support super-parameterization regions and scientific machine-learning strategies.

What is NUMA

NUMA is a compressible Navier-Stokes solver specifically designed for applications in nonhydrostatic atmospheric modeling. NUMA solves the equations in a cube as well as on a sphere. NUMA is the dynamical core of the U.S. Navy's NEPTUNE system, where NEPTUNE stands for the Navy's Environmental Prediction sysTem Using the NUMA corE. In 2015, NUMA/NEPTUNE participated in a NOAA (National Oceanic and Atmospheric Administration) study conducted to determine which model would be used operationally by the U.S. National Weather Service (for further information click here).

What can NUMA be used for

NUMA can be used for simulating nonhydrostatic atmospheric processes; the plot on the right shows the surface pressure after 10 days for the baroclinic instability on the sphere using the deep-planet equations. NUMA can also be used for studying various impacts of hurricanes (for more on this study click [PDF]) or for thunderstorm simulations (click [PDF]); the second plot on the right shows the horizontal winds at 0.19 km height after 6 hours of a hurricane simulation with real observational heating data (for further details see [PDF]). One of the unique features of NUMA is its ability to use adaptive mesh refinement (for more information click here: [PDF], [PDF], [PDF], [PDF]). Last but not least, NUMA scales efficiently on the largest supercomputers in the world. Results for NUMA on the IBM BG/Q Mira can be found here [PDF], whereas results on the Cray XK7 Titan can be found here [PDF].

Who built NUMA

NUMA was conceived in the Scientific Computing group in the Department of Applied Mathematics at the Naval Postgraduate School. Many of my National Research Council postdocs have contributed to the construction of NUMA. They include: Jim Kelly (now at the Naval Research Lab in Washington DC), Michal Kopera (now at Boise State University), Simone Marras (now at NJIT), Andreas Mueller (now at ECMWF), Daniel Abdi (now at NOAA ESRL), Sohail Reddy (now at Lawrence Livermore National Laboratory), Felipe Alves (NPS), and Soonpil Kang (NPS). Jim Kelly and Simone Marras continue to be major contributors to the model. Jim Kelly and the NPS group are co-PIs on an ONR-funded project to increase the performance of hurricane simulations. Simone Marras is a co-PI on an NSF-funded project to improve the understanding of hurricane dynamics; he and his PhD student, Yassine Tissaoui (Marras group site), work jointly with Steve Guimond at UMBC on this project. Finally, the NPS and NJIT groups work with Sam Stechmann and his group (Stechmann group site) on the implementation of ice microphysics for NUMA simulations.

Who funded NUMA

NUMA has been constructed primarily with funds from the Office of Naval Research (ONR) Marine Meteorology Directorate and, more recently, from the Defense Advanced Research Projects Agency (DARPA). The goal of the DARPA project is to extend NUMA (and NEPTUNE) to the thermosphere.

NUMA Governing Equations

NUMA uses the so-called deep-planet equations, which means that NUMA can be used for high-altitude physics simulations. NUMA uses five forms of the compressible Navier-Stokes equations: 1) the density, velocity, and potential temperature equations (non-conservative form, Set 2NC); 2) the density, momentum, and density-potential temperature equations (conservative form, Set 2C); 3) the density, momentum, and density-energy equations (conservative form, Set 3C); 4) the density, momentum, and density-internal energy equations (conservative balance-law form, Set 4C); and 5) the density, velocity, and internal energy equations (non-conservative form, Set 4NC). The first three equation sets are described in the following two papers: [PDF] and [PDF]. The fourth and fifth sets are being considered for the thermospheric simulations.
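For reference, here is a minimal sketch of the inviscid Set 2NC equations (density rho, velocity u, potential temperature theta), written in LaTeX in the form commonly used for nonhydrostatic dynamical cores; Coriolis, viscous, and source terms are omitted, and the papers above give the exact forms (including the deep-planet terms) used in NUMA.

    \frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{u}) = 0, \qquad
    \frac{\partial \mathbf{u}}{\partial t} + \mathbf{u} \cdot \nabla \mathbf{u}
        + \frac{1}{\rho} \nabla p + g \hat{\mathbf{k}} = 0, \qquad
    \frac{\partial \theta}{\partial t} + \mathbf{u} \cdot \nabla \theta = 0,
    \qquad \text{with the equation of state} \qquad
    p = p_0 \left( \frac{\rho R \theta}{p_0} \right)^{c_p / c_v}.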

Numerical Methods in NUMA

NUMA uses continuous Galerkin (CG) and discontinuous Galerkin (DG) methods for approximating the spatial derivatives in the compressible Navier-Stokes equations. Papers describing the spatial discretization methods can be found here: CG and DG for Navier-Stokes [PDF], CG and DG unified in the same code-base [PDF], Kinetic-Energy-Preserving [PDF], and Entropy-Stable [PDF] methods. The time derivatives are solved using either explicit, fully-implicit (FI), implicit-explicit (IMEX), or exponential methods; we have also added the capability to use multirate time-integrators so that each process may have its own time-step. The IMEX methods require the use of either direct solvers (e.g., LU decomposition) or iterative solvers (GMRES with preconditioners). Papers describing the IMEX methods in NUMA can be found here: [PDF], while papers describing the preconditioners can be found here: [PDF] and [PDF]. The FI and HEVI (horizontally explicit, vertically implicit) methods also require iterative solvers and preconditioners, which is the subject of ongoing work. We have also derived Schur complements for the IMEX methods and are working on Schur complements for the HEVI methods; a paper describing the Schur complement for DG for the Euler equations can be found here [PDF]. NUMA is also equipped with positivity-preserving schemes for tracer transport and stabilization methods based on subgrid-scale turbulence closures.
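To make the IMEX idea concrete, the following is a minimal sketch (in Python/SciPy, not NUMA's Fortran) of a first-order implicit-explicit Euler step for a semi-discrete system du/dt = L u + N(u), where the stiff linear operator L (standing in for the terms responsible for fast acoustic and gravity waves) is treated implicitly and solved with GMRES, while the remaining terms N(u) are treated explicitly. The operators below are toy stand-ins chosen only for illustration.

    import numpy as np
    from scipy.sparse import diags, identity
    from scipy.sparse.linalg import gmres

    def imex_euler_step(u, dt, L, N):
        # One first-order IMEX step for du/dt = L u + N(u):
        #   (I - dt*L) u^{n+1} = u^n + dt*N(u^n)
        A = identity(u.size) - dt * L      # implicit (stiff) operator
        rhs = u + dt * N(u)                # explicit (non-stiff) contribution
        u_new, info = gmres(A, rhs)        # iterative solve of the implicit system
        assert info == 0, "GMRES did not converge"
        return u_new

    # Toy problem: stiff 1D diffusion treated implicitly, mild reaction term explicitly.
    n, dt = 200, 1.0e-3
    dx = 1.0 / n
    L = diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)) / dx**2   # stiff linear operator
    N = lambda v: 0.1 * v * (1.0 - v)                               # non-stiff nonlinear term
    u = np.exp(-100.0 * (np.linspace(0.0, 1.0, n) - 0.5) ** 2)      # initial condition
    for _ in range(100):
        u = imex_euler_step(u, dt, L, N)

NUMA's production IMEX schemes are higher-order multi-stage methods, but each implicit stage still reduces to a linear solve of this kind, which is why the iterative solvers and preconditioners mentioned above matter so much.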

NUMA Grids

Currently, the only restriction on the grids that NUMA can use is that they be composed of quadrilaterals (in 2D) or hexahedra (in 3D); however, the grids can be entirely unstructured, as shown in the figure on the right. NUMA is able to use different polynomial orders in each direction of the cube. However, when used in global mode (flow on the sphere), NUMA uses the same polynomial order in both horizontal directions. Although NUMA has its own internal grid generator, NUMA also includes the capability to use the P4est parallel grid generator. NUMA also has the capability to use adaptive mesh refinement (AMR) via P4est. NUMA uses P4est to handle all the mesh information, including the graph partitioning. P4est was chosen for this task because it has been shown to be scalable up to hundreds of thousands of CPU-only cores. NUMA is currently able to perform static and dynamic AMR with both CG and DG. A paper describing the use of NUMA with P4est can be found here [PDF].
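To illustrate the tensor-product structure that allows a different polynomial order in each direction of an element, the short Python sketch below builds the nodal points of a 2D reference quadrilateral from two 1D node sets of different order. Chebyshev-Gauss-Lobatto points are used here only because they have a simple closed form; element-based Galerkin codes such as NUMA typically use Legendre-Gauss-Lobatto points, and the function names are illustrative, not NUMA's.

    import numpy as np

    def gauss_lobatto_chebyshev(p):
        # p+1 Chebyshev-Gauss-Lobatto nodes on [-1, 1] (closed-form stand-in
        # for the Legendre-Gauss-Lobatto nodes used in spectral-element codes).
        return np.cos(np.pi * np.arange(p, -1, -1) / p)

    def quad_element_nodes(p_xi, p_eta):
        # Tensor-product nodal points of a reference quadrilateral element,
        # allowing a different polynomial order in each direction.
        xi = gauss_lobatto_chebyshev(p_xi)     # order p_xi in the xi direction
        eta = gauss_lobatto_chebyshev(p_eta)   # order p_eta in the eta direction
        XI, ETA = np.meshgrid(xi, eta, indexing="ij")
        return np.column_stack([XI.ravel(), ETA.ravel()])   # (p_xi+1)*(p_eta+1) points

    nodes = quad_element_nodes(p_xi=4, p_eta=2)   # e.g., 4th order in xi, 2nd order in eta
    print(nodes.shape)                            # (15, 2)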

NUMA HPC Performance

One of the main features of NUMA is that it has been designed to fully exploit current massively parallel architectures, both CPU-only and hybrid CPU-GPU (figure on the right). On the Mira IBM Blue Gene/Q at Argonne National Laboratory, NUMA was shown to scale almost perfectly (99% strong scaling efficiency) up to the entire machine (786,432 cores, each running 4 threads per core, for a total of 3.1 million MPI ranks) for a simulation with 10 billion grid points using a hybrid MPI-OpenMP approach. The publication describing this work can be found here [PDF]. On a similarly sized problem, NUMA was shown to achieve 90% weak scaling efficiency on the Titan Cray XK7 supercomputer at Oak Ridge National Laboratory using 16,000 of the machine's 18,688 Nvidia K20X GPUs via the OCCA library. Titan's peak performance is 27 petaflops while Mira's is 10 petaflops. The paper describing this work can be found here [PDF].
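For readers unfamiliar with the terminology, strong- and weak-scaling efficiencies of the kind quoted above can be computed from measured wallclock times as in the short Python sketch below; the timing numbers here are hypothetical placeholders, not the Mira or Titan measurements.

    def strong_scaling_efficiency(t_ref, p_ref, t, p):
        # Strong scaling (fixed total problem size): ideal time halves as cores double.
        return (t_ref * p_ref) / (t * p)

    def weak_scaling_efficiency(t_ref, t):
        # Weak scaling (fixed work per core): ideal time stays constant.
        return t_ref / t

    # Hypothetical timings (seconds), for illustration only.
    print(strong_scaling_efficiency(t_ref=100.0, p_ref=1024, t=13.0, p=8192))  # ~0.96
    print(weak_scaling_efficiency(t_ref=100.0, t=111.0))                        # ~0.90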

NUMA with Implicit Time-Integration on Many-core Computing

Unlike many other ongoing efforts at exploiting co-processor (accelerator) technology, whereby only portions of the code are moved onto the accelerators (e.g., GPU, Intel MIC), we have ported the ENTIRE NUMA code base. This includes not only the explicit dynamics for both CG and DG methods but also the entire implicit time-integration infrastructure, which includes all of the IMEX time-integrators (1D-IMEX and 3D-IMEX, both single-step multi-stage and single-stage multi-step methods), the iterative solvers (e.g., GMRES, BiCGStab, Richardson, etc.), our banded direct solvers (i.e., LU decomposition), and the Polynomial-Based Nonlinear Least-Squares Optimized (PBNO) preconditioner. The paper describing this work can be found here [PDF].
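As a schematic of the solver infrastructure listed above (an iterative Krylov solver accelerated by a preconditioner), the Python sketch below runs GMRES with a simple Jacobi (diagonal) preconditioner supplied as a linear operator. The matrix and preconditioner are toy stand-ins; NUMA's PBNO preconditioner and GPU kernels are of course far more elaborate.

    import numpy as np
    from scipy.sparse import diags
    from scipy.sparse.linalg import gmres, LinearOperator

    # Toy nonsymmetric sparse system standing in for the implicit (Helmholtz-like) problem.
    n = 500
    A = diags([-1.0, 4.0, -2.0], [-1, 0, 1], shape=(n, n)).tocsr()
    b = np.ones(n)

    # Jacobi (diagonal) preconditioner: apply the inverse of the diagonal of A.
    d_inv = 1.0 / A.diagonal()
    M = LinearOperator((n, n), matvec=lambda r: d_inv * r)

    x, info = gmres(A, b, M=M)                 # preconditioned GMRES solve
    print(info, np.linalg.norm(A @ x - b))     # info == 0 means converged

The same pattern, a matrix-vector product plus a preconditioner application, is what must run efficiently on the accelerator for implicit time-integration to pay off there.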

To show how well the implicit-explicit (IMEX) NUMA performs on accelerators, we compare its speedup relative to the explicit time-integrators, which we have already shown to achieve well over 90% weak scaling efficiency.

The figure on the right compares the wallclock time in minutes for a one-day forecast with NUMA at different grid resolutions, namely 13 km (blue), 10 km (green), and 3 km (red), on the Titan supercomputer, using only the GPUs on each node, for an acoustic wave on the sphere with dry dynamics. For these simulations we use OCCA with a CUDA backend. The results show that 3D-IMEX (circles) yields a speedup of 5x compared to the explicit results (squares). The blue and green lines show that we can achieve a one-day forecast within 10 wallclock minutes with no more than 2048 GPUs when 3D-IMEX solvers are used; with explicit time-integration, we require more hardware. At 3 km resolution (red line) with 3D-IMEX (circles), we require 35 wallclock minutes to deliver a one-day forecast with 4096 GPUs. The 1D-IMEX solvers (diamonds) yield well over a 5x speedup over the 3D-IMEX solvers for this test case, allowing us to run a one-day forecast at 3 km resolution (red line with diamonds) within 5 wallclock minutes while using only 4096 GPUs.

NUMA Cloud Simulation

The simulation below shows a 3D NUMA cloud simulation using the Maya software for visualization. The clouds and rain are simulated with a warm-rain (Kessler) microphysics scheme. The sun and sound are added for special effect.
