2-D and 3-D Vectorized Adaptive Solvers

(VAS2D & VAS3D)

(first published on November 1, 1997; last revised on August 2, 2007)

In most problems involving a wide range of length scales, an adaptive mesh method may reduce the cost of achieving a given resolution by an order of magnitude or more. However, the method makes it much more difficult to exploit the resources of a supercomputer. We developed a locally adaptive code for any structured or unstructured quadrilateral grid in two dimensions (VAS2D) and hexahedral grid in three dimensions (VAS3D). The data structure is designed to vectorize the adaptation procedure as well as the flow solver, without increasing the overhead on a single-CPU computer, so that the code runs efficiently on a wide range of platforms, from supercomputers to desktop computers. This page introduces the basic ideas behind the VAS2D and the VAS3D.
 

Overview

A two-dimensional and axisymmetric vectorized adaptive solver (VAS2D) has been developed on unstructured quadrilateral grids using the cell-edge data structure. This general-purpose solver is able to simulate unsteady/steady, inviscid/viscous, frozen/equilibrium compressible flows in arbitrary geometries. The VAS2D consists of three parts: pre- and post-processing, the flow solver, and the adaptation procedure. All parts are equally important, and demand considerable skill and resources.

Pre-processing

A quadrilateral grid is automatically generated by an in-house mesh tool or by commercial software for any desired computational domain. The grid data are then automatically transformed to the cell-edge data structure that is used in the flow solver and the adaptation procedure.

Flow solver

In principle, any scheme belonging to the family of finite volume methods can be used to treat complicated control volumes. Two schemes have been implemented in the VAS2D: one is the MUSCL-Hancock scheme, a second-order upwind scheme; the other is a second-order central-difference scheme with artificial smoothing.

Adaptation procedure

The adaptation procedure refines flow regions with large numerical errors, and coarsens regions with small truncation errors. The cell-edge data structure is shared by the flow solver and the adaptation procedure.

Post-processing

Post-processing transforms data from the cell-edge data structure to the conventional vertex-centered or cell-centered structure, so that flow visualization and data processing can be carried out with commercial software. The VAS2D also provides a few simple but quick subroutines for inspecting results by outputting iso-contours or fringes in PostScript format; the numerical results shown on this webpage (.GIF pictures) are directly converted from the output of the VAS2D. Post-processing will not be discussed further (although it is equally important).

The source code was originally written in Fortran. The adaptation module contains about 1,100 lines (300 lines shorter than previous versions). A typical module in the flow solver is a few hundred lines long; for example, the module for inviscid gas modeling using the MUSCL-Hancock scheme contains 448 lines. The main modules in the flow solver are the data reconstruction module (computing spatial gradients), the viscous module (computing the viscosity and heat conduction terms), and two inviscid modules (computing the convection terms using the two different schemes).

The three-dimensional adaptive solver using hexahedral cells has been developed in Fortran 90.
 

Performance on Cray C90

The flow solver used in this test is the classic Lax-Wendroff scheme. The spurious oscillations that appear numerically near discontinuities are removed by an additional conservative smoothing step. The computational efficiency of the algorithms is measured by computing a shock wave diffracting over a 90-degree corner. The test runs 400 time steps and uses about 10,000 cells (the cell number cannot be held fixed because of adaptation). The memory requirement for the computation is nearly 2 megawords. The figure shows the CPU time required by the main subroutines of the algorithm in one time step. The flow solver uses 70% of the compute time; the rest is spent mainly on grid adaptation and the geometry update. The geometry update only computes cell volumes and edge lengths. According to the hardware statistics, the average vector length is 117.6 (the machine maximum is 128), so the algorithms are well vectorized. The average CPU time is about 4 microseconds per cell per step. This is over one hundred times faster than comparable computations on a workstation using upwind schemes with either exact or approximate Riemann solvers. The efficiency is due not only to the vectorizable data structure but also to the Euler solver, a central-difference scheme with conservative smoothing.
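
As a rough consistency check of these figures: at 4 microseconds per cell per step, a 400-step run on about 10,000 cells amounts to 0.000004 s x 10,000 cells x 400 steps, or roughly 16 seconds of single-processor CPU time for the whole test.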

Note: All computations were carried out on a Cray C90 in single-processor mode.

Cell-edge data structure

A fundamental property of the data structure is that every cell points to its four edges, and every edge points to its two neighboring cells. The basic information saved for a cell is its location and the indices of its four edges. The four edges are uniquely ordered: in the figure, edges BC, CD, DA, and AB are the neighboring edges NE1, NE2, NE3, and NE4, respectively. Every edge is defined as a directed segment, as shown in the figure. The direction ensures that the left side of NE1, NE4 and the right side of NE2, NE3 are the cell itself, while the right side of NE1, NE4 and the left side of NE2, NE3 correspond to the right, below, above, and left neighboring cells, respectively. This strict definition reduces conditional statements in the code, especially in the adaptation.

The basic information saved for an edge is the locations of its two end points and the indices of its left and right cells. Note that a cell never points directly to another cell, but only through their common edge. For instance, if cell i needs the information of its right cell j, it reads the right index of its NE1. Because of this unique definition of the geometry, one adjacency lookup needs only two memory reads.
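
A minimal Fortran 90 sketch of the two records may make this concrete (the type and variable names here are ours, not those of the VAS2D source):

    module cell_edge_demo
      implicit none
      type :: edge_t
         real    :: xa(2), xb(2)   ! coordinates of the two end points
         integer :: left, right    ! indices of the left and right cells
      end type edge_t
      type :: cell_t
         real    :: centre(2)      ! cell location
         integer :: ne(4)          ! indices of the four edges NE1..NE4
      end type cell_t
    end module cell_edge_demo

    program neighbour_lookup
      use cell_edge_demo
      implicit none
      type(cell_t) :: cells(2)
      type(edge_t) :: edges(4)
      integer :: i, j
      ! cell 1 owns edges 1..4; its NE1 (edge 1) is shared with cell 2
      cells(1)%ne = (/ 1, 2, 3, 4 /)
      edges(1)%left  = 1
      edges(1)%right = 2
      i = 1
      ! right neighbour of cell i: two memory reads, no tree climbing
      j = edges( cells(i)%ne(1) )%right
      print *, 'right neighbour of cell', i, 'is cell', j
    end program neighbour_lookup

The lookup edges(cells(i)%ne(1))%right is exactly the two-memory-read neighbor access described above.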

Adaptation procedure

The refinement and coarsening procedures are handled separately. Both procedures have similar steps for vectorization:

  1. handling the inside of cells which are flagged to be refined or coarsened,
  2. handling the edges of the flagged cells,
  3. arranging memory.
Step 1 is conducted cell by cell, and updates all information interior to a flagged cell, such as deleting edges or inserting new cells, which is independent of other cells. Step 2, based on edges, renews every edge of the refined or coarsened cells and its two neighboring cells, which can be done without influencing other edges.

In step 1, it is required that no two neighboring cells differ by more than one refinement level. This rule prevents pathologically large volume ratios between neighbors; an illustration of it can be found in the references below. A cell to be refined is divided into four subcells, or sons, as shown in the figure. If an edge is split into two subedges, we call the edge a mother and the subedges daughters. There are some options in the division. One is the position of the interior node, which can be optimized to obtain well-shaped sons; in the present computations the node is set to the center of the father cell. Another is the location of the new nodes on the outside edges, usually their centers. But if an edge lies on a boundary, AB for instance, we choose the new location E so that edge OE is perpendicular to the boundary, provided this choice does not generate extremely skewed cells. If all four sons are flagged for coarsening, they are deleted and merged back into their father cell.
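
One common way to enforce the one-level rule is a flag-propagation sweep over the edges before the actual division. The following self-contained Fortran 90 sketch is our own illustration of that idea, not the VAS2D source:

    program level_rule_demo
      implicit none
      integer, parameter :: ncell = 4, nedge = 3
      integer :: level(ncell), eleft(nedge), eright(nedge)
      logical :: refine(ncell), changed
      integer :: e, i, j, li, lj
      ! a chain of cells 1-2-3-4 at levels 2,1,1,0; only cell 1 is flagged
      level  = (/ 2, 1, 1, 0 /)
      refine = (/ .true., .false., .false., .false. /)
      eleft  = (/ 1, 2, 3 /)
      eright = (/ 2, 3, 4 /)
      changed = .true.
      do while (changed)            ! sweep the edges until the flags settle
         changed = .false.
         do e = 1, nedge
            i = eleft(e);  j = eright(e)
            li = level(i);  if (refine(i)) li = li + 1  ! level after refining
            lj = level(j);  if (refine(j)) lj = lj + 1
            if (li - lj > 1) then       ! neighbour j would lag by 2 levels
               refine(j) = .true.;  changed = .true.
            else if (lj - li > 1) then
               refine(i) = .true.;  changed = .true.
            end if
         end do
      end do
      print *, 'refine flags after enforcing the one-level rule:', refine
    end program level_rule_demo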

Steps 1 and 2 change the status of some cells and edges, for example from sons to fathers. Step 3 simply cleans and rearranges the memory, and divides all cells and all edges into two groups, father/non-father and mother/non-mother, according to their updated statuses. The non-mother edges are further divided into boundary/non-boundary edges. These classifications generate an efficient data structure for the flow solver.
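
Step 3 is essentially a compaction of index lists. In Fortran 90 it can be expressed with the intrinsic PACK; the status flags below are assumed purely for illustration:

    program list_compaction_demo
      implicit none
      integer, parameter :: nedge = 6
      logical :: is_mother(nedge), is_bound(nedge)
      integer :: idx(nedge), k
      integer, allocatable :: mothers(:), interior(:), boundary(:)
      ! assumed edge statuses after steps 1 and 2 of the adaptation
      is_mother = (/ .false., .true., .false., .false., .true., .false. /)
      is_bound  = (/ .true.,  .false., .false., .true., .false., .false. /)
      idx = (/ (k, k = 1, nedge) /)
      ! compact each group into a contiguous, constant-stride index list
      allocate( mothers(  count(is_mother) ) )
      allocate( interior( count(.not.is_mother .and. .not.is_bound) ) )
      allocate( boundary( count(.not.is_mother .and. is_bound) ) )
      mothers  = pack( idx, is_mother )
      interior = pack( idx, .not.is_mother .and. .not.is_bound )
      boundary = pack( idx, .not.is_mother .and. is_bound )
      print *, 'mother edges        :', mothers
      print *, 'non-mother interior :', interior
      print *, 'non-mother boundary :', boundary
    end program list_compaction_demo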

The figure above illustrates the memory strides of the edge list and the cell list after the adaptation procedure. All cell indices are saved in the cell list, and all edge indices in the edge list. Both are stored with constant stride, which is what vectorization requires. The lists are very efficient, especially for a flow solver using the finite volume method. The non-father cells are exactly the non-overlapping physical control volumes, and they are numbered consecutively in the cell list. The non-mother edges, on the other hand, are exactly the interfaces of the control volumes, so flux evaluations can be computed along the list of non-mother edges in vector mode.
 

The adaptation subroutine includes 34 loops and about 1,400 Fortran lines. All loops are vectorized. Its practical vectorization efficiency is tested by refining one cell from level 0 to level 6 (4,096 cells), and then coarsening the refined cells from level 6 back to level 0. The left figure shows the average CPU time normalized by 10,000 cells. No flow solver is run in this test. The CPU time decreases with increasing total cell number. When the total number of cells exceeds 1,000, the speedup is about 30. This indicates that the adaptation procedure is well vectorized.

Finite volume method

An efficient flow solver can be constructed on this cell-edge data structure if the finite volume method is adopted. In this method, the flow solver computes the change of the conserved quantities of all cells by integrating the fluxes across their interfaces. This is done by a routine that consists of four basic steps:
  1. computing fluxes through non-mother non-boundary edges;
  2. computing fluxes through non-mother boundary edges;
  3. computing fluxes through mother edges by adding their daughters' fluxes;
  4. computing the sum of four fluxes for every cell.
The computation of the flux through an edge requires only its two neighboring cells, whose indices are explicitly saved for every edge, so the flux evaluation is easily vectorized. Similar efficiency is achieved for the other steps thanks to the cell-edge data structure, since adjacent connectivity information can be obtained directly. To get the same information, the well-known octree structure often has to climb up to the root of a branch and then climb down to the neighboring cells, which is difficult to vectorize, if not impossible. This climbing process is avoided by the cell-edge data structure. Note that because of step 3, step 4 simply accumulates the fluxes of the four edges, no matter whether an edge is split or not, while preserving conservation. The four steps above contain almost no conditional statements; in fact the only conditional statement is in step 2, where the different types of boundaries are treated. Thus the data structure is highly efficient for solving conservation laws. The difference between numerical schemes lies in how they calculate the numerical fluxes; two schemes, a central-difference scheme and an upwind scheme, are used to calculate the fluxes.
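
To make the four-step loop structure concrete, here is a self-contained Fortran 90 toy for a 1-D scalar advection problem. All names, the test data, and the boundary treatment are our own illustration, not the VAS2D source; the point is that each step is a separate loop over a precomputed edge list, which is what makes the procedure vectorizable:

    program flux_loop_demo
      implicit none
      integer, parameter :: ncell = 2, nedge = 3
      integer :: el(nedge), er(nedge)          ! left/right cell of each edge
      integer :: cl(ncell), cr(ncell)          ! left/right edge of each cell
      integer, allocatable :: interior(:), bound(:), mother(:)
      integer :: dghtr(2, nedge)               ! daughters of a mother edge
      real :: u(ncell), du(ncell), f(nedge)
      real, parameter :: a = 1.0, dt_over_vol = 0.1
      integer :: k, e, i
      ! toy mesh: | cell 1 | cell 2 | ; edge 2 separates the two cells
      el = (/ 0, 1, 2 /)                       ! 0 marks "outside the domain"
      er = (/ 1, 2, 0 /)
      cl = (/ 1, 2 /);  cr = (/ 2, 3 /)
      allocate( interior(1), bound(2), mother(0) )
      interior = (/ 2 /);  bound = (/ 1, 3 /)
      u = (/ 1.0, 0.0 /)
      ! step 1: non-mother, non-boundary edges (the long vectorizable loop)
      do k = 1, size(interior)
         e = interior(k)
         f(e) = a * u(el(e))                   ! upwind flux for a > 0
      end do
      ! step 2: non-mother boundary edges (the only conditional step)
      do k = 1, size(bound)
         e = bound(k)
         if (el(e) == 0) f(e) = a * 1.0        ! inflow value, toy choice
         if (er(e) == 0) f(e) = a * u(el(e))   ! outflow: upwind from inside
      end do
      ! step 3: a mother edge inherits the sum of its daughters' fluxes,
      ! which keeps the scheme conservative across refinement interfaces
      do k = 1, size(mother)
         e = mother(k)
         f(e) = f(dghtr(1,e)) + f(dghtr(2,e))
      end do
      ! step 4: every cell accumulates its edge fluxes, split or not
      do i = 1, ncell
         du(i) = -dt_over_vol * ( f(cr(i)) - f(cl(i)) )
      end do
      u = u + du
      print *, 'updated cell values:', u
    end program flux_loop_demo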

 

Data reconstruction

MUSCL-Hancock scheme

For the solver using the upwind scheme, the MUSCL-Hancock scheme is applied to unstructured grids. Numerical data are first reconstructed by the least-squares method at the cells, and all other procedures are conducted at the interfaces. In the slope-limiting procedure, which aims to maintain TVD conditions around sharp gradients, only the gradients in the direction between the two neighboring cells on either side of an interface are modified by limiters. The minmod limiter is used in most applications. The limited slopes at the two neighboring cells are then used to evolve the solutions at the two cells by half a time step. An HLL approximate Riemann solver is chosen to determine the flux at the interface because of its simplicity and generality.
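
For illustration, here is a minimal Fortran 90 sketch of the two ingredients named above, the minmod limiter and the HLL flux. The 1-D Euler setting is a simplification of the interface-normal solve, and the simple Davis-type wave-speed estimates are our choice, not necessarily that of the VAS2D:

    module hll_demo
      implicit none
      real, parameter :: gam = 1.4
    contains
      function euler_flux(q) result(f)       ! exact 1-D Euler flux
        real, intent(in) :: q(3)             ! q = (rho, rho*u, E)
        real :: f(3), u, p
        u = q(2) / q(1)
        p = (gam - 1.0) * (q(3) - 0.5*q(1)*u*u)
        f = (/ q(2), q(2)*u + p, (q(3) + p)*u /)
      end function euler_flux

      function minmod(a, b) result(m)        ! minmod slope limiter
        real, intent(in) :: a, b
        real :: m
        m = 0.0
        if (a*b > 0.0) m = sign( min(abs(a), abs(b)), a )
      end function minmod

      function hll_flux(ql, qr) result(f)    ! HLL approximate Riemann flux
        real, intent(in) :: ql(3), qr(3)
        real :: f(3), ul, ur, pl, pr, al, ar, sl, sr
        ul = ql(2)/ql(1);  ur = qr(2)/qr(1)
        pl = (gam-1.0)*(ql(3) - 0.5*ql(1)*ul*ul)
        pr = (gam-1.0)*(qr(3) - 0.5*qr(1)*ur*ur)
        al = sqrt(gam*pl/ql(1));  ar = sqrt(gam*pr/qr(1))
        sl = min(ul - al, ur - ar)           ! Davis wave-speed estimates
        sr = max(ul + al, ur + ar)
        if (sl >= 0.0) then
           f = euler_flux(ql)                ! all waves move right
        else if (sr <= 0.0) then
           f = euler_flux(qr)                ! all waves move left
        else                                 ! subsonic: HLL average state
           f = ( sr*euler_flux(ql) - sl*euler_flux(qr) &
                 + sl*sr*(qr - ql) ) / (sr - sl)
        end if
      end function hll_flux
    end module hll_demo

    program hll_demo_main
      use hll_demo
      implicit none
      real :: ql(3), qr(3)
      ! Sod shock-tube left/right states as a quick check
      ql = (/ 1.0,   0.0, 1.0/(gam-1.0) /)
      qr = (/ 0.125, 0.0, 0.1/(gam-1.0) /)
      print *, 'HLL interface flux:', hll_flux(ql, qr)
      print *, 'minmod(0.5, 2.0) =', minmod(0.5, 2.0)
    end program hll_demo_main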

Smoothing TVD scheme

For the solver using the central-difference scheme, a TVD smoothing scheme is used. A predictor step evolves the solution by a modified half time step, by locally solving the two-dimensional Lax-Friedrichs scheme at the interface. Both the interpolation and the predictor step require estimates of the gradients at every cell, which are given by the least-squares method during data reconstruction. In solving the compressible Euler equations, the fluxes through the cell faces consist of inviscid fluxes and smoothing fluxes. The inviscid fluxes are the convection and pressure surface terms, i.e., those of the Euler equations. The smoothing fluxes are added to suppress spurious oscillations.
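
The key point of conservative smoothing is that the dissipation is written as an interface flux, so whatever one cell loses its neighbor gains and conservation holds exactly. Here is a minimal 1-D Fortran 90 sketch with a fixed smoothing coefficient; the actual coefficient and switch used in the VAS2D are derived in Sun & Takayama (1999):

    program smoothing_demo
      implicit none
      integer, parameter :: n = 7
      real :: u(n), fs(0:n)            ! cell values and interface fluxes
      real, parameter :: eps = 0.25    ! fixed coefficient, for illustration
      integer :: i
      u = (/ 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0 /)   ! a sharp jump
      fs(0) = 0.0;  fs(n) = 0.0        ! no smoothing flux at the boundaries
      do i = 1, n-1
         fs(i) = eps * ( u(i+1) - u(i) )   ! dissipative flux at interface i
      end do
      do i = 1, n
         u(i) = u(i) + ( fs(i) - fs(i-1) ) ! conservative update
      end do
      print *, 'smoothed values:', u
      print *, 'total preserved:', sum(u)  ! conservation check
    end program smoothing_demo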

References

Sun M. (1998) Numerical and experimental studies of shock wave interaction with bodies, Ph.D. Thesis, Tohoku University.

Sun M., Takayama K. (1999) Conservative smoothing on an adaptive quadrilateral grid, J. Comput. Phys., Vol. 150, pp. 143-180.

Sun M., Takayama K. (2001) A solution-adaptive technique using unstructured hexahedral grids, AIAA Paper 2001-2656, Anaheim, CA.

 


 

More Examples

Software for shock wave reflections 

Starting process of a shock tunnel with a hemisphere model

Shock wave diffraction over a 90 degree corner