AUTHORS - VANT | VOPROSY ATOMNOY NAUKI I TEKHNIKI

Since 1978
Published in Sarov (Arzamas-16), Nizhegorodskaya oblast

RUSSIAN FEDERAL NUCLEAR CENTER - ALL-RUSSIAN RESEARCH INSTITUTE OF EXPERIMENTAL PHYSICS

MODEL IMPLEMENTATION FOR THE GLOBAL ATMOSPHERE CIRCULATION ON MASSIVELY PARALLEL DISTRIBUTED MEMORY COMPUTER

V.N. Glukhov
VANT. Ser.: Mat. Mod. Fiz. Proc. 1997. Issue 1. P. 50-51.

      Predicting weather and climate variations requires high computational performance. One way to meet this demand is to use multiprocessors.
      This report addresses the problems of parallelizing the global atmospheric circulation model developed by the Institute of Computational Mathematics on the MVS-100 multiprocessor (Keldysh Institute), and presents the results obtained. The model was developed over many years and is now used extensively in various international experiments (AMIP, FANGIO).
      The model is based on the complete system of nonlinear equations of atmospheric hydrothermodynamics in Lamb form on a sphere, using the vertical σ coordinate. The spatial operator is approximated horizontally by finite differences on a staggered Arakawa C grid that is regular in latitude and longitude. The grid step is Δα = 5° along the latitude circles and Δφ = 4° along the meridians; vertically, a uniform division into 7 levels is used. Time integration uses a semi-implicit scheme with a step of 20”. At each integration step, Helmholtz equations on the sphere are solved by reduction with respect to the variable λ.
      Two parallelization techniques were used: in the first, the data were distributed among the processors by latitude; in the second, by both latitude and longitude. To solve the Helmholtz equations a pipeline was organized, and the summation of data over the processors used the coupling scheme.
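      The two data distributions can be illustrated with a minimal sketch. It assumes a grid of 72 x 45 points (360°/5° by 180°/4°; the exact number of latitude rows depends on how the poles are treated, which the report does not state) and contiguous, near-equal blocks. The function names and the block-splitting rule are illustrative assumptions, not the author's implementation.

```python
def block_range(n, parts, rank):
    """Half-open index range [start, stop) owned by `rank` when n items
    are split into `parts` contiguous, near-equal blocks."""
    base, extra = divmod(n, parts)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def decompose_by_latitude(nlat, p):
    """First technique (assumed layout): each processor owns a strip of
    latitude rows and all longitudes."""
    return [block_range(nlat, p, rank) for rank in range(p)]


def decompose_by_lat_lon(nlat, nlon, p_lat, p_lon):
    """Second technique (assumed layout): processors form a p_lat x p_lon
    mesh, each owning a rectangular tile (latitude range, longitude range)."""
    return [
        (block_range(nlat, p_lat, i), block_range(nlon, p_lon, j))
        for i in range(p_lat)
        for j in range(p_lon)
    ]


NLON, NLAT = 72, 45  # assumed grid size for the 5 x 4 degree steps

strips = decompose_by_latitude(NLAT, 9)
tiles = decompose_by_lat_lon(NLAT, NLON, 3, 3)
print(strips[0], strips[-1])  # first and last latitude strips
print(tiles[0], tiles[-1])    # first and last tiles
```

      With 9 processors, the strip layout gives each processor 5 latitude rows of 72 points, while the 3 x 3 mesh gives each a 15 x 24 tile.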
      Ideally, if one processor needs time t for execution, p processors would need time t/p. However, this ideal speedup is achieved only in very special cases. The experiments showed that performance increases nonlinearly with the number of processors and that there is a certain number of processors at which maximum performance is reached. For example, for the first parallelization technique the computational time was reduced only by a factor of about 4, with the minimum time observed on 9 processors.
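      Why an optimal processor count can exist may be seen from a toy cost model, sketched here under stated assumptions: the compute part scales as t/p, while a communication overhead grows linearly with p. The linear overhead term and its constant are hypothetical, chosen only so that the minimum falls at 9 processors; they are not fitted to the measurements reported here.

```python
def run_time(p, t_serial=1.0, overhead=1.0 / 81.0):
    """Toy model: ideal compute time t/p plus an overhead that grows
    linearly with the number of processors (hypothetical form)."""
    return t_serial / p + overhead * (p - 1)


def best_processor_count(max_p, **model):
    """Processor count in 1..max_p that minimizes the modeled run time."""
    return min(range(1, max_p + 1), key=lambda p: run_time(p, **model))


p_best = best_processor_count(32)
speedup = run_time(1) / run_time(p_best)
print(p_best, round(speedup, 2))
```

      In this model adding processors beyond the optimum makes the run slower, because the growing overhead outweighs the shrinking compute share.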
      By summing the execution time of individual procedures over the processors, one can identify those for which the total execution time increases and those for which it remains practically unchanged. The procedures whose total time increases are those that depend on data residing on different processors.
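      The diagnostic described above can be sketched as follows. The procedure names, timings, and the 5% tolerance are hypothetical placeholders for illustration; they are not data from the experiments.

```python
def classify_procedures(timings, tolerance=0.05):
    """timings maps procedure name -> {processor count: [time on each processor]}.
    A procedure is marked 'grows' if its time summed over all processors at the
    largest processor count exceeds the sum at the smallest count by more than
    `tolerance` (relative); otherwise it is marked 'unchanged'."""
    result = {}
    for name, runs in timings.items():
        base_total = sum(runs[min(runs)])
        large_total = sum(runs[max(runs)])
        grows = large_total > base_total * (1.0 + tolerance)
        result[name] = "grows" if grows else "unchanged"
    return result


# Hypothetical timings in seconds, for illustration only.
timings = {
    "local_physics": {1: [8.0], 4: [2.0, 2.0, 2.0, 2.0]},
    "helmholtz_solve": {1: [4.0], 4: [1.5, 1.5, 1.5, 1.5]},
}
labels = classify_procedures(timings)
print(labels)
```

      A procedure that parallelizes perfectly keeps the same summed time, so growth in the sum points to waiting or communication inside that procedure.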
      For the first parallelization method the maximum performance was estimated to be 10.3 Mflops (without compiler optimization).
      The resources of the computer system are not fully used because interprocessor transfers are needed. The second approach should be more efficient, since in this case the amount of data transferred decreases as the number of processors grows, whereas it remains fixed in the first approach.
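      The difference in transfer volume between the two approaches can be illustrated by counting boundary points per processor. This sketch assumes a one-point-wide boundary exchange with nearest neighbors on a single level, an interior processor, and a 72 x 45 grid; none of these details are given in the report.

```python
import math


def boundary_points_strips(nlon, p):
    """Latitude strips: an interior processor sends one full row of nlon
    points to each of its two neighbors, regardless of p (for p >= 3)."""
    return 2 * nlon


def boundary_points_tiles(nlat, nlon, p_lat, p_lon):
    """Latitude-longitude tiles: an interior processor sends one edge to
    each of four neighbors, so the volume shrinks as the tiles shrink."""
    tile_lat = math.ceil(nlat / p_lat)
    tile_lon = math.ceil(nlon / p_lon)
    return 2 * tile_lat + 2 * tile_lon


NLON, NLAT = 72, 45  # assumed grid size
for p_lat, p_lon in [(2, 2), (3, 3), (5, 6)]:
    p = p_lat * p_lon
    print(p, boundary_points_strips(NLON, p),
          boundary_points_tiles(NLAT, NLON, p_lat, p_lon))
```

      Under these assumptions the strip layout exchanges 144 points per processor at any processor count, while the tile layout drops from 118 points on a 2 x 2 mesh to 42 on a 5 x 6 mesh.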
