AUTHORS - VANT | VOPROSY ATOMNOY NAUKI I TEKHNIKI

Since 1978
Published in Sarov (Arzamas-16), Nizhegorodskaya oblast

RUSSIAN FEDERAL
NUCLEAR CENTER -

ALL-RUSSIAN RESEARCH INSTITUTE
OF EXPERIMENTAL PHYSICS

Русский |

English

PROBLEMS AND DEVELOPMENT FEASIBILITY FOR THE COMPUTATIONAL LOAD DYNAMIC BALANCE CODES FOR CONTINUUM MECHANICS CALCULATIONS

S.P. Belyaev, L.I. Degtyarenko, I.Yu. Turutina
VANT. Ser.: Mat. Mod. Fiz. Proc 1997. Вып.1. С. 43-44.

      The codes for distributed-memory parallel systems must:
      - ensure the interprocessor communications via message passing;
      - provide the uniform loading of processors;
      - use the hardware-based compatibility of the menage passing and computations on the processor.
      For continuum mechanics problems with time-dependent loading, it is possible to implement the dynamic balance of the loading by transmitting the computational points from the most loaded to less loaded processors.
      The possible re-allocation of an arbitrary point from one processor to another necessitates the pointwise parallelization.
      The parallel gas-dynamic code can be written as a sequence of steps each of them first calculating the boundary- processor points which require interprocessor transfers and the waiting time is used for the calculation of internal processor points. The message passing is accomplished on the initiative of the transmitter: as soon as the point is calculated necessary transfers to neighbors are initiated and the calculations for the next point start in parallel with transfers. This organization is simpler and allows to reduce the global waits as compared to other approaches.
      The parallel heat-conduction code can be written based on parallel-pipeline approach where each processor sequence executes a group of 1-D runs in the pipeline mode and all processor sequences execute in parallel. The parallel-pipeline algorithm is also implemented for an arbitrary set of processor points.
      The computational codes are written as event processing routines. A special order of event processing allows the maximum balance between the computations and communications.
      For the maximum computations-communications balance, nonblocking transfers are used and the prohibition is introduced to use global communications.
      The communications buffering is provided to increase the efficiency.
      The dynamic loading balance does not depend on the computational method, can be accomplished after the given number of steps and is implemented based on the following:
      - minimization of distant transfers;
      - local decisions about the load redistribution.

ALGORITHMS FOR AUTOMATIC LOAD REDISTRIBUTION

S.P. Belyaev, I.Yu. Turutina
VANT. Ser.: Mat. Mod. Fiz. Proc 1997. Вып.1. С. 44.

      Parallel codes intended for the calculations on distributed memory systems must take into account the specific of these systems. First, it is necessary:
      - to provide the communications between the processes running on different processors through the message passing;
      - to achieve the uniform loading of processors;
      - to use the hardware-based balance between the message passing and computations.
      The multidimensional continuum mechanics problems with variable loading allow to implement the dynamic load balance by transmitting the computational points from the most to less loaded processors.
      The algorithms for automatic redistribution of computational points among the processors must not necessarily have high accuracy. If the automatic redistribution algorithms provide parallel computations efficiency of 70-80% on a ten-processor system and about 50% on 100 processors for a considerable part of production problems they can be said to do well. The preliminary estimates indicate that this needs only the load unbalance of 10-20% as the computations run.
      However the redistribution algorithms should necessarily have the following properties: simplicity of the computational procedure to determine the number of points with minimum information and local decisions about the redistribution.
      It is assumed that prior to the algorithm execution each processor has received from the neighbor the information about the step termination including the computational time and number of points that can be received by each neighbor/rom the given processor.
      The general algorithm consists of the following sequential steps:
      1. Evaluation of the number of points to be transinitted to each neighbor.
      2. Generation of transmission lists.
      3. Transmission of points.
      4. Generation of new control lists.
      5. Evaluation of the number of points received for each subsequent balance step.
      The processors communicate only at step 3. All remaining steps are executed by each processor independently.

[ Back ]