OPTIMIZATION OF ALGORITHMS IN APPLIED TECHNICAL MD TEST FOR EFFICIENT GPU IMPLEMENTATION
A. M. Erofeev, M. V. Vetchinnikov. VANT. Ser.: Mat. Mod. Fiz. Proc. 2022. Iss. 3. P. 63-72.
We describe algorithms for a molecular dynamics (MD) benchmark that move all computations to the GPU, so that particle data no longer has to be exchanged continually between the CPU and the GPU. As a result, CPU-GPU interaction is needed only to pass boundary data between separate GPUs via MPI on the CPU side. This is far less data than the original code transferred, as performance measurements confirm. On one GPU, the optimized algorithm is 8.7 to 12.5 times faster than the original, depending on problem size. On two GPUs, the speedup is 6.6 to 12.5 times. The parallel efficiency on two V100 GPUs was 76.3-79.6% for problems with 4 million particles or more; on two A100 GPUs it was 77.3-81.8% for problems with 13.5 million particles or more.
Keywords: molecular dynamics, performance, GPU, CUDA, algorithms.
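The abstract does not say how the domain is split between GPUs, so the following is only an illustrative sketch of why boundary-only exchange is cheap. It assumes a hypothetical 1D slab decomposition of a periodic box along x between two GPUs and a hypothetical interaction cutoff; the function `halo_indices` and all parameter values are invented for illustration and run on the CPU in plain Python. Each slab sends its neighbour only the particles lying within the cutoff of one of its faces.

```python
import random

def halo_indices(xs, lo, hi, cutoff):
    """Indices of particles owned by the slab [lo, hi) that lie within
    `cutoff` of either face (hypothetical helper). With two slabs in a
    periodic box, both faces of each slab border the other GPU, so these
    are the only particles that must cross to the neighbour."""
    out = []
    for i, x in enumerate(xs):
        if lo <= x < hi and (x - lo < cutoff or hi - x < cutoff):
            out.append(i)
    return out

# Assumed parameters: box length, cutoff and particle count are illustrative.
random.seed(0)
box, cutoff, n = 100.0, 2.5, 200_000
xs = [random.random() * box for _ in range(n)]  # uniform positions in [0, box)

halo0 = halo_indices(xs, 0.0, box / 2, cutoff)  # GPU 0 owns [0, 50)
halo1 = halo_indices(xs, box / 2, box, cutoff)  # GPU 1 owns [50, 100)

# Fraction of all particles that must be exchanged per step;
# for uniform density it is close to 4 * cutoff / box.
fraction = (len(halo0) + len(halo1)) / n
print(f"exchanged fraction: {fraction:.3f}")
```

Under these assumptions only about a tenth of the particles cross between devices, versus the whole particle set when data shuttles between CPU and GPU every step; the real ratio depends on the paper's actual decomposition, cutoff and density.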