Application of CUDA technology for calculation of ground states of few-body nuclei by Feynman ’ s continual integrals method *

The possibility of application of modern parallel computing solutions to speed up the calculations of ground states of few-body nuclei by Feynman’s continual integrals method has been investigated. These calculations may sometimes require large computational time, particularly in the case of systems with many degrees of freedom. This paper presents the results of application of general-purpose computing on graphics processing units (GPGPU). The energy and the square modulus of the wave function of the ground states of several few-body nuclei have been calculated using NVIDIA CUDA technology. The results show that the use of GPGPU significantly increases the speed of calculations.


Introduction
Low-energy reactions involving few-body nuclei [1] constitute a significant part of the studied nuclear reactions.Investigation of their collisions with other nuclei provides valuable information on the mechanisms of fusion and nucleon transfer reactions (e.g., [2]).Knowledge of the properties and the ground state wave functions of these nuclei is necessary for the theoretical description of reactions with their participation.
The few-body problem in nuclear physics has been studied for a long time.For instance, calculations of 3 H and 3 He nuclei were performed in [3] based on the Faddeev equations.The expansion in hyperspherical functions (K-harmonics) [4] was used for calculations of 3 H nucleus in [5] and 4 He nucleus in [6].In [7] the wave function of the three-body system was obtained using Gaussian basis and the numerical solution of the Hill-Wheeler integral equations.
Feynman's continual integrals method [8,9] provides a more simple possibility for calculating the energy and the probability density for the ground state of the few-body system, because it does not require expansion of the wave function in a system of functions.The possibility of application of this method for calculation of energies of ground states of light nuclei up to 4 He was declared in [10], but the power of computers available at that time did not allow to obtain reliable results since the statistics was very low.In [11] calculations were performed on the CPU with the statistics 10 5 .
In this work an attempt is made to use modern parallel computing solutions to speed up the calculations of ground states of few-body nuclei by Feynman's continual integrals method.The algorithm allowing to perform calculations directly on GPU was developed and implemented in C++ programming language.The energy and the square modulus of the wave function of the ground states of several few-body nuclei have been calculated using NVIDIA CUDA technology [12−14] the results show that the use of GPU is very effective for these calculations.

Theory
The energy 0 E and the square modulus of the wave function 2 0  of the ground state of a system of few particles may be calculated using continual (path) integrals introduced by Feynman [8,9] * The work was supported by grant 15-07-07673-a of the Russian Foundation for Basic Research (RFBR).
is a propagator − the probability amplitude for the particle of mass m to travel from the point 0 q to the point q in time t .Here [ ( )] S q t and Ĥ are the action and the Hamiltonian of the system, respectively, () Dq t is the integration measure [8,9].For the time-independent potential energy the transition to the imaginary (Euclidean) time ti    gives the propagator   0 , ; ,0 with the Euclidean action Integration over q with the periodic boundary condition 0 qq  allows to find the energy 0 E of the ground state in the limit    [10]     0 , ; ,0 Sp exp exp exp ( ) ) Here () gE is the density of states with the continuous spectrum cont EE  .For the system with a discrete spectrum and finite motion of particles the square modulus of the wave function of the ground state may also be found in the limit    [10] together with the energy Outside of the classically allowed region the square modulus of the wave function .Therefore, in this case the formula (7) is in general applicable only for the region not far beyond the classically allowed ground state region.Such situation may occur in the description of bound states of few-particle systems (for example, two protons and a neutron) when the existence of bound states of some of them (e.g., proton plus neutron) is possible.
The contribution of states with the continuum spectrum may be eliminated by introducing infinitely high walls in the potential energy located about the range of the nuclear forces beyond the classically allowed region.Introduction of the boundary condition 0 ( ) 0 q  at these walls will not have a significant effect on the energy 0 E and 2 0 () q  far away from the walls.Feynman's continual integral (2) may be represented as the limit of the multiple integral Here   1 N  -fold integral corresponds to averaging over the "path" of the particle as a broken line in the plane   , q  with the vertices   , , 1, For the approximate calculation of the continual integral (8) the continuous axis  is replaced by the grid , 0, with the step a and the Euclidean propagator of a free particle   (0) 0 , ; ,0 , ; ,0 , ; ,0 exp ( ) The denoted by angle brackets averaging over   may be calculated using the Monte Carlo method [15].The standard algorithm for simulation of the random vector consists in a sequential choice of the values of its components from the conditional distributions W q q q q  is the probability density for the values of the quantity k q given the values of , , , k q q q  .In this case the quantity k q is normally distributed with the mean value Mq , variance k D and standard deviation kk D  where The next trajectory in the simulation is calculated by the formula where k  is a normally distributed random variable with zero mean and unity variance.Sample onedimensional random trajectories for low and large numbers of time steps are shown in Figs.1a and 1b, respectively.
For large values of  random trajectories may reach the region where the probability density for the states with continuum spectrum is substantially larger than the probability density for the ground state, which may lead to a deviation from the asymptotic behavior (7) and the growth of the error.Therefore, the formula ( 7) is only applicable for the not very large values of  .For convenience of calculations in the scale of nuclear forces the expressions ( 6), ( 10) − ( 15) are represented using dimensionless variables is the nucleon mass, , =M ,   2 00 00 11 ln , ; ,0 ln ( ) , Formulas (2)−( 16) are naturally generalized to a larger number of degrees of freedom and few particles including identical ones.The nucleon identity requires symmetrization of trajectories [9], therefore the configurations symmetric with respect to the positions of two neutrons and/or protons will be considered below.The nuclei 3 H, 3 He and 4 He contain only two identical fermions (protons and/or neutrons with opposite spins) and the calculation of their ground states may be carried out without taking into account the Pauli principle by selecting pairs of identical fermions.
It should be noted that the calculation of multiple integrals required to find the multidimensional probability density   and the application of the formula (7) in a single point in the multidimensional space allows to find the approximate value of the energy of the ground state.
To reduce the multiplicity of integrals in the formula (10) the calculation should be performed in the center of mass system using the Jacobi coordinates [4,9].
For a system of two particles ( 2 H nucleus where 1 r and 2 r are the radius vectors of a proton and a neutron, respectively.For a system of three particles, two of which are identical (2 neutrons or 2 protons in 3 H and 3 He nuclei, respectively) In the case of 3 H nucleus 3 r is the radius vector of a proton, 1 r and 2 r are the radius vectors of neutrons.In the case of 3 He nucleus 3 r is the radius vector of a neutron, 1 r and 2 r are the radius vectors of protons.
For a system of four particles consisting of two pairs of identical particles (2 protons and 2 neutrons in 4 He nucleus) In the calculation of the propagator   0 , ; ,0 K q q  for the nuclei 2 H, 3 H, 3 He, 4 He neutron-proton have been used.The dependence of the nucleon-nucleon interaction with a hard core on the distance r was approximated by a combination of Gaussian type exponentials similar to the M3Y potential [16,17]   The values of the parameters MeV were determined from the condition of the absence of bound states of two identical nucleons as well as the approximate equality of the energy 0 E found from ( 5) and (7) to the experimental values of the binding energies for the nuclei 2 H, 3 H, 3 He, 4 He (e.g., [18]).The plots of the potentials (25) and (26) are shown in Fig. 2.

Implementation
For numerical calculations the Monte Carlo method was used.The algorithm was developed and implemented in C++ programming language using NVIDIA CUDA technology.
The principal scheme of the calculation of the ground state energy for the one-dimensional case is shown in Fig. 3.The calculation of the propagator ( 18) is performed using L sequential launches of the kernel.Each kernel launch simulates n random trajectories in the space evolving from the Euclidean time 0  to 0 / j jt a  , where 1, jL  (see Fig. 1).All trajectories start at the same point 0 q in the space and in the moment j  return back to the same point 0 q according to the probability distribution described above.The choice of the initial point 0 q is arbitrary.In the case of the multidimensional space 0 q must be replaced with the set of coordinates in the multidimensional space.All threads in a given kernel launch finish at approximately the same time, which makes the scheme quite effective in spite of the possible delays associated with the kernel launch overhead.Besides, the typical number of kernel launches L required for the calculation of the ground state energy usually does not exceed 100.The principal scheme of the calculation of the square modulus of the wave function for the onedimensional case is shown in Fig. 4. Similarly, the calculation is performed using M sequential launches of the kernel.Each kernel launch simulates n random trajectories in the space from the Euclidean time 0  to the time lin  determined in the calculation of the ground state energy.All trajectories start at the same point i q in the space and in the moment lin  return back to the same point i q according to the probability distribution described above.Here 1, iM  , where M is the total number of points in the space in which the square modulus of the wave function must be calculated.In the case of the multidimensional space i q must be replaced with the set of coordinates in the multidimensional space.One of the benefits of the approach is that the calculation may be easily resumed at a later time.
For example, initially the square modulus of the wave function may be calculated with a large space step to obtain the general features of the probability distribution, and later new intermediate points are calculated and combined with those calculated previously.This may be very useful because the calculation of the square modulus of the wave function is generally much more time-consuming since it requires calculation in many points in the multidimensional space.
An important feature of the algorithm allowing to effectively use graphic processors is low consumption of memory during the calculation because it is not necessary to prepare a grid of values and store it in the memory.
To obtain normally distributed random numbers the cuRAND random number generator was used.According to the recommendations of the cuRAND developers each experiment was assigned a unique seed.Within the experiment, each thread of computation was assigned a unique sequence number.All threads between kernel launches were given the same seed, and the sequence numbers were assigned in a monotonically increasing way.

Results and discussion
Calculations were performed on the NVIDIA Tesla K40s accelerator installed within the heterogeneous cluster [19] of the Laboratory of Information Technologies, Joint Institute for Nuclear Research, Dubna.The code was compiled with CUDA version 7.5 for architecture version 3.5.Calculations were performed with single precision.The Euclidean time step 0.01 a  was used.Additionally, NVIDIA GeForce 9800 GT accelerator was used for debugging and testing purposes.
The behavior of the curves may be easily explained if we note that in all these cases only the energy of the ground state is negative and therefore only the first term in (4) increases with the increase of  , whereas the energies of the excited states are positive and hence the other terms in (4) decrease with the increase of  .The results of linear fitting of the straight parts of the curves are shown in Fig. 5eh.According to the formula (21) the angular coefficient of the linear regression equals the binding energy.The obtained theoretical binding energies are listed in Tab. 1 together with the experimental values taken from [18].It is clear that the theoretical values are close enough to the experimental ones, though obtaining good agreement was not the goal.As can be seen from Fig. 2, the difference between neutronneutron () nn Vr  and proton-proton () pp Vr  potentials is very small.Nevertheless, the difference between the calculated binding energies of 3 H and 3 He is observed in agreement with the experimental values.
The comparison of the square modulus of the wave function for 2 H calculated on GPU using NVIDIA CUDA technology within Feynman's continual integrals method and the square modulus of the wave function calculated on CPU within the shell model is shown in Fig. 6a.The same potentials (25), (26) were used.Good agreement between the curves confirms that the code based on Feynman's continual integrals method using CUDA technology provides correct results.The theoretical charge distribution for 3 He (circles) compared with experimental data taken from [18] (lines).
It should be mentioned that the wave function cannot be measured directly, though the charge radii and charge distributions obtained from experiments may provide some information on its behavior.To compare the results of calculations with the experimental charge radii and charge distributions the wave function must be integrated.
The probability density distribution   2 0 ; Rr  for the three-body configurations of 3 He (p + p + n) with 0    , 45 , 90 is shown in logarithmic scale in Fig. 7a,b,c, respectively, together with the potential energy surface (linear scale, lines).The vectors in Jacobi coordinates are shown in Fig. 7d.
The theoretical charge distribution for 3 He obtained by integration of the wave function is compared with experimental data taken from [18] in Fig 6b .As can be seen, the agreement is very good.
The obtained theoretical charge radius ; ; ,0,0;0,0, ;0, ,0 for the symmetric tetrahedral configuration of four nucleons in the nucleus 4 He , , ,0,0 , 0,0, , 0, ,0 is shown in logarithmic scale in Fig. 7e together with the potential energy surface (linear scale, lines).The vectors in Jacobi coordinates are shown in Fig. 7f.Note also that the presence of the repulsive core in the nucleon-nucleon interaction reduces the probability of finding nucleons in the center of mass of the system for the considered symmetric configurations.This should lead to a smoother increase in the concentration of nucleons and the density of electric charge when approaching the center of the nucleus.The analysis of the properties of   2 01 ,, n rr  allows to choose analytical approximations for it, e.g., as the product of the Gaussian type exponentials.The obtained approximations may be used in dynamic calculations.The code implementing Feynman's continual integrals method was initially written for CPU.The comparison of the calculation time of the ground state energy for 3 He using Intel Core i5 3470 and NVIDIA Tesla K40s with different statistics is shown in Tab. 2.Even taking into account that the code for CPU used only 1 thread and a different random number generator, the time difference is impressive.This fact allows to increase the statistics and the accuracy of calculations in the case of using CUDA technology.The comparison of the calculation time of the square modulus of the wave function for the ground state of 3 He using Intel Core i5 3470 and NVIDIA Tesla K40s with the statistics 10 6 and the number of points in the space 60•60•12 is shown in Tab. 3. The value ~ 177 days for CPU is an estimation based on the performance gain in the calculation of the ground state energy.It is evident that beside the performance gain the use of CUDA technology may allow to reduce the space step in the calculation of the wave functions, as well as greatly simplify the process of debugging and testing, and in certain cases it may even enable calculations impossible before.

Conclusion
In this work an attempt is made to use modern parallel computing solutions to speed up the calculations of ground states of few-body nuclei by Feynman's continual integrals method.The algorithm allowing to perform calculations directly on GPU was developed and implemented in C++ programming language.The method was applied to the nuclei consisting of nucleons, but it may also be applied to the calculation of cluster nuclei.The energy and the square modulus of the wave function of the ground states of several few-body nuclei have been calculated by Feynman's continual integrals method using NVIDIA CUDA technology.The comparison with the square modulus of the wave function for 2 H calculated on CPU within the shell model was performed to confirm the correctness of the calculations.The obtained values of the theoretical binding energies are close enough to the experimental values.The theoretical charge radius and charge distribution for 3 He nucleus are also in good agreement with the experimental data.The results show that the use of GPGPU significantly increases the speed of calculations.This allows to increase the statistics and the accuracy of calculations as well as reduce the space step in calculations of wave functions.It also greatly simplifies the process of debugging and testing.In certain cases the use of CUDA enables calculations impossible before.

Fig. 1 .
Fig. 1.Sample one-dimensional random trajectories for low (a) and large (b) numbers of time steps.

where 1 r and 2 r 3 r and 4 r
are the radius vectors of protons, are the radius vectors of neutrons.

Fig. 3 .
Fig. 3.The scheme of calculation of the ground state energy for the one-dimensional case.Starting from the certain time 0 / lin lin L t a , where1

Fig. 4 .
Fig. 4. The scheme of calculation of the square modulus of the wave function for the one-dimensional case.

Fig. 5 .
Fig. 5.The dependence of the logarithm of the propagator 1 0 ln E bK  on the Euclidean time  for 2 H (a), 3 H (b), 3 He (c) and 4 He (d).Lines are the results of linear fitting of the data lying on the straight parts of the curves for 2 H (e), 3 H (f), 3 He (g) and 4 He (h).Different symbols correspond to different statistics n: empty circles (10 5 ), filled circles (10 6 , 5•10 6 , 10 7 ).

Fig. 6 .
Fig. 6.(a) The square modulus of the wave function for 2 H calculated on GPU using NVIDIA CUDA technology within Feynman's continual integrals method (circles) compared with the square modulus of the wave function calculated on CPU within the shell model (line); r is the distance between the proton and the neutron.(b)The theoretical charge distribution for 3 He (circles) compared with experimental data taken from[18] (lines).

Fig. 7 .
Fig. 7.The probability density for the configurations of 3 He with 0    (a), 45 (b), 90 (c) and the vectors in Jacobi coordinates (d).The probability density for the configuration of 4 He symmetric with respect to the positions of protons and neutrons (e) and the vectors in Jacobi coordinates (f).

Table 1 .
Comparison of theoretical and experimental energies of ground states.

Table 2 .
Comparison of the calculation time of the ground state energy for3He nucleus.

Table 3 .
Comparison of the calculation time of the square modulus of the wave function for the ground state of3He nucleus.