MPI+OpenMP Implementation of Resolution-of-the-Identity Hartree–Fock Method Exploiting Permutational Symmetry of Three-Center Electron Repulsion Integrals

Authors

  • Iurii V. Kashpurovich, Joint Institute for High Temperatures of Russian Academy of Sciences, Moscow, Russian Federation; Moscow Institute of Physics and Technology, Moscow, Russian Federation, https://orcid.org/0009-0002-7385-3568
  • Alexander V. Oleynichenko, Petersburg Nuclear Physics Institute named by B.P. Konstantinov of NRC “Kurchatov Institute”, Gatchina, Russian Federation; Moscow Institute of Physics and Technology, Moscow, Russian Federation, https://orcid.org/0000-0002-8722-0705
  • Vladimir V. Stegailov, Joint Institute for High Temperatures of Russian Academy of Sciences, Moscow, Russian Federation; Moscow Institute of Physics and Technology, Moscow, Russian Federation; HSE University, Moscow, Russian Federation, https://orcid.org/0000-0002-5349-3991

DOI:

https://doi.org/10.14529/jsfi260105

Keywords:

restricted Hartree–Fock method, resolution-of-the-identity, density fitting, three-center electron repulsion integrals, MPI, OpenMP

Abstract

We report a high-performance implementation of the resolution-of-the-identity Hartree–Fock (RI-HF) method that fully exploits the permutational symmetry of three-center electron repulsion integrals (ERIs). The implementation uses a hybrid MPI+OpenMP parallelization strategy. Two algorithmic approaches, with and without a preliminary transformation of the ERIs, are analyzed and compared. A previously introduced custom data layout is employed; designed to exploit the permutational symmetry of the ERIs, it minimizes not only inter-node communication but also local memory traffic. Further low-level and algorithmic optimizations are proposed and discussed. Reasonable parallel scaling is demonstrated by performance benchmarks on a chlorophyll dimer (C55H72O5N4Mg)2 in an aqueous environment of 48 molecules (322 atoms overall, with 3700 and 11896 functions in the main and auxiliary basis sets, respectively). Within the algorithm involving the preliminary transformation, peak speedups of 84× and 71× on 128 threads are achieved for the ERI calculation and the exchange matrix construction, respectively.
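The permutational symmetry mentioned in the abstract is the invariance of a three-center ERI under exchange of its two main-basis indices, (μν|P) = (νμ|P), so only pairs with μ ≥ ν need to be stored. The following is a minimal sketch of standard lower-triangular packing, not the custom data layout used by the authors. All function names are hypothetical, and `toy_eri` is an assumed stand-in for a real integral routine.

```python
def pair_index(mu, nu):
    """Map an unordered pair (mu, nu) to its position in lower-triangular order."""
    if mu < nu:
        mu, nu = nu, mu  # use (mu nu|P) == (nu mu|P): order the pair so mu >= nu
    return mu * (mu + 1) // 2 + nu


def packed_size(n_basis):
    """Number of unique (mu, nu) pairs with mu >= nu."""
    return n_basis * (n_basis + 1) // 2


def toy_eri(mu, nu, p):
    """Hypothetical placeholder for a real integral; symmetric in mu and nu."""
    return 1.0 / (1.0 + abs(mu - nu) + p)


def build_packed(n_basis, n_aux):
    """Fill a packed table: one row per unique pair, one column per auxiliary function."""
    packed = [[0.0] * n_aux for _ in range(packed_size(n_basis))]
    for mu in range(n_basis):
        for nu in range(mu + 1):  # only the lower triangle is computed and stored
            row = packed[pair_index(mu, nu)]
            for p in range(n_aux):
                row[p] = toy_eri(mu, nu, p)
    return packed


def get_eri(packed, mu, nu, p):
    """Look up (mu nu|P) for any index order."""
    return packed[pair_index(mu, nu)][p]


if __name__ == "__main__":
    table = build_packed(4, 3)
    # Both index orders resolve to the same stored element.
    print(get_eri(table, 3, 1, 2) == get_eri(table, 1, 3, 2))
    # Unique pairs for the 3700 main basis functions quoted in the abstract.
    print(packed_size(3700))
```

For the 3700 main basis functions quoted in the abstract, this packing leaves 6,846,850 unique pairs instead of 13,690,000 ordered ones, which roughly halves both the storage and the integral evaluation work before any screening is applied.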

Published

2026-04-27

How to Cite

Kashpurovich, I. V., Oleynichenko, A. V., & Stegailov, V. V. (2026). MPI+OpenMP Implementation of Resolution-of-the-Identity Hartree–Fock Method Exploiting Permutational Symmetry of Three-Center Electron Repulsion Integrals. Supercomputing Frontiers and Innovations, 13(1), 52–73. https://doi.org/10.14529/jsfi260105
