Computational Characterization of the Substrate Activation in the Active Site of SARS-CoV-2 Main Protease

Molecular dynamics simulations with the QM(DFT)/MM potentials are utilized to discriminate between reactive and nonreactive complexes of the SARS-CoV-2 main protease and its substrates. Classification of frames along the molecular dynamic trajectories is utilized by analysis of the 2D maps of the Laplacian of electron density. Those are calculated in the plane formed by the carbonyl group of the substrate and a nucleophilic sulfur atom of the cysteine residue that initiates enzymatic reaction. Utilization of the GPU-based DFT code allows fast and accurate simulations with the hybrid functional PBE0 and double-zeta basis set. Exclusion of the polarization functions accelerates the calculations 2-fold, however this does not describe the substrate activation. Larger basis set with d-functions on heavy atoms and p-functions on hydrogen atoms enables to disclose equilibrium between the reactive and nonreactive species along the MD trajectory. The suggested approach can be utilized to choose covalent inhibitors that will readily interact with the catalytic residue of the selected enzyme.


Introduction
The combined quantum mechanics/molecular mechanics (QM/MM) approach is a proper tool to study chemical reactions in the active sites of enzymes. It allows considering chemical reaction at the QM level taking into account the electrostatic field and steric constraints coming from the rest of the protein and polar water solvent. Reliable results can be obtained only if a relatively high level of theory for the QM subsystem is applied. It was demonstrated that utilization of the density functional theory (DFT) with the hybrid functionals [23,24] is suitable for such purpose, whereas simplified semiemprical methods like, for example, DFTB fails [25]. DFT-based QM/MM calculations require utilization of supercomputer facilities and many CPUs. Recently, considerable efforts have been performed to develop a GPU-based DFT code [8,9,11,12,18,19]. It was implemented in the TeraChem program package [17]. Later, the interface for the QM/MM calculations was proposed [10]. It is based on the powerful NAMD program [16] and the corresponding supporting features can be utilized in QM/MM simulations. This implementation demonstrates dramatic speedup: the same calculations of energy gradient for the system with about 1000 basis functions can be performed on a GPU in less then 2 minutes or on 100 CPUs in about 5 minutes. This allow not only geometry optimization requiring hundreds of gradient calculations for each stationary point on the potential energy surface, but thousands of calculations to obtain a molecular dynamics trajectory.
Computational studies of molecular mechanisms of enzymatic reactions are important for biomedical applications. Analysis of the substrate specificity of proteases can be utilized to construct covalent inhibitors. Those form covalent bonds with the catalytic residues and com-pletely abolish enzymatic activity. This is of great importance for the main protease from the SARS-CoV-2 as it is responsible for partitioning of a synthesized viral polypeptides to the specific fragments that forms a set of proteins required for the virus replication. Elimination of enzymatic activity of the main protease leads to the virus death.
It was recently demonstrated for the main protease from the SARS-CoV-2 that the analysis of QM/MM MD trajectories of the enzyme-substrate complexes can explain the substrate specificity of this enzyme [6]. Also, it was shown that the proper theory level is crucial to obtain valuable results. Utilization of the hybrid DFT functional PBE0 [1] allows one to discriminate reactive and nonreactive enzyme-substrate complexes along MD trajectory, whereas GGA-type functional PBE [14,15] that lacks exact HF exchange fails.
In this communication, we address an important question of the theory level required for the proper description of the dynamic behaviour of the enzyme-substrate complex taking a topical example of the SARS-CoV-2 main protease and its substrate. We test the influence of the basis set, namely presence of polarization functions (d-functions on heavy atoms and p-functions on hydrogen atoms) in QM calculations, as their exclusion speeds up calculations about 2-fold.

Methodology
We borrowed the model system of the SARS-CoV-2 main protease with its oligopeptide substrate from the recent study [6]. The main protease exists in the dimeric form and the active site of one of monomers is considered (Fig. 1). The substrate is involved in three key interatomic interactions with the active site: two hydrogen bonds with the oxyanion hole of the enzyme formed by the NH groups of the backbones of glycine G143 and cysteine C145; specific interaction formed prior the nucleophilic attack between the carbonyl carbon atom of the substrate and a sulfur atom of C145 (Fig. 1). Another important quantity considered in this study is the interatomic distance between the hydrogen atom of the SH group of C145 and nitrogen atom of histidine H41 (Fig. 1). This distance characterizes whether the proton forms a covalent bond with the nitrogen or sulfur atom along a MD trajectory. Details of the molecular dynamic protocol can be found in ref [6]. Shortly, the QM/MM MD simulations were performed in NPT ensemble at T = 300 K and p = 1 atm. The integration time step was set to 1 ps and the trajectory length was 15 ps for each system. The CHARMM36 [2,3] and CGenFF [20][21][22] force fields parameters were utilized for the enzyme and substrate molecules and TIP3P [5] parameters for water molecules. The QM subsystem was treated with the hybrid PBE0 [1] DFT functional with empirical dispersion correction D3 [4] with the 6-31G(d,p) or 6-31G basis sets. The QM/MM MD simulations were performed with TeraChem [17] and NAMD [16] programs and their interface proposed in ref [10]. The electron density analysis, namely the Laplacian of electron density, ∇ 2 ρ(r), calculations were performed in the Multiwfn program [7].

Results and Discussion
The first step of chemical reaction in the active site of cysteine protease is the nucleophilic attack accompanied with the proton transfer from the SH group of the catalytic cysteine residue (C145) to a histidine residue (H41) [13] (Fig. 1). The oxyanion hole formed by the NH fragments of the backbones of C145 and G143 in case of SARS-CoV-2 is supposed to be responsible for the substrate activation [13] (Fig. 1). Therefore, we start with the analysis of these four interatomic distances distributions to compare MD trajectories simulated at the QM(PBE0-  N(H41)) distributions are notably different. In the QM(PBE0-D3/6-31G(d,p))/MM MD trajectory the covalent bond is always formed between a proton and a sulfur atom of C145. In the QM(PBE0-D3/6-31G)/MM MD trajectory, the proton is usually located closely to the nitrogen atom of H41 and the covalent bond is predominantly formed between them.
We analyzed the origin of considerable differences in the dynamic behaviour of model systems treated at different levels of theory. First, we selected the representative MD frame from the QM(PBE0-D3/6-31G(d/p))/MM corresponding to the reactive complex (Fig. 3). The Laplacian of electron density, ∇ 2 ρ(r), depicts areas of electron density concentration (∇ 2 ρ(r) < 0) and depletion (∇ 2 ρ(r) > 0), i.e. electrophilic and nucleophilic sites, respectively. If the electron  density is calculated at the PBE0-D3/6-31G(d,p) level with polarization functions on all atoms, the electrophilic site on the carbonyl carbon atom of the substrate is observed (Fig. 3a). It is the region of positive values of the ∇ 2 ρ(r) in the direction of nucleophilic attack between the C and S atoms. If the Laplacian of electron density is recalculated at the same geometry Computational Characterization of the Substrate Activation in the Active Site of... configuration without polarization functions, no substrate activation is observed (Fig. 3b). The nucleophilic attack is facilitated by the electron lone pair of the sulfur atom that is observed on both ∇ 2 ρ(r) maps. It is due to the fact that the substrate activation is a more refined electron density feature than the lone pair electron concentration, and polarization functions are required for its description. To further support this idea, we calculated electron density difference, ∆ρ, maps in the same plane (Fig. 4). The difference is mostly pronounced around the carbonyl carbon atom. Electron density is considerably higher in the direction of the nucleophilic attack if calculated without polarization functions. This explains the absence of the electron density depletion region around C atom on the ∇ 2 ρ(r) maps calculated at the PBE0-D3/6-31G level.  We also analysed ∇ 2 ρ(r) maps at several MD frames from the trajectory calculated at the QM(PBE0-D3/6-31G)/MM level. We selected a frame with the short distances of the nucleophilic attack and hydrogen bonds with the oxyanion hole and a frame where the proton from the SH group is transferred to the histidine residue (in this case the nucleophile is a negatively charged sulfur). In both cases Laplacian of electron density maps demonstrates no substrate activation.
Combining these computational experiments together, we can conclude that the applied computational strategy is useful in predicting substrate specificity of enzymes, if a proper level of theory is applied.

Conclusion
Novel DFT code implemented in the TeraChem [17] and optimized for the GPU discloses new opportunities for the QM(DFT)/MM molecular dynamics simulations. It allows fast and accurate simulations of interactions occurring between the substrate and enzyme in its active site. We take the main protease from the SARS-CoV-2 and its substrate to analyze the influence of the basis set on the quality of interaction description. Despite the 2-fold speedup of the calculations, elimination of polarization functions results in the wrong description of the substrate activation and cannot be recommended for such analysis. Utilization of the 6-31G(d,p) basis set enables to disclose equilibrium between the reactive and nonreactive species along the MD trajectory. We proposed the methodology based on the QM/MM MD simulations followed by the electron density analysis that allow one to determine substrate activation in the active site of a protease and we demonstrated the importance of utilization of a proper QM theory level for reliable avaluation of this quantity. This approach can be utilized to choose covalent inhibitors that will readily interact with the catalytic residue of the selected enzyme.