Computational Approaches To Identify A Hidden Pharmacological Potential In Large Chemical Libraries


  • Dmitry S. Druzhilovskiy, Institute of Biomedical Chemistry (IBMC)
  • Leonid A. Stolbov, Institute of Biomedical Chemistry (IBMC)
  • Polina I. Savosina, Institute of Biomedical Chemistry (IBMC)
  • Pavel V. Pogodin, Institute of Biomedical Chemistry (IBMC)
  • Dmitry A. Filimonov, Institute of Biomedical Chemistry (IBMC)
  • Alexander V. Veselovsky, Institute of Biomedical Chemistry (IBMC)
  • Karen Stefanisko, National Cancer Institute, National Institutes of Health
  • Nadya I. Tarasova, National Cancer Institute, National Institutes of Health
  • Marc C. Nicklaus, National Cancer Institute, National Institutes of Health
  • Vladimir V. Poroikov, Institute of Biomedical Chemistry (IBMC)



To improve the discovery of more effective and less toxic pharmaceutical agents, large virtual repositories of synthesizable molecules have been generated to increase the diversity of the explored chemical-pharmacological space. Such libraries contain billions of structural formulae of drug-like molecules, together with data on synthetic schemes, required building blocks, estimated physicochemical parameters, etc. These repositories clearly qualify as “Big Data”, so identifying the most promising compounds with the required pharmacological properties (hits) among billions of candidates requires special computational methods. We have proposed a computational approach that combines structural similarity assessment, machine learning, and molecular modeling. The approach has been validated in a project aimed at finding new pharmaceutical agents against HIV/AIDS and associated comorbidities in the Synthetically Accessible Virtual Inventory (SAVI), a database of 1.75 billion compounds. Potential inhibitors of HIV-1 protease and reverse transcriptase, as well as agonists of toll-like receptors and STING, which affect innate immunity, were identified computationally. The activity of three synthesized compounds has been confirmed in a cell-based assay. These compounds belong to chemical classes in which agonistic activity toward TLR 7/8 had not previously been shown. Synthesis and biological testing of several dozen compounds with predicted antiretroviral activity are currently under way at the NCI/NIH. We also carried out virtual screening of one billion substances to find compounds potentially possessing anti-SARS-CoV-2 activity. Information on the selected hits has been accepted by the European initiative “JEDI Grand Challenge against COVID-19” for synthesis and further biological evaluation. The possibilities and limitations of the approach are discussed.
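As a rough illustration of the structural similarity step mentioned in the abstract, the sketch below ranks library compounds by their best Tanimoto coefficient against a set of known actives. It is a minimal sketch under stated assumptions, not the authors' method: the abstract does not specify the similarity measure or the descriptors, so the Tanimoto coefficient, the integer feature sets, the compound identifiers, the function names, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Features = FrozenSet[int]  # hypothetical: a compound as a set of structural feature ids


def tanimoto(a: Features, b: Features) -> float:
    """Tanimoto (Jaccard) coefficient: shared features / features in the union."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty feature sets
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_prefilter(
    library: Dict[str, Features],
    references: Iterable[Features],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Keep compounds whose best similarity to any reference reaches the threshold."""
    refs = list(references)
    hits = []
    for compound_id, features in library.items():
        best = max((tanimoto(features, r) for r in refs), default=0.0)
        if best >= threshold:
            hits.append((compound_id, best))
    # Most similar compounds first, so later (costlier) stages see them first.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


# Toy data (assumed, not taken from SAVI or any real library).
library = {
    "cpd_a": frozenset({1, 2, 3, 4}),
    "cpd_b": frozenset({7, 8, 9}),
    "cpd_c": frozenset({1, 2, 3}),
}
known_actives = [frozenset({1, 2, 3, 5})]

ranked = similarity_prefilter(library, known_actives, threshold=0.5)
print(ranked)
```

In a multi-stage screen of the kind the abstract describes, a cheap filter like this would typically run first so that machine learning models and molecular modeling are applied only to a much smaller subset of the library.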






How to Cite

Druzhilovskiy, D. S., Stolbov, L. A., Savosina, P. I., Pogodin, P. V., Filimonov, D. A., Veselovsky, A. V., Stefanisko, K., Tarasova, N. I., Nicklaus, M. C., & Poroikov, V. V. (2020). Computational Approaches To Identify A Hidden Pharmacological Potential In Large Chemical Libraries. Supercomputing Frontiers and Innovations, 7(3).