Perspectives on Supercomputing and Artificial Intelligence Applications in Drug Discovery


  • Jun Xu, Wuyi University; Sun Yat-Sen University
  • Jiming Ye



This review begins by outlining how science and technology evolved over the last century into the high-throughput science and technology of the modern era, driven by the Nobel-Prize-level inventions of combinatorial chemistry, the polymerase chain reaction, and high-throughput screening. This evolution has led to the accumulation of big data in the life sciences and in drug discovery. Big data demands supercomputing in biology and medicine, although computational complexity remains a grand challenge for sophisticated biosystems in drug design, even in this supercomputing era. To resolve real-world problems, artificial intelligence algorithms (specifically, machine learning approaches) were introduced and have demonstrated their power in discovering structure-activity relationships hidden in big biochemical data. In particular, this review summarizes how researchers have modernized conventional machine learning algorithms by combining non-numeric pattern recognition with deep learning algorithms, and have thereby successfully resolved drug design and high-throughput screening problems. The review ends with perspectives on computational opportunities and challenges in drug discovery, introducing new drug design principles and modeling the process of packing DNA with histones in micrometer-scale space, an example of how a macrocosm object gets into the microcosm world.



How to Cite

Xu, J., & Ye, J. (2020). Perspectives on Supercomputing and Artificial Intelligence Applications in Drug Discovery. Supercomputing Frontiers and Innovations, 7(3).