Using High Performance Computing to Create and Freely Distribute the South Asian Genomic Database, Necessary for Precision Medicine in this Population


  • Asmi H. Shah, Global Gene Corporation Pte Ltd
  • Jonathan D. Picker, Global Gene Corporation Pte Ltd
  • Saumya S. Jamuar, Global Gene Corporation Pte Ltd



Precision medicine is an emerging approach for disease treatment and prevention that takes into account individual variability in genes, environment, and lifestyle for each person. Efforts to implement precision medicine have gained traction in recent years because understanding of the role of genetic variation in human disease has increased significantly over the past decade. However, delivering precision medicine requires robust, population-specific reference genome datasets that capture the full extent of existing natural variation. Most publicly available genomic databases are derived primarily from Caucasian populations and do not fully address the diversity of Asian populations. To address this problem, we have aggregated and built a genomic database, ggcINDIA, specifically for South Asian populations. In collaboration with the Global Alliance for Genomics and Health (GA4GH), we have made this database publicly available to the community through GA4GH's Beacon project. ggcINDIA is the first Beacon for South Asian populations. As more data are generated and aggregated, the ggcINDIA Beacon will provide the precise genomic data that is critical to the delivery of precision medicine within South Asia.






How to Cite

Shah, A. H., Picker, J. D., & Jamuar, S. S. (2017). Using High Performance Computing to Create and Freely Distribute the South Asian Genomic Database, Necessary for Precision Medicine in this Population. Supercomputing Frontiers and Innovations, 4(2), 4–12.