The Saudi Human Genome Program
An oasis in the desert of Arab medicine is providing clues to genetic disease.
By the Saudi Genome Project Team

IEEE Pulse, November/December 2015, p. 22. Image licensed by Ingram Publishing.
Digital Object Identifier 10.1109/MPUL.2015.2476541. Date of publication: 16 November 2015. 2154-2287/15 ©2015 IEEE

Oil wells, endless deserts, stifling heat, masses of pilgrims, and wealthy-looking urban areas still dominate the widespread mental image of Saudi Arabia. That image is now being extended by a recent endeavor that is claiming a share of the global limelight as one of the top ten genomics projects currently underway: the Saudi Human Genome Program (SHGP). With sound funding, dedicated resources, and national determination, the SHGP aims to sequence 100,000 human genomes over the next five years to conduct world-class genomics-based biomedical research in the Saudi population. This article explains why the project was conceived and thought to be feasible, what its ultimate target is, and how it operates.

Saudi Arabia has a high burden of genetic diseases, mostly due to the high rate of marriage between relatives (around 60% of marriages). These diseases appear as severe inherited diseases, which manifest early in life and affect 8% of births in the kingdom, and as common genetic diseases, such as diabetes, which manifest later in life and affect over 20% of the population. They heavily impact quality of life for affected individuals and are a huge burden on the national health care system. The annual cost of these diseases is estimated at about US$27 billion. A substantial reduction in the number of children born with genetic disabilities would immediately save over US$270 million, and similar or greater savings may result from a small delay in the age of onset of diabetes or other common disorders.

Genetic diseases are caused by mutations in the DNA, specifically in the regions that contain genes. A mutation in a gene translates into a mutation in the corresponding protein. If the mutation changes the protein's structure and related physicochemical properties, the function of the protein in the cell will be affected.
The severity of the disease depends on the importance of the affected protein and its role in human physiology. According to disease databases, there are between 7,000 and 8,000 genetic disorders; for approximately 3,500 of these, the causative mutation is still unknown.

The first step toward eliminating the burden of genetic diseases is to find the mutations and respective genes that predispose individuals to the diseases. Then, proper preventative counseling can be planned, or a scheme of rational therapies can be devised on an individual basis, in what may be regarded as personalized medicine. In many cases, the disease-causing genes and gene variants are specific to the Saudi population and are unlikely to be discovered by research conducted outside the region. Hence, establishing the SHGP was essential to provide the infrastructure needed to solve cases and understand disease in the Saudi population.

Interestingly, the abundance of genetic diseases in the Saudi population, combined with large family sizes, makes it easier to identify the gene and mutation underlying a particular disease: one can compare multiple disease carriers with healthy people, which gives stronger evidence. Furthermore, the studies done as part of the SHGP can be used to verify results obtained by similar studies elsewhere that draw conclusions from fewer cases. Thus, despite its national nature, the project can still benefit the global effort to fight these diseases.

The Mission

The SHGP mission is to identify the genetic basis of severe and common inherited disease in the Saudi population utilizing state-of-the-art genome sequencing, bioinformatics, and validation techniques. It aims to establish the complete foundation for genomic medicine: lab infrastructure, technical capacity, and a genomic knowledge database. The database is planned to be a major output of the project and to serve the whole medical community, in Saudi Arabia and worldwide. It will help in understanding the genetic bases of diseases and identifying better treatments, which will contribute to future developments in personalized medicine and genomic sciences.

"The SHGP will position the kingdom at the forefront of personalized medicine and will empower our citizens to help them make informed decisions for their health plans. It's hoped other global academic institutions will use the impressive facilities the King Abdulaziz City for Science and Technology (KACST) is launching in the near future," said Prince Dr. Turki bin Saud Al Saud, KACST president.

The Setup

The SHGP is funded and organized by the KACST and involves the creation of a national network of ten genome centers to recruit subjects and undertake the sequencing required (Figure 1). It also involves the establishment of a centralized knowledge base at the KACST to store the resulting information on population variations, including those causing disease, and to make this information available for future diagnostic and screening efforts.

FIGURE 1 A KACST research building that houses genome sequencing labs.

The core technology in the genome centers is next-generation sequencing (NGS), a recent development that enables efficient and cost-effective reading of the DNA sequence that composes the genome of an individual. Advanced computing infrastructure for processing big genomic data has also been established to transform the output of the next-generation sequencers into useful knowledge.

"The SHGP is the largest disease gene discovery project ever undertaken and will therefore also establish the kingdom as a world leader in disease genetics research and personalized medicine," said Dr. Sultan Al-Sedairy, the project principal investigator and the executive director of the Research Center of the King Faisal Specialist Hospital and Research Center (KFSHRC).

The Start and Current Status

The SHGP was officially launched in December 2013. In the city of Riyadh, where the KACST headquarters is located, a central genomics and bioinformatics facility is currently running (Figure 2). Another high-throughput lab is also running at the KFSHRC. In addition, three other labs in Jeddah, Medinah, and Riyadh are ready to run at the time of writing, and another five satellite genomics labs around the kingdom are currently being established.

FIGURE 2 SHGP researchers working in the KACST Genome Sequencing Lab.

All the project labs are involved in performing NGS to achieve the sequencing of the 100,000 genomes. Each is equipped with NGS machines and primary computation power. All the labs follow a standardized procedure for sample collection, banking, processing, and sequencing. The sequencing data are processed through an optimized bioinformatics workflow utilizing a central computer hosted at the KACST. This workflow guarantees the quality of data generated by the satellite labs and provides a link between all research groups, hospitals, clinicians, and scientists involved. The genomic variant data will be fully analyzed and used to create a Saudi-specific database that will provide the basis for future development of personalized medicine in the kingdom, representing the most comprehensive effort to identify disease-causing genes for the population of a country and within the Arab world.

Top Medical Genomics Projects

Looking at the landscape of large-scale genome projects in medicine, we can observe a shift from internationally oriented projects to national and regional ones. The 1,000 Genomes Project and the Personal Genome Project are two examples of international projects targeting the sequencing of thousands of human genomes. These projects were launched shortly before the widespread availability of low-cost NGS technologies. Projects launched nowadays are mostly of a national or regional nature, targeting more individuals within population-specific contexts. Examples of such recent projects include the Exome Sequencing Project (2,440 U.S. individuals), the Iceland Genome Project (2,636 Icelandic individuals), the Genomics England 100,000 Genomes Project (100,000 U.K. individuals), the Million Veteran Program (1 million U.S. veterans), and the SHGP (100,000 Saudi individuals).

Among these projects, the SHGP has some interesting characteristics. The population is homogeneous, as in the Iceland project; it has well-defined medical targets, as in the Scottish and Exome Sequencing Projects; and it is of large scale, as is the case for the Million Veteran Program and the Genomics England program. The recruitment of samples is controlled to target relevant cohorts that help in identifying the genetic causes of the disease. "All these points position the SHGP as one of the top international biomedical projects," said Dr. Brian Meyer, chair of the Genetics Department at KFSHRC.

The Genomics in SHGP

The Use of Revolutionary Technology

The study of mutations reveals the causes of many diseases. Many mutation-detection methods rely on the properties of base-pair mismatches between a normal and a mutated DNA strand. Restriction enzyme polymorphisms were the first tools used for genetic diagnosis, in combination with Southern blotting of genomic DNA. This technique was first used to detect mutations related to sickle-cell anemia. It was further modified by the use of the polymerase chain reaction. These techniques were able to detect the presence of a mutation but unable to read the DNA sequence. Sanger sequencing was then introduced as a new way to read the DNA sequence, and it became the standard way to detect the mutations underlying Mendelian disorders. Early successes from the application of this method included the identification of the mutations responsible for cystic fibrosis and Huntington's disease, among others. The Human Genome Project, which announced its draft sequence in 2000 and was declared complete in 2003, used automated Sanger sequencing.

Since the announcement of the sequencing of the human genome, there has been a need to improve the specificity, sensitivity, scalability, speed, and cost-effectiveness of reading DNA. NGS was introduced in 2007 and has since revolutionized the genomic sciences. With the first version of NGS, a single sequencing run could produce a maximum of about 1 GB of data. Four years later, the data output had increased nearly 1,000-fold. NGS enables us to generate a large volume of sequencing data in a matter of days or hours. By comparison, the first human genome sequencing needed ten years to be completed at a cost of about US$3 billion. Today, we can sequence a whole human genome in a few days at a cost of a few thousand U.S. dollars. "The simplicity of NGS technology reduced the amount of overhead for running the lab facility and enabled enormous productivity with reasonable team size," said Dr. Dorota Monies, head of the KFSHRC Genome Sequencing Unit.

FIGURE 3 The high-performance computer SANAM, one of the top supercomputers worldwide, in the green data center in the KACST.

Sequencing Workflows in the SHGP

The SHGP offers different sequencing workflows, including whole-genome sequencing (WGS), whole-exome sequencing (WES), and targeted gene sequencing. While WGS covers the whole genome, WES targets only the coding regions (the exons of the genes). The exome is estimated to cover 2 to 3% of the genome. In targeted sequencing, only selected genes (referred to as a gene panel) are sequenced. Targeted sequencing is faster and less costly, but there is a chance of missing disease-causing mutations because the approach does not cover the whole set of genes.

One of the research objectives of the SHGP is to investigate the use of different gene panels to cover different disease categories, where the gene panel can be selected based on the corresponding phenotype. The SHGP therefore designed and synthesized 13 custom-made gene panels covering all annotated genes documented in Online Mendelian Inheritance in Man (OMIM). Recent publications of the project team covering over 5,000 samples show that the gene panels have several advantages over the direct use of WES: a low number of false positives compared to WES; a high diagnostic rate; a low cost per sample; and the ability to multiplex up to 50 samples and sequence them together in one run. These advantages encourage the use of these panels in clinical laboratory settings. "These results are very promising, and we believe it will be soon part of routine clinical practice. It will speed up the diagnostic process and reduce the time taken from months to days," said Dr. Nada Altassan, head of the KFSHRC Behavioral Genetics Unit.

Big Data and Bioinformatics in SHGP

The SHGP, by the scale and nature of its data, is a typical big data project, where the four V's (volume, velocity, variety, and veracity) characterizing big data are present.
When running at full capacity, the project will produce 10 to 15 TB of raw sequence data per day. Therefore, establishing a high-performance and scalable information technology (IT) infrastructure and using advanced bioinformatics methods are major components of the SHGP. "The structure of the participating centers and the distribution of the genomic data production and analysis form an interesting IT challenge that is probably the first of its kind worldwide," said Dr. Mohamed Abouelhoda, head of the SHGP bioinformatics team.

All the labs produce significant amounts of data that must be analyzed and moved to central storage for large-scale data analysis, with results to be shared among researchers inside and outside the kingdom. While each satellite lab has some computing power to participate in the data analysis, the main computing power for storage and analysis resides at the KACST. The SHGP also has access to the energy-efficient, high-performance computer SANAM, with a performance of 532 TFlops and a high-speed interconnect data rate of 56 Gb/s (Figure 3). "SANAM is one of the top supercomputers worldwide," said Dr. Abdulqadir Alaqeeli from the KACST SANAM team.

To cope with this distributed IT infrastructure, the SHGP bioinformatics team has developed methods to manage the data and the analysis among the different sites using different computational resources. The transfer of data is prioritized and scheduled to reduce the required bandwidth. The use of commercial cloud computing solutions is also part of the design, to automatically scale the in-house IT resources in response to abrupt computation loads.
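
To give a feel for the numbers involved, the following Python sketch converts the stated daily data volume into an average network rate and shows one simple way that transfers could be prioritized under a daily budget. It is only an illustration under stated assumptions: the article does not describe the SHGP scheduling method, and the names `sustained_gbps`, `Transfer`, and `schedule_transfers`, the lab labels, the priorities, and the 8 TB budget are all hypothetical. One terabyte is taken as 10^12 bytes.

```python
import heapq
from dataclasses import dataclass, field

SECONDS_PER_DAY = 24 * 60 * 60


def sustained_gbps(terabytes_per_day: float) -> float:
    """Average rate in gigabits per second needed to move a daily volume.

    Assumes decimal units: 1 TB = 10**12 bytes, 1 Gb = 10**9 bits.
    """
    bits_per_day = terabytes_per_day * 1e12 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e9


@dataclass(order=True)
class Transfer:
    """A pending data transfer (hypothetical model, not the SHGP's own)."""
    priority: int                         # lower value = more urgent
    size_tb: float = field(compare=False)
    name: str = field(compare=False)


def schedule_transfers(transfers, daily_budget_tb):
    """Take transfers in priority order while they fit the daily budget.

    Returns (sent, deferred) as lists of transfer names. Anything that
    does not fit in today's budget is deferred to a later day.
    """
    heap = list(transfers)
    heapq.heapify(heap)
    sent, deferred, used_tb = [], [], 0.0
    while heap:
        transfer = heapq.heappop(heap)
        if used_tb + transfer.size_tb <= daily_budget_tb:
            sent.append(transfer.name)
            used_tb += transfer.size_tb
        else:
            deferred.append(transfer.name)
    return sent, deferred


if __name__ == "__main__":
    for volume_tb in (10, 15):
        print(f"{volume_tb} TB/day needs about {sustained_gbps(volume_tb):.2f} Gb/s on average")

    pending = [
        Transfer(2, 4.0, "lab-B research batch"),
        Transfer(1, 3.0, "lab-A clinical batch"),
        Transfer(3, 5.0, "lab-C archive"),
    ]
    sent, deferred = schedule_transfers(pending, daily_budget_tb=8.0)
    print("sent today:", sent)
    print("deferred:", deferred)
```

Under these assumptions, 10 to 15 TB per day corresponds to an average of roughly 0.93 to 1.39 Gb/s sustained around the clock, which suggests why scheduling and prioritizing transfers matters. A real scheduler would also have to account for link sharing, retries, and time-of-day effects, none of which are modeled here.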
Collectively, the central and satellite computer resources, together with the automatic extension to commercial cloud solutions, work like a hybrid multicloud system.

Next Actions

Over its course, the project, with its new discoveries, will uncover the genetic basis of different genetic diseases. The best practices learned will also help in establishing population-scale diagnostic capabilities to bring research results into the clinic on a wide scale. The project will then pave the way for advanced treatment plans using promising technologies such as stem cells and gene/genome editing, where the defective components (genes or cells) can be manipulated either by knocking them out or by introducing other nondefective variants that can function well in the living cell. The KACST already has plans to support initiatives in these areas to develop further technologies that leverage the information gained by the SHGP in the near future.

More information about the Saudi Human Genome Program is available at http://shgp.kacst.edu.sa. The SHGP team can be contacted through http://www.kacst.edu.sa/en/contact/pages/default.aspx.