Gene Environment Interaction Analysis. Methods in Bioinformatics and Computational Biology. edited by. Sumiko Anno

Similar documents
Bio-Inspired Wettability Surfaces

Data Mining and Applications in Genomics

DNA Microarray Technology and Data Analysis in Cancer Research Downloaded from

Nanocomposite, Ceramic, and Thin Film Scintillators

BTRY 7210: Topics in Quantitative Genomics and Genetics

Computational Workflows for Genome-Wide Association Study: I

DESIGN OF PILE FOUNDATIONS IN LIQUEFIABLE SOILS

CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016

Life at the. Nanoscale

The Entropy Crisis Downloaded from by on 04/25/18. For personal use only.

Microarrays in Diagnostics and Biomarker Development

Genetic Variation and Genome- Wide Association Studies. Keyan Salari, MD/PhD Candidate Department of Genetics

Age-Adjusted Death Rates for Coronary Heart Disease, U.S.,

in Biomedicine A Gentle Introduction to Support Vector Machines Volume 1: Theory and Methods

Introduction to Bioinformatics

Péter Antal Ádám Arany Bence Bolgár András Gézsi Gergely Hajós Gábor Hullám Péter Marx András Millinghoffer László Poppe Péter Sárközy BIOINFORMATICS

Construction Technology for Tall Buildings

The 150+ Tomato Genome (re-)sequence Project; Lessons Learned and Potential

10. BIOTECHNOLOGY (Code No. 045)

Genome-wide association studies (GWAS) Part 1

The Lexicon The rare simple: Monogenic Mendelian HTN The common complicated: Polygenic multi-factorial HTN

Advanced Plant Technology Program Vocabulary

SNPs - GWAS - eqtls. Sebastian Schmeier

AN INTROOUCTION TO THE BASICS

China s Energy. Industrial Map of. Industrial Map of China's Energy Downloaded from

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow

ACCEPTED. Victoria J. Wright Corresponding author.

UK Biobank Axiom Array

Validation Study of FUJIFILM QuickGene System for Affymetrix GeneChip

ALGORITHMS IN BIO INFORMATICS. Chapman & Hall/CRC Mathematical and Computational Biology Series A PRACTICAL INTRODUCTION. CRC Press WING-KIN SUNG

QTL ANALYSIS OF EMAMECTIN BENZOATE SUSCEPTIBILITY IN THE SALMON LOUSE Lepeophtheirus salmonis

PRIORITIZING SNPS FOR DISEASE-GENE ASSOCIATION STUDIES: ALGORITHMS AND SYSTEMS

The GMO Handbook. Genetically Modified Animals, Microbes, and Plants in Biotechnology. Edited by. Sarad R. Parekh, PhD

Genetics: Analysis of Genes and Genomes, Ninth Edition Includes Navigate 2 Advantage Access

SNP calling and VCF format

Introduction to Bioinformatics and Gene Expression Technology

Quiz will begin at 10:00 am. Please Sign In

ICH Topic E16 Genomic Biomarkers Related to Drug Response: Context, Structure and Format of Qualification Submissions. Step 3

KNN-MDR: a learning approach for improving interactions mapping performances in genome wide association studies

Engineering Genetic Circuits

An introduction to genetics and molecular biology

S G. Design and Analysis of Genetic Association Studies. ection. tatistical. enetics

S SG. Metabolomics meets Genomics. Hemant K. Tiwari, Ph.D. Professor and Head. Metabolomics: Bench to Bedside. ection ON tatistical.

Midterm 1 Results. Midterm 1 Akey/ Fields Median Number of Students. Exam Score

Lecture 2: Biology Basics Continued

Emerging Focus. Nutrigenomics. Department of Nutritional Sciences Faculty of Life Sciences

Strength- Based Leadership Coaching in Organizations

University of Groningen. Genomic Wake-Up Call Samol, Marta

Answers to additional linkage problems.

Genome-Wide Association Studies (GWAS): Computational Them

Principal Component Analysis in Genomic Data

Exploring the Genetic Basis of Congenital Heart Defects

Sept 2. Structure and Organization of Genomes. Today: Genetic and Physical Mapping. Sept 9. Forward and Reverse Genetics. Genetic and Physical Mapping

Introduction to RNA-Seq. David Wood Winter School in Mathematics and Computational Biology July 1, 2013

DNA Collection. Data Quality Control. Whole Genome Amplification. Whole Genome Amplification. Measure DNA concentrations. Pros

Linkage Disequilibrium. Adele Crane & Angela Taravella

MAS refers to the use of DNA markers that are tightly-linked to target loci as a substitute for or to assist phenotypic screening.

Park /12. Yudin /19. Li /26. Song /9

MAPPING GENES TO TRAITS IN DOGS USING SNPs

Personal Genomics Platform White Paper Last Updated November 15, Executive Summary

Mayday and beyond: New Approaches for the Visualisation of Genomics and Transcriptomics Data

Interactions Between SNP Alleles at Multiple Loci and Variation in Skin Pigmentation in 122 Caucasians

Chapter 15 Gene Technologies and Human Applications

The first and only fully-integrated microarray instrument for hands-free array processing

Nutrigenomics and nutrigenetics are they the keys for healthy nutrition?

mrna for protein translation

4.1. Genetics as a Tool in Anthropology

NANOTECHNOLOGY IN HEALTH CARE

Bioinformatics. Ingo Ruczinski. Some selected examples... and a bit of an overview

Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications

GENOME ANALYSIS AND BIOINFORMATICS

Examination Assignments

Bioinformatics A Practical Handbook of Next Generation Sequencing and Its Applications

Themes. Homo erectus. Jin and Su, Nature Reviews Genetics (2000)

Data Sources and Biobanks in the Asia-Pacific Region. Wei Zhou, MD, Ph.D. Department of Epidemiology, Merck Research Laboratories October 23, 2014

EDV as part of the horizontal protection of Plant Breeders Rights

Machine learning applications in genomics: practical issues & challenges. Yuzhen Ye School of Informatics and Computing, Indiana University

WELDING INSPECTION TECHNOLOGY

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE

Conifer Translational Genomics Network Coordinated Agricultural Project

Genome Sequence Assembly

MSc specialization Animal Breeding and Genetics at Wageningen University

After the association: Functional and Biological Validation of Variants

Computational Haplotype Analysis: An overview of computational methods in genetic variation study

DRINKING WATER SUPPLY AND AGRICULTURAL POLLUTION

Genomics and Bioinformatics GMS6231 (3 credits)

GENE MAPPING. Genetica per Scienze Naturali a.a prof S. Presciuttini

Applications and Uses. (adapted from Roche RealTime PCR Application Manual)

Genomic Clinical Trials and Predictive Medicine

Beyond single genes or proteins

PERSPECTIVES. A gene-centric approach to genome-wide association studies

Conifer Translational Genomics Network Coordinated Agricultural Project

Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010

Chp 10 Patterns of Inheritance

I See Dead People: Gene Mapping Via Ancestral Inference

An introductory overview of the current state of statistical genetics

5/18/2017. Genotypic, phenotypic or allelic frequencies each sum to 1. Changes in allele frequencies determine gene pool composition over generations

Sequence Variations. Baxevanis and Ouellette, Chapter 7 - Sequence Polymorphisms. NCBI SNP Primer:

Genomic resources and gene/qtl discovery in cereals

Axiom mydesign Custom Array design guide for human genotyping applications

Transcription:

Gene Environment Interaction Analysis Methods in Bioinformatics and Computational Biology edited by Sumiko Anno

Gene Environment Interaction Analysis

Gene Environment Interaction Analysis Methods in Bioinformatics and Computational Biology edited by Sumiko Anno

Published by Pan Stanford Publishing Pte. Ltd. Penthouse Level, Suntec Tower 3 8 Temasek Boulevard Singapore 038988 Email: editorial@panstanford.com Web: www.panstanford.com British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library. Gene Environment Interaction Analysis: Methods in Bioinformatics and Computational Biology Copyright 2016 by Pan Stanford Publishing Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the publisher. For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher. ISBN 978-981-4669-63-4 (Hardcover) ISBN 978-981-4669-64-1 (ebook) Printed in the USA

Contents Preface xiii 1. Understanding Skin Color Variations as an Adaptation by Detecting Gene Environment Interactions 1 Sumiko Anno, Kazuhiko Ohshima, and Takashi Abe 1.1 Introduction 2 1.1.1 Human Skin Color as an Environmental Adaptation 3 1.1.2 Mechanism of Melanin Formation 3 1.1.3 Skin Color Diversity due to DNA Polymorphism 4 1.1.4 Objectives 4 1.2 A Linkage Disequilibrium Based Statistical Approach to Detect Interactions between SNP Alleles at Multiple Loci That Contribute to Skin Pigmentation Variation between Human Populations 5 1.2.1 SNP Analysis of European and East Asian Cohorts 5 1.2.2 Genotype and Allele Frequencies for the 20 SNPs in the European and East Asian Population Groups 6 1.2.3 Cluster Analysis 8 1.2.4 Linkage Disequilibrium Generated by Gene Gene Interactions Contributes to Differences between Racial Groups 11 1.3 SNP Analyses Reveal Natural Selection of Pigmentation Candidate Genes from Haplotypes 13 1.3.1 Background 13 1.3.2 Detecting Natural Selection in the Human Genome on the Basis of the Haplotype Structure 14

vi Contents 1.3.3 Discussion 15 1.3.4 Future Investigations of Natural Selection and Environmental Adaptability Regarding Skin Pigmentation 16 1.4 Elucidation of the UVR-Induced Selective Genetic Mechanisms Influencing Variations in Human Skin Pigmentation 17 1.4.1 Influence of UVR Levels on Skin Color Variation 17 1.4.2 Statistical Methods of Clarifying Gene Environment Interactions, Viewing Skin Pigmentation as a Complex Trait 18 1.4.3 New Methods for Detecting Gene Environment Interactions for Human Skin Pigmentation Variation 19 1.4.4 RS Data Processing and Spatial Analysis in GIS 20 1.4.5 Gene Environment Interaction Analysis 21 1.4.6 Results 23 1.4.7 Discussion 27 1.5 Conclusions 29 2. Information Theoretic Methods for Gene Environment Interaction Analysis 39 Jonathan Knights and Murali Ramanathan 2.1 Introduction 39 2.2 Information Theoretic Metrics and Searching for GEIs 41 2.2.1 Entropy and Mutual Information 41 2.2.1.1 Entropy 41 2.2.1.2 Mutual information 42 2.2.1.3 The k-way interaction information 43 2.2.1.4 Total correlation information 45 2.2.1.5 Phenotype-associated information 45

Contents vii 2.3 How and Why Do the KWII and the PAI Measure Statistical Interaction? 46 2.4 Performance of KWII and PAI Search Algorithms on Simulated Data 47 2.4.1 KWII as an Interaction Metric 50 2.4.1.1 PAI removes the effects of LD on TCI 51 2.5 Algorithms 53 2.5.1 Noninformation Theoretic Methods 54 2.5.2 Information Theoretic Algorithms 56 2.6 Applications of Information Theory to Interaction Analysis 59 2.7 Critiques of Information Theory Approaches to Interaction Analysis 62 2.8 Conclusions 64 3. Approaches for Gene Environment Interaction Analysis: Practice of Regional Epidemiological Study 73 Mio Nakazato and Takahiro Maeda 3.1 Introduction 74 3.2 Community Investigation and Setting of the Research Objective 74 3.2.1 Study Background and Purpose: Homocysteine and Arteriosclerosis 77 3.3 Study Design and Methods 79 3.3.1 Survey Items 79 3.3.2 Sample Size 80 3.3.3 Sample Collection and Storage 80 3.3.3.1 Explanation of the community research survey to each organization and request for cooperation 80 3.3.3.2 Methods and flow of the research survey 81 3.3.3.3 Collection and storage of blood samples 82 3.3.4 Measurement and Analysis of Samples 83 3.3.5 Data Management 85 3.3.6 Ethics Committee 86

viii Contents 3.3.7 Preparation of the Field of Regional Epidemiological Study 87 3.3.8 Study Limitations 88 3.4 Statistical Analysis 89 3.4.1 Preparation of Analytical Sheets 89 3.4.2 Close Investigation of Data 89 3.4.3 Selection of Statistical Analysis Methods 94 3.4.4 Practice of Analysis 95 3.5 Later Surveys 106 3.5.1 Homocysteine and Folic Acid 106 3.5.2 Adiponectin Polymorphism and Arteriosclerosis 108 3.5.3 Leukocyte Count and Arteriosclerosis 109 3.5.4 Current Study Purpose in Our Regional Epidemiological Research Field 110 3.5.5 Other Studies 110 3.6 Conclusion 111 4. Use of Bioinformatics in Revealing the Identity of Nature s Products with Minimum Genetic Variation: The Sibling Species 121 K. Gajapathy, A. Ramanan, S. L. Goodacre, R. Ramasamy, and S. N. Surendran 4.1 Cryptic Species: An Introduction 122 4.2 Computer Power in Taxonomy of Sibling Species 124 4.2.1 DNA and Protein as Tools in Taxonomy: Alternatives or Auxiliary? 124 4.2.2 Protein-Based Assays 129 4.2.3 Karyotyping and Chromosome Banding Pattern 129 4.3 Automated Species Identification 130 4.4 Species Complex among Insect Vectors in Sri Lanka: Two Case Studies Where Bioinformatic Approaches Have Solved Taxonomic Problems 134 4.5 Conclusion 141

Contents ix 5. Integrated Bioinformatics, Biostatistics, and Molecular Epidemiologic Approaches to Study How the Environment and Genes Work Together to Influence the Development of Complex Chronic Diseases 151 Alok Deoraj, Changwon Yoo, and Deodutta Roy 5.1 Introduction 152 5.2 Tools for Identifying Genetic, Environmental, and Stochastic Factors Relevant in Complex Human Diseases 156 5.2.1 Candidate Gene 157 5.2.2 Copy Number Variations 157 5.2.3 Single-Nucleotide Polymorphism 158 5.2.4 Haplotype Mapping 159 5.2.5 Genome-Wide Association Screen 160 5.2.6 Epigenetic Changes and Variation in Noncoding DNA Elements 161 5.2.7 Variations, Mutations, and Damages in the Mitochondrial Genome 163 5.2.8 Chronic Disease Bionetwork through Interactome, Phenome, and Systems Approaches 163 5.2.8.1 Transcriptomics 164 5.2.8.2 Proteomics 165 5.2.8.3 Interactomics 165 5.2.8.4 Metabolomics 166 5.2.8.5 Phenomics 166 5.3 Integration of Functional Genomic, Epigenetic, and Environmental Data of Molecular Epidemiological Studies Using Bioinformatics and Biostatistics Approaches 167 5.3.1 Molecular Epidemiologic Study Designs for G G and GEI 168 5.3.1.1 Family-Based Studies 168 5.3.1.2 Studies of Unrelated Individuals 168 5.3.1.3 Retrospective Design 169 5.3.1.4 Prospective Design 170 5.3.1.5 Case-Only Design 171

x Contents 5.3.2 Identification of Mutations, SNPs, Changes in CNVs, and Variations in a Significant Set of Over- and Underexpressed Genes 172 5.3.3 Enrichment Analysis of Significant Genes and Proteins 175 5.3.4 Analysis of Combined Effects of Modified Gene Expressions, CNVs, Mutations, and Molecular Interactions Influencing Biological Pathways and Network(s) That Contribute to the Susceptibility to, Resistance to, or Development of Complex Diseases 175 5.3.4.1 Multifactor Dimensionality Reduction 176 5.3.4.2 Bayesian Networks 176 5.3.5 Validation of Key Causal Genes/ Proteins/Molecules/Environmental and Stochastic Factors Predicted to Be Involved in the Development of Disease by Statistical and Other Approaches 179 5.3.5.1 Validation by Statistical Methods 179 5.3.5.2 Literature-Based Validation of Key Causal Genes/ Proteins/Molecules/ Environmental and Stochastic Factors Involved in the Etiology of Chronic Diseases Using Models Generated by Empirical Data 180 5.4 Predictive Analysis of Lifetime Risk of Developing Disease 182 5.5 Technical Challenges 183 5.5.1 Sample Size 183 5.5.2 Complex Mixture of Covariates 184 5.5.3 Coordination in Data Collection and Their Meta-Analysis 184

Contents xi 5.5.4 Lack of Computational Power 185 5.6 Conclusion 186 5.7 Online Resources 187 Index 193

Preface Gene environment (G E) interactions contribute to the development of complex diseases and phenotypic variation. They are a hot topic in human genetics, and analyses of G E interactions are expected to have many potential applications. Despite the importance of G E interactions in the etiology of complex diseases and phenotypic variation, insufficient attention has been paid to developing models for detecting these interactions. This textbook introduces different models of G E interactions for use in the determination of human disease and phenotypic variation. Applying models of the complex interactions between genes and environment will lead to novel methods of disease detection and prevention, as well as new interventions in various fields. I would like to thank the publisher for bringing me along on this adventure and for doing an excellent job. I am particularly grateful to Stanford Chong, Sarabjeet Garcha, and Shivani Sharma. Sumiko Anno February 2016