Haplotypes Personalized Medicine: Understanding Your Own Genome Fall 2014

1 Haplotypes Personalized Medicine: Understanding Your Own Genome Fall 2014

2 Terminology Review
Allele: one of the different forms of genetic variation at a given gene or genetic locus. Locus 1 has two alleles, A and C, and Locus 2 has two alleles, T and G.
Genotype: the specific allelic make-up of an individual's genome. Individual 1 has genotype AA at Locus 1 and genotype TG at Locus 2.
Heterozygous/homozygous: Locus 1 of Individual 1 is homozygous, and Locus 2 is heterozygous.
[Figure: chromosome pairs of Individual 1 and Individual 2, showing their alleles at Locus 1 and Locus 2.]

3 Single Nucleotide Polymorphism (SNP)
SNP: binary nucleotide substitutions at a single locus on a chromosome; each variant is called an "allele".
[Figure: aligned chromosome sequences, including the paternal (C_p) and maternal (C_m) chromosomes of a diploid individual.]

4 From SNPs to Haplotypes
Haplotype: a stretch of consecutive nucleotides that lie on the same chromosome.
What are the alleles here?
[Figure: aligned chromosome sequences, including the paternal (C_p) and maternal (C_m) chromosomes of a diploid individual.]

5 Haplotypes from SNP Array?
Genotype g: pairs of alleles, with the association of alleles to chromosomes unknown.
Haplotype h = (h_1, h_2): the possible associations of alleles to chromosomes.
[Figure: the maternal (C_m) and paternal (C_p) chromosomes of a heterozygous diploid individual, the genotype obtained by sequencing, and the candidate haplotype pairs consistent with it.]

6 Why Haplotypes?
Haplotypes have a greater power for discriminating genomic regions.
Consider J binary markers (e.g., SNPs) in a genomic region. There are 2^J possible haplotypes.
SNPs have only two alleles, whereas haplotypes have a larger number of alleles.
Haplotypes are good genetic markers for population, evolution, and hereditary diseases.

7 Haplotypes and SNPs
[Figure: eight aligned chromosome sequences.]
Haplotype frequencies: CTG 3/8, TG 3/8, CT 2/8.
SNPs can distinguish between two groups of individuals (a group with C, another group with T).
Haplotypes can distinguish between three groups of individuals (the groups with CTG, TG, and CT).

8 Haplotypes and SNPs
[Figure: eight aligned chromosome sequences.]
Haplotype frequencies and phenotypes: CTG 3/8 (healthy), TG 3/8 (healthy), CT 2/8 (disease X).
Haplotypes can have a greater power to detect disease-related genome regions.

9 Inferring Haplotypes from SNP Array Data
Case 1. Genotype: AC//AA//TG. Maternal genotype: AC//AA//TT. Paternal genotype: CC//AA//TG. The father can only transmit C at the first locus and the mother can only transmit T at the third, so the haplotype is AAT/CAG.
Case 2. Genotype: AC//AA//TG. Maternal genotype: AC//AA//TG. Paternal genotype: AC//AA//TG. A unique haplotype cannot be determined.
Problem: how can we determine haplotypes without parental genotypes?
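
The parental-genotype reasoning on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the lecture: the function name `phase_with_parents`, the list-of-allele-pairs encoding, and the three-locus example values are all assumptions made for the sketch. It assigns a parent of origin to each allele locus by locus and gives up when any locus is ambiguous.

```python
def phase_with_parents(child, mother, father):
    """Phase a child's genotype using parental genotypes.

    Each genotype is a list of unordered allele pairs, one per locus,
    e.g. ["AC", "AA", "TG"] (hypothetical encoding). Returns
    (maternal_haplotype, paternal_haplotype), or None when the parent of
    origin cannot be assigned uniquely at some locus.
    """
    maternal, paternal = [], []
    for c, m, f in zip(child, mother, father):
        a, b = c[0], c[1]
        # Keep every (from mother, from father) split of the child's alleles
        # that both parents could actually have transmitted.
        options = {(x, y) for x, y in ((a, b), (b, a)) if x in m and y in f}
        if len(options) != 1:
            return None  # ambiguous locus, or genotypes inconsistent with the trio
        x, y = options.pop()
        maternal.append(x)
        paternal.append(y)
    return "".join(maternal), "".join(paternal)


# Informative parents: the haplotypes are determined.
resolved = phase_with_parents(["AC", "AA", "TG"],
                              ["AC", "AA", "TT"],
                              ["CC", "AA", "TG"])

# Child and both parents heterozygous at the same loci: no unique answer.
ambiguous = phase_with_parents(["AC", "AA", "TG"],
                               ["AC", "AA", "TG"],
                               ["AC", "AA", "TG"])
```

The sketch is deliberately conservative: it returns `None` as soon as one locus is ambiguous, even in cases where the unordered haplotype pair would still be determined.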

10 Phasing: Inferring Haplotypes from SNP Data
Given multilocus genotypes at a set of SNPs for many individuals, phasing means:
Reconstruct haplotypes for all individuals.
Estimate frequencies of all possible haplotypes.
Haplotype reconstruction algorithm: Clark's parsimony algorithm (Clark, Mol. Biol. Evol. 1990).

11 Identifiability
Genotypes of 14 individuals.
Genotype representations: 0/0 -> 0, 1/1 -> 1, 0/1 ->

12 Identifiability
[Figure: haplotype counts, with the parsimonious solution marked.]

13 Haplotype Reconstruction Algorithm by Clark (1990)
Choose individuals that are homozygous at every locus (e.g. TT//AA//CC). Haplotype: TAC.
Choose individuals that are heterozygous at just one locus (e.g. TT//AA//CG). Haplotypes: TAC or TAG.
Tally the resulting known haplotypes.
For each known haplotype, look at all remaining unresolved cases: is there a combination to make this haplotype?
Known haplotype: TAC. Unresolved pattern: AT//AA//CG. Inferred haplotype: TAC/AAG. Add to list.
Known haplotypes: TAC and TAG. Unresolved pattern: AT//AA//CG. Inferred haplotypes: TAC and TAG. Add both to list.
Continue until all haplotypes have been recovered or no new haplotypes can be found this way.
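
The steps above can be sketched as follows. This is a simplified reading of Clark's procedure, not a reproduction of the published implementation: the function names (`het_sites`, `complement`, `clark_phase`), the allele-pair encoding, and the first-match rule for choosing among known haplotypes are assumptions made for illustration.

```python
def het_sites(genotype):
    """Indices of heterozygous loci in a genotype such as ["TT", "AA", "CG"]."""
    return [i for i, (a, b) in enumerate(genotype) if a != b]


def complement(genotype, hap):
    """Haplotype that pairs with `hap` to give `genotype`, or None if incompatible."""
    other = []
    for (a, b), h in zip(genotype, hap):
        if h == a:
            other.append(b)
        elif h == b:
            other.append(a)
        else:
            return None
    return "".join(other)


def clark_phase(genotypes):
    known = []      # haplotypes found so far, in discovery order
    resolved = {}   # individual index -> (haplotype 1, haplotype 2)

    # Step 1: individuals with at most one heterozygous locus are unambiguous.
    for i, g in enumerate(genotypes):
        if len(het_sites(g)) <= 1:
            h1 = "".join(pair[0] for pair in g)
            h2 = complement(g, h1)
            resolved[i] = (h1, h2)
            for h in (h1, h2):
                if h not in known:
                    known.append(h)

    # Step 2: explain unresolved genotypes as known haplotype + its complement,
    # repeating until a full pass adds nothing new.
    progress = True
    while progress:
        progress = False
        for i, g in enumerate(genotypes):
            if i in resolved:
                continue
            for h in known:
                other = complement(g, h)
                if other is not None:
                    resolved[i] = (h, other)
                    if other not in known:
                        known.append(other)
                    progress = True
                    break
    return resolved, known


resolved, known = clark_phase([
    ["TT", "AA", "CC"],   # homozygous everywhere
    ["TT", "AA", "CG"],   # heterozygous at one locus
    ["AT", "AA", "CG"],   # ambiguous on its own
])
```

Because the first compatible known haplotype wins, the result can depend on the order in which individuals and haplotypes are visited, which is one source of the errors discussed on the next slide.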

14 Problems: Clark (1990)
Many unresolved haplotypes at the end.
Ignores recombination.
Error in haplotype inference if a crossover of two actual haplotypes is identical to another true haplotype.
Frequency of such errors depends on recombination rate.
Clark (1990): algorithm "performs well" even with small sample sizes.

15 RECOMBINATION & LINKAGE DISEQUILIBRIUM

16 Morgan's Fruit Fly Experiment
Morgan's fruit fly data (1909): 2,839 flies.
Eye color: A = red, a = purple. Wing length: B = normal, b = vestigial.
Crosses: AABB x aabb, then AaBb x aabb.
Offspring:  AaBb   Aabb   aaBb   aabb
Exp:        710    710    710    710
Obs:        1,339  151    154    1,195

17 Linked Genes
When two genes lie on the same chromosome, they are transmitted to offspring in a non-independent manner.
[Figure: a chromosome pair carrying A and B on one chromosome and a and b on the other.]

18 Morgan's Explanation: Recombination
[Figure: chromosome diagrams for the parental, F1, and F2 generations; in the recombinant F2 offspring, recombination has taken place.]

19 Recombination
Parental types: AaBb, aabb.
Recombinants: Aabb, aaBb.
The proportion of recombinants between the two genes (or characters) is called the recombination fraction between these two genes.
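
As a worked example of the definition, the recombination fraction is the number of recombinant offspring divided by the total. The sketch below assumes the offspring counts commonly reported for Morgan's 1909 cross (they sum to the 2,839 flies cited earlier); treat the counts and the function name `recombination_fraction` as assumptions for illustration.

```python
def recombination_fraction(counts, recombinant_classes):
    """Proportion of offspring that fall in the recombinant classes."""
    total = sum(counts.values())
    recombinants = sum(counts[c] for c in recombinant_classes)
    return recombinants / total


# Assumed offspring counts for the AaBb x aabb test cross.
morgan_counts = {"AaBb": 1339, "Aabb": 151, "aaBb": 154, "aabb": 1195}

# 305 recombinants out of 2,839 flies, far below the 0.5 expected
# if the two genes were transmitted independently.
theta = recombination_fraction(morgan_counts, ["Aabb", "aaBb"])
print(f"recombination fraction = {theta:.3f}")
```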

20 Review: Correlation
GPA and TV in hours per week are negatively correlated.
How can we quantify the level of correlation?
[Figure: plot of GPA against hours of TV per week, with the means marked.]

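21 Covariance and Correlation
Degree of association between two variables x and y.
Given observations x_1, ..., x_n and y_1, ..., y_n:
Covariance: $\mathrm{cov}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
Correlation coefficient: $r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$
where $\sum_{i} (x_i - \bar{x})^2$ is (variance of the x_i's) x (n - 1) and $\sum_{i} (y_i - \bar{y})^2$ is (variance of the y_i's) x (n - 1).
r falls between -1 and +1, with the sign indicating the direction of association.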
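
A short Python sketch of these two quantities, using the sample (n - 1) convention. The TV and GPA values are made-up numbers for illustration only, and the function names are not from the lecture.

```python
from math import sqrt


def covariance(x, y):
    """Sample covariance with the (n - 1) denominator."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Correlation coefficient: covariance scaled by both standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


# Hypothetical data: more TV hours per week, lower GPA.
tv_hours = [2, 5, 10, 15, 20]
gpa = [3.9, 3.6, 3.2, 2.9, 2.5]

r = correlation(tv_hours, gpa)  # negative, since the association is decreasing
```

Note that `covariance(x, x)` is the sample variance of x, so the (n - 1) factors cancel in the ratio, matching the formula for r.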

22 Correlation between X_1 and X_2
[Figure: plot of X_1 against X_2.]

23 Basic Concepts
High LD -> no recombination (r^2 = 1): SNP1 "tags" SNP2.
Low LD -> recombination: many possibilities.
[Figure: chromosomes of Parent 1 and Parent 2 and the offspring chromosomes produced with and without recombination.]

24 Linkage Disequilibrium (LD)
LD reflects the relationship between alleles at different loci.
Often, r^2 (squared correlation coefficient) is used as a measure of LD.
[Figure: two loci on a chromosome, Locus A and Locus B.]

25 How to Compute r^2 on SNP Data
[Figure: a genotype table (individuals by SNP1, SNP2, SNP3) and the resulting r^2 matrix over SNP1, SNP2, SNP3, with example pairs at r^2 = 1.0 and r^2 = 0.0.]
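
One way to compute such an r^2 matrix is sketched below. It assumes genotypes are coded as allele counts (0, 1, or 2 copies of one allele) per individual and SNP, which is a common convention but an assumption here; r^2 computed from unphased genotype counts is also only an approximation to the haplotype-based measure. The toy genotype table is invented for illustration.

```python
def r_squared(x, y):
    """Squared correlation coefficient between two SNP columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation is undefined, reported as 0 here
    return sxy * sxy / (sxx * syy)


def r2_matrix(genotypes):
    """genotypes: one row per individual, one 0/1/2 code per SNP."""
    snps = list(zip(*genotypes))  # transpose to one tuple per SNP
    return [[r_squared(a, b) for b in snps] for a in snps]


# Toy data: SNP1 and SNP2 always agree; SNP3 is uncorrelated with both.
genotypes = [
    [0, 0, 1],
    [1, 1, 0],
    [2, 2, 1],
    [1, 1, 2],
]
ld = r2_matrix(genotypes)
```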

26 Linkage Disequilibrium in SNP Data
r^2 in SNP data from a population of individuals (black: r^2 = 1, white: r^2 = 0).
[Figure: r^2 matrices along the genome for Population 1 and Population 2.]

27 Reducing Genotyping Costs with Tag SNPs
Nearby SNPs in the genome are in linkage disequilibrium (LD), and thus contain redundant information.
If we knew which SNPs are in LD, we could pre-select the representative SNPs for each LD block of the chromosome, and genotype only those SNPs.
[Figure: r^2 values along the genome (black: r^2 = 1, white: r^2 = 0); two SNPs in high LD are marked as redundant.]

28 Reducing Genotyping Costs with Tag SNPs
Two-stage data collection process:
Stage 1: Collect genotype data for a dense set of SNPs for multiple individuals. Select a non-redundant set of tag SNPs by examining the LD pattern.
Stage 2: Collect genotype data only for the tag SNPs for a large number of individuals.

29 Algorithm for Selecting Tag SNPs
Greedy algorithm:
Randomly select a tag SNP.
Find the SNPs with a high LD with the previously selected tag SNP (r^2 > 0.8) and remove those SNPs from the set of candidate tag SNPs.
Iterate until the set of candidate tag SNPs is empty.
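
The greedy procedure can be sketched as below, assuming a precomputed symmetric r^2 matrix. The function name `select_tag_snps`, the fixed random seed, and the toy matrix are assumptions for illustration; only the 0.8 threshold comes from the slide.

```python
import random


def select_tag_snps(r2, threshold=0.8, seed=0):
    """Greedy tag SNP selection on a square r^2 matrix (list of lists)."""
    rng = random.Random(seed)           # seeded so the sketch is reproducible
    candidates = set(range(len(r2)))
    tags = []
    while candidates:
        tag = rng.choice(sorted(candidates))
        tags.append(tag)
        # Drop the tag itself and every candidate in high LD with it.
        candidates -= {j for j in candidates if j == tag or r2[tag][j] > threshold}
    return tags


# Toy LD structure: SNPs 0-2 form one block, SNPs 3-4 another.
r2 = [
    [1.0, 0.9, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.9, 0.0, 0.0],
    [0.9, 0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.95],
    [0.0, 0.0, 0.0, 0.95, 1.0],
]
tags = select_tag_snps(r2)  # one tag per block, whichever SNPs are drawn
```

Because the tag is drawn at random, different seeds can return different SNPs, and when LD within a block is not uniformly high the number of tags can vary too.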

30 Recombination and Haplotypes
Remember: Clark's method does not take recombination into account.
How can we find haplotypes from SNP data collected for a population of individuals under recombination?
Assume haplotypes of ancestor chromosomes and treat modern individuals' chromosomes as a mosaic of ancestor chromosomes.
However, ancestor chromosomes cannot be observed!
Key idea: the haplotype of each individual is a mosaic of other individuals' haplotypes; unresolved haplotypes are similar to known haplotypes.

31 Recombination and Haplotypes
h_1, h_2, h_3: unobserved ancestral haplotypes, for which we have no SNP data.
h_4A, h_4B: unobserved haplotypes for modern individuals. The haplotypes are unobserved; however, we have SNP data.
[Figure: ancestral haplotypes and two modern haplotypes drawn as a mosaic of ancestor chromosomes; circles mark mutations.]

32 PHASE Model as an HMM
Inferring the unobserved state labels for each of the observed SNPs amounts to haplotype reconstruction.
[Figure: two modern haplotype sequences, each annotated with a hidden-state label per SNP; the first path runs h3 then h2, the second runs h3, h1, h2, then h3.]

33 PHASE Model as an HMM
States: h_1, h_2, h_3, the unobserved ancestral haplotypes.
[Figure: state space with possible transitions among h_1, h_2, and h_3.]
Transition probabilities (from SNP X_l to X_{l+1}) depend on:
the distance between adjacent SNPs, d_l;
the recombination rate between adjacent SNPs, ρ_l.
Emission probabilities: mutation model.
Task: infer hidden state labels for each locus of each individual (h_4A, h_4B).
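
To make the HMM view concrete, here is a heavily simplified Viterbi decoder for the mosaic idea. It is not the PHASE algorithm: it decodes one already-phased target haplotype against known reference haplotypes, and the specific forms used (a switch probability of 1 - exp(-ρ_l d_l) spread uniformly over the K states, and a single mismatch probability `mu` as the mutation model) are assumptions borrowed from common copying-model formulations. All names and values are hypothetical.

```python
from math import exp, log


def viterbi_mosaic(target, refs, distances, rho, mu=0.01):
    """Most likely sequence of reference haplotypes copied along `target`.

    target: string of alleles; refs: list of equal-length strings (the states);
    distances[l], rho[l]: distance and recombination rate between SNP l and l+1;
    mu: assumed per-SNP mismatch (mutation) probability.
    """
    K, L = len(refs), len(target)

    def emit(k, l):
        return log(1 - mu) if refs[k][l] == target[l] else log(mu)

    score = [log(1.0 / K) + emit(k, 0) for k in range(K)]
    back = []
    for l in range(1, L):
        p_jump = 1 - exp(-rho[l - 1] * distances[l - 1])
        stay = log(1 - p_jump + p_jump / K)
        switch = log(p_jump / K) if p_jump > 0 else float("-inf")
        best_prev = max(range(K), key=lambda k: score[k])
        new_score, pointers = [], []
        for k in range(K):
            # Best way into state k: stay on k, or jump from the best state.
            s_stay = score[k] + stay
            s_jump = score[best_prev] + switch
            if s_stay >= s_jump:
                prev, s = k, s_stay
            else:
                prev, s = best_prev, s_jump
            new_score.append(s + emit(k, l))
            pointers.append(prev)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(K), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy example: the target copies reference 0, then switches to reference 1.
refs = ["00000000", "11111111"]
path = viterbi_mosaic("00001111", refs, distances=[1] * 7, rho=[0.1] * 7)
```

Larger values of ρ_l d_l make switching between ancestral haplotypes cheaper, so widely spaced SNPs or recombination hotspots are where the decoded mosaic tends to change state.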

34 INTERNATIONAL HAPMAP PROJECT (HAPMAP.ORG)

35 HapMap Phase 3 Samples
Columns: label, population sample, # samples, QC+ Draft 1
ASW*: African ancestry in Southwest US
CEU*: Utah residents with Northern and Western European ancestry from the CEPH collection
CHB: Han Chinese in Beijing, China
CHD: Chinese in Metropolitan Denver, Colorado
GIH: Gujarati Indians in Houston, Texas
JPT: Japanese in Tokyo, Japan
LWK: Luhya in Webuye, Kenya
MEX*: Mexican ancestry in Los Angeles, California
MKK*: Maasai in Kinyawa, Kenya
TSI: Toscans in Italy
YRI*: Yoruba in Ibadan, Nigeria
Totals: 1,301 samples; 1,115 QC+ Draft 1
* Population is made of family trios

36 Haplotype Structure and Recombination Rate Estimates: HapMap I vs. HapMap II

37 HapMap: Allele Frequencies in Different Populations
Comparison of allele frequencies for individuals from pairs of populations.
The red regions show that there are many SNPs that have similar low frequencies in each pair of analysis panels/populations.
CHB (Chinese) and JPT (Japanese) have similar allele frequencies.

38 Why Haplotypes?
Haplotypes have a greater power for discriminating genomic regions.
Consider J binary markers (e.g., SNPs) in a genomic region. There are 2^J possible haplotypes, but in fact, far fewer are seen in human populations.
SNPs have only two alleles, whereas haplotypes have a larger number of alleles.
Haplotypes are good genetic markers for population, evolution, and hereditary diseases.

39 Summary
Haplotype: a set of genetic markers that lie on the same chromosome.
How can we find haplotypes from SNPs?
Recombination, linkage disequilibrium, and how to take advantage of them.
Haplotypes as a set of linked SNPs with a greater discriminative power.
Tag SNPs for saving the genotyping cost.
HapMap Project.

Genotyping Technology How to Analyze Your Own Genome Fall 2013

Genotyping Technology How to Analyze Your Own Genome Fall 2013 Genotyping Technology 02-223 How to nalyze Your Own Genome Fall 2013 HapMap Project Phase 1 Phase 2 Phase 3 Samples & POP panels Genotyping centers Unique QC+ SNPs 269 samples (4 populations) HapMap International

More information

Haplotypes, linkage disequilibrium, and the HapMap

Haplotypes, linkage disequilibrium, and the HapMap Haplotypes, linkage disequilibrium, and the HapMap Jeffrey Barrett Boulder, 2009 LD & HapMap Boulder, 2009 1 / 29 Outline 1 Haplotypes 2 Linkage disequilibrium 3 HapMap 4 Tag SNPs LD & HapMap Boulder,

More information

Popula'on Gene'cs I: Gene'c Polymorphisms, Haplotype Inference, Recombina'on Computa.onal Genomics Seyoung Kim

Popula'on Gene'cs I: Gene'c Polymorphisms, Haplotype Inference, Recombina'on Computa.onal Genomics Seyoung Kim Popula'on Gene'cs I: Gene'c Polymorphisms, Haplotype Inference, Recombina'on 02-710 Computa.onal Genomics Seyoung Kim Overview Two fundamental forces that shape genome sequences Recombina.on Muta.on, gene.c

More information

S G. Design and Analysis of Genetic Association Studies. ection. tatistical. enetics

S G. Design and Analysis of Genetic Association Studies. ection. tatistical. enetics S G ection ON tatistical enetics Design and Analysis of Genetic Association Studies Hemant K Tiwari, Ph.D. Professor & Head Section on Statistical Genetics Department of Biostatistics School of Public

More information

Resources at HapMap.Org

Resources at HapMap.Org Resources at HapMap.Org HapMap Phase II Dataset Release #21a, January 2007 (NCBI build 35) 3.8 M genotyped SNPs => 1 SNP/700 bp # polymorphic SNPs/kb in consensus dataset International HapMap Consortium

More information

Genotype Prediction with SVMs

Genotype Prediction with SVMs Genotype Prediction with SVMs Nicholas Johnson December 12, 2008 1 Summary A tuned SVM appears competitive with the FastPhase HMM (Stephens and Scheet, 2006), which is the current state of the art in genotype

More information

SUPPLEMENTAL MATERIAL

SUPPLEMENTAL MATERIAL SUPPLEMENTAL MATERIAL Supplementary Table 1: RT-qPCR primer sequences. Sequences are shown from 5 to 3 direction; all primers are designed using mouse genome as reference. 36B4-F; TGAAGCAAAGGAAGAGTCGGAGGA

More information

The Whole Genome TagSNP Selection and Transferability Among HapMap Populations. Reedik Magi, Lauris Kaplinski, and Maido Remm

The Whole Genome TagSNP Selection and Transferability Among HapMap Populations. Reedik Magi, Lauris Kaplinski, and Maido Remm The Whole Genome TagSNP Selection and Transferability Among HapMap Populations Reedik Magi, Lauris Kaplinski, and Maido Remm Pacific Symposium on Biocomputing 11:535-543(2006) THE WHOLE GENOME TAGSNP SELECTION

More information

I/O Suite, VCF (1000 Genome) and HapMap

I/O Suite, VCF (1000 Genome) and HapMap I/O Suite, VCF (1000 Genome) and HapMap Hin-Tak Leung April 13, 2013 Contents 1 Introduction 1 1.1 Ethnic Composition of 1000G vs HapMap........................ 2 2 1000 Genome vs HapMap YRI (Africans)

More information

GENOME-WIDE data sets from worldwide panels of

GENOME-WIDE data sets from worldwide panels of Copyright Ó 2010 by the Genetics Society of America DOI: 10.1534/genetics.110.116681 Population Structure With Localized Haplotype Clusters Sharon R. Browning*,1 and Bruce S. Weir *Department of Statistics,

More information

Supplementary Figure 1 a

Supplementary Figure 1 a Supplementary Figure 1 a b GWAS second stage log 10 observed P 0 2 4 6 8 10 12 0 1 2 3 4 log 10 expected P rs3077 (P hetero =0.84) GWAS second stage (BBJ, Japan) First replication (BBJ, Japan) Second replication

More information

Analysis of genome-wide genotype data

Analysis of genome-wide genotype data Analysis of genome-wide genotype data Acknowledgement: Several slides based on a lecture course given by Jonathan Marchini & Chris Spencer, Cape Town 2007 Introduction & definitions - Allele: A version

More information

EPIB 668 Genetic association studies. Aurélie LABBE - Winter 2011

EPIB 668 Genetic association studies. Aurélie LABBE - Winter 2011 EPIB 668 Genetic association studies Aurélie LABBE - Winter 2011 1 / 71 OUTLINE Linkage vs association Linkage disequilibrium Case control studies Family-based association 2 / 71 RECAP ON GENETIC VARIANTS

More information

Crash-course in genomics

Crash-course in genomics Crash-course in genomics Molecular biology : How does the genome code for function? Genetics: How is the genome passed on from parent to child? Genetic variation: How does the genome change when it is

More information

Population description. 103 CHB Han Chinese in Beijing, China East Asian EAS. 104 JPT Japanese in Tokyo, Japan East Asian EAS

Population description. 103 CHB Han Chinese in Beijing, China East Asian EAS. 104 JPT Japanese in Tokyo, Japan East Asian EAS 1 Supplementary Table 1 Description of the 1000 Genomes Project Phase 3 representing 2504 individuals from 26 different global populations that are assigned to five super-populations Number of individuals

More information

Algorithms for Genetics: Introduction, and sources of variation

Algorithms for Genetics: Introduction, and sources of variation Algorithms for Genetics: Introduction, and sources of variation Scribe: David Dean Instructor: Vineet Bafna 1 Terms Genotype: the genetic makeup of an individual. For example, we may refer to an individual

More information

Office Hours. We will try to find a time

Office Hours.   We will try to find a time Office Hours We will try to find a time If you haven t done so yet, please mark times when you are available at: https://tinyurl.com/666-office-hours Thanks! Hardy Weinberg Equilibrium Biostatistics 666

More information

Human Populations: History and Structure

Human Populations: History and Structure Human Populations: History and Structure In the paper Novembre J, Johnson, Bryc K, Kutalik Z, Boyko AR, Auton A, Indap A, King KS, Bergmann A, Nelson MB, Stephens M, Bustamante CD. 2008. Genes mirror geography

More information

Human Population Differentiation Is Strongly Correlated with Local Recombination Rate

Human Population Differentiation Is Strongly Correlated with Local Recombination Rate Human Population Differentiation Is Strongly Correlated with Local Recombination Rate Alon Keinan 1,2,3 *, David Reich 1,2 1 Department of Genetics, Harvard Medical School, Boston, Massachusetts, United

More information

Human Population Differentiation is Strongly Correlated With Local Recombination Rate

Human Population Differentiation is Strongly Correlated With Local Recombination Rate Human Population Differentiation is Strongly Correlated With Local Recombination Rate The Harvard community has made this article openly available. Please share how this access benefits you. Your story

More information

News. The International HapMap Project

News. The International HapMap Project HapMap News A Publication of the Coriell Institute for Medical Research, V olume 1, 2004 The International HapMap Project Excitement is building as scientists begin to construct a resource called the haplotype

More information

Genetic Variation and Genome- Wide Association Studies. Keyan Salari, MD/PhD Candidate Department of Genetics

Genetic Variation and Genome- Wide Association Studies. Keyan Salari, MD/PhD Candidate Department of Genetics Genetic Variation and Genome- Wide Association Studies Keyan Salari, MD/PhD Candidate Department of Genetics How many of you did the readings before class? A. Yes, of course! B. Started, but didn t get

More information

Population Genetics II. Bio

Population Genetics II. Bio Population Genetics II. Bio5488-2016 Don Conrad dconrad@genetics.wustl.edu Agenda Population Genetic Inference Mutation Selection Recombination The Coalescent Process ACTT T G C G ACGT ACGT ACTT ACTT AGTT

More information

De novo human genome assemblies reveal spectrum of alternative haplotypes in diverse

De novo human genome assemblies reveal spectrum of alternative haplotypes in diverse SUPPLEMENTARY INFORMATION De novo human genome assemblies reveal spectrum of alternative haplotypes in diverse populations Wong et al. The Supplementary Information contains 4 Supplementary Figures, 3

More information

What is genetic variation?

What is genetic variation? enetic Variation Applied Computational enomics, Lecture 05 https://github.com/quinlan-lab/applied-computational-genomics Aaron Quinlan Departments of Human enetics and Biomedical Informatics USTAR Center

More information

SNP Selection. Outline of Tutorial. Why Do We Need tagsnps? Concepts of tagsnps. LD and haplotype definitions. Haplotype blocks and definitions

SNP Selection. Outline of Tutorial. Why Do We Need tagsnps? Concepts of tagsnps. LD and haplotype definitions. Haplotype blocks and definitions SNP Selection Outline of Tutorial Concepts of tagsnps University of Louisville Center for Genetics and Molecular Medicine January 10, 2008 Dana Crawford, PhD Vanderbilt University Center for Human Genetics

More information

The HapMap Project and Haploview

The HapMap Project and Haploview The HapMap Project and Haploview David Evans Ben Neale University of Oxford Wellcome Trust Centre for Human Genetics Human Haplotype Map General Idea: Characterize the distribution of Linkage Disequilibrium

More information

Understanding genetic association studies. Peter Kamerman

Understanding genetic association studies. Peter Kamerman Understanding genetic association studies Peter Kamerman Outline CONCEPTS UNDERLYING GENETIC ASSOCIATION STUDIES Genetic concepts: - Underlying principals - Genetic variants - Linkage disequilibrium -

More information

CUMACH - A Fast GPU-based Genotype Imputation Tool. Agatha Hu

CUMACH - A Fast GPU-based Genotype Imputation Tool. Agatha Hu CUMACH - A Fast GPU-based Genotype Imputation Tool Agatha Hu ahu@nvidia.com Term explanation Figure resource: http://en.wikipedia.org/wiki/genotype Allele: one of two or more forms of a gene or a genetic

More information

Alkes Price Harvard School of Public Health January 24 & January 26, 2017

Alkes Price Harvard School of Public Health January 24 & January 26, 2017 EPI 511, Advanced Population and Medical Genetics Week 1: Intro + HapMap / 1000 Genomes Linkage Disequilibrium Alkes Price Harvard School of Public Health January 24 & January 26, 2017 EPI 511: Course

More information

Genome-wide association studies (GWAS) Part 1

Genome-wide association studies (GWAS) Part 1 Genome-wide association studies (GWAS) Part 1 Matti Pirinen FIMM, University of Helsinki 03.12.2013, Kumpula Campus FIMM - Institiute for Molecular Medicine Finland www.fimm.fi Published Genome-Wide Associations

More information

IL1B-CGTC haplotype is associated with colorectal cancer in. admixed individuals with increased African ancestry

IL1B-CGTC haplotype is associated with colorectal cancer in. admixed individuals with increased African ancestry IL1B-CGTC haplotype is associated with colorectal cancer in admixed individuals with increased African ancestry María Carolina Sanabria-Salas 1, 2,*, Gustavo Hernández-Suárez 1, Adriana Umaña- Pérez 2,

More information

Haplotype phasing in large cohorts: Modeling, search, or both?

Haplotype phasing in large cohorts: Modeling, search, or both? Haplotype phasing in large cohorts: Modeling, search, or both? Po-Ru Loh Harvard T.H. Chan School of Public Health Department of Epidemiology Broad MIA Seminar, 3/9/16 Overview Background: Haplotype phasing

More information

Genome variation - part 1

Genome variation - part 1 Genome variation - part 1 Dr Jason Wong Prince of Wales Clinical School Introductory bioinformatics for human genomics workshop, UNSW Day 2 Friday 21 th January 2016 Aims of the session Introduce major

More information

Bioinformatic Analysis of SNP Data for Genetic Association Studies EPI573

Bioinformatic Analysis of SNP Data for Genetic Association Studies EPI573 Bioinformatic Analysis of SNP Data for Genetic Association Studies EPI573 Mark J. Rieder Department of Genome Sciences mrieder@u.washington washington.edu Epidemiology Studies Cohort Outcome Model to fit/explain

More information

Evaluation of a multipoint method for imputing genotypes using HapMap III

Evaluation of a multipoint method for imputing genotypes using HapMap III Mathematical Statistics Stockholm University Evaluation of a multipoint method for imputing genotypes using HapMap III Emil Rehnberg Examensarbete 2009:5 Postal address: Mathematical Statistics Dept. of

More information

HISTORICAL LINGUISTICS AND MOLECULAR ANTHROPOLOGY

HISTORICAL LINGUISTICS AND MOLECULAR ANTHROPOLOGY Third Pavia International Summer School for Indo-European Linguistics, 7-12 September 2015 HISTORICAL LINGUISTICS AND MOLECULAR ANTHROPOLOGY Brigitte Pakendorf, Dynamique du Langage, CNRS & Université

More information

Computational Haplotype Analysis: An overview of computational methods in genetic variation study

Computational Haplotype Analysis: An overview of computational methods in genetic variation study Computational Haplotype Analysis: An overview of computational methods in genetic variation study Phil Hyoun Lee Advisor: Dr. Hagit Shatkay A depth paper submitted to the School of Computing conforming

More information

CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016

CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016 CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016 Topics Genetic variation Population structure Linkage disequilibrium Natural disease variants Genome Wide Association Studies Gene

More information

Genome-wide association study identifies a susceptibility locus for HCVinduced hepatocellular carcinoma. Supplementary Information

Genome-wide association study identifies a susceptibility locus for HCVinduced hepatocellular carcinoma. Supplementary Information Genome-wide association study identifies a susceptibility locus for HCVinduced hepatocellular carcinoma Vinod Kumar 1,2, Naoya Kato 3, Yuji Urabe 1, Atsushi Takahashi 2, Ryosuke Muroyama 3, Naoya Hosono

More information

Computational Workflows for Genome-Wide Association Study: I

Computational Workflows for Genome-Wide Association Study: I Computational Workflows for Genome-Wide Association Study: I Department of Computer Science Brown University, Providence sorin@cs.brown.edu October 16, 2014 Outline 1 Outline 2 3 Monogenic Mendelian Diseases

More information

Overview. Methods for gene mapping and haplotype analysis. Haplotypes. Outline. acatactacataacatacaatagat. aaatactacctaacctacaagagat

Overview. Methods for gene mapping and haplotype analysis. Haplotypes. Outline. acatactacataacatacaatagat. aaatactacctaacctacaagagat Overview Methods for gene mapping and haplotype analysis Prof. Hannu Toivonen hannu.toivonen@cs.helsinki.fi Discovery and utilization of patterns in the human genome Shared patterns family relationships,

More information

Linkage Analysis Computa.onal Genomics Seyoung Kim

Linkage Analysis Computa.onal Genomics Seyoung Kim Linkage Analysis 02-710 Computa.onal Genomics Seyoung Kim Genome Polymorphisms Gene.c Varia.on Phenotypic Varia.on A Human Genealogy TCGAGGTATTAAC The ancestral chromosome SNPs and Human Genealogy A->G

More information

Genotype quality control with plinkqc Hannah Meyer

Genotype quality control with plinkqc Hannah Meyer Genotype quality control with plinkqc Hannah Meyer 219-3-1 Contents Introduction 1 Per-individual quality control....................................... 2 Per-marker quality control.........................................

More information

Phasing of 2-SNP Genotypes based on Non-Random Mating Model

Phasing of 2-SNP Genotypes based on Non-Random Mating Model Phasing of 2-SNP Genotypes based on Non-Random Mating Model Dumitru Brinza and Alexander Zelikovsky Department of Computer Science, Georgia State University, Atlanta, GA 30303 {dima,alexz}@cs.gsu.edu Abstract.

More information

Petar Pajic 1 *, Yen Lung Lin 1 *, Duo Xu 1, Omer Gokcumen 1 Department of Biological Sciences, University at Buffalo, Buffalo, NY.

Petar Pajic 1 *, Yen Lung Lin 1 *, Duo Xu 1, Omer Gokcumen 1 Department of Biological Sciences, University at Buffalo, Buffalo, NY. The psoriasis associated deletion of late cornified envelope genes LCE3B and LCE3C has been maintained under balancing selection since Human Denisovan divergence Petar Pajic 1 *, Yen Lung Lin 1 *, Duo

More information

Polymorphisms in Population

Polymorphisms in Population Computational Biology Lecture #5: Haplotypes Bud Mishra Professor of Computer Science, Mathematics, & Cell Biology Oct 17 2005 L4-1 Polymorphisms in Population Why do we care about variations? Underlie

More information

Human SNP haplotypes. Statistics 246, Spring 2002 Week 15, Lecture 1

Human SNP haplotypes. Statistics 246, Spring 2002 Week 15, Lecture 1 Human SNP haplotypes Statistics 246, Spring 2002 Week 15, Lecture 1 Human single nucleotide polymorphisms The majority of human sequence variation is due to substitutions that have occurred once in the

More information

Statistical Tools for Predicting Ancestry from Genetic Data

Statistical Tools for Predicting Ancestry from Genetic Data Statistical Tools for Predicting Ancestry from Genetic Data Timothy Thornton Department of Biostatistics University of Washington March 1, 2015 1 / 33 Basic Genetic Terminology A gene is the most fundamental

More information

Browsing Genes and Genomes with Ensembl

Browsing Genes and Genomes with Ensembl Browsing Genes and Genomes with Ensembl Victoria Newman Ensembl Outreach Officer EMBL-EBI Objectives What is Ensembl? What type of data can you get in Ensembl? How to navigate the Ensembl browser website.

More information

Midterm 1 Results. Midterm 1 Akey/ Fields Median Number of Students. Exam Score

Midterm 1 Results. Midterm 1 Akey/ Fields Median Number of Students. Exam Score Midterm 1 Results 10 Midterm 1 Akey/ Fields Median - 69 8 Number of Students 6 4 2 0 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96 101 Exam Score Quick review of where we left off Parental type: the

More information

Nature Genetics: doi: /ng.3143

Nature Genetics: doi: /ng.3143 Supplementary Figure 1 Quantile-quantile plot of the association P values obtained in the discovery sample collection. The two clear outlying SNPs indicated for follow-up assessment are rs6841458 and rs7765379.

More information

B) You can conclude that A 1 is identical by descent. Notice that A2 had to come from the father (and therefore, A1 is maternal in both cases).

B) You can conclude that A 1 is identical by descent. Notice that A2 had to come from the father (and therefore, A1 is maternal in both cases). Homework questions. Please provide your answers on a separate sheet. Examine the following pedigree. A 1,2 B 1,2 A 1,3 B 1,3 A 1,2 B 1,2 A 1,2 B 1,3 1. (1 point) The A 1 alleles in the two brothers are

More information

Human Genetic Variation. Ricardo Lebrón Dpto. Genética UGR

Human Genetic Variation. Ricardo Lebrón Dpto. Genética UGR Human Genetic Variation Ricardo Lebrón rlebron@ugr.es Dpto. Genética UGR What is Genetic Variation? Origins of Genetic Variation Genetic Variation is the difference in DNA sequences between individuals.

More information

CMSC423: Bioinformatic Algorithms, Databases and Tools. Some Genetics

CMSC423: Bioinformatic Algorithms, Databases and Tools. Some Genetics CMSC423: Bioinformatic Algorithms, Databases and Tools Some Genetics CMSC423 Fall 2009 2 Chapter 13 Reading assignment CMSC423 Fall 2009 3 Gene association studies Goal: identify genes/markers associated

More information

The Human Genome Project has always been something of a misnomer, implying the existence of a single human genome

The Human Genome Project has always been something of a misnomer, implying the existence of a single human genome The Human Genome Project has always been something of a misnomer, implying the existence of a single human genome Of course, every person on the planet with the exception of identical twins has a unique

More information
