Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications

Similar documents
Enhanced Resolution and Statistical Power Through SNP Distributions Within the Short Tandem Repeats

FORENSIC GENETICS. DNA in the cell FORENSIC GENETICS PERSONAL IDENTIFICATION KINSHIP ANALYSIS FORENSIC GENETICS. Sources of biological evidence

Understanding genetic association studies. Peter Kamerman

AFDAA 2012 Winter Meeting Population Statistics Refresher Course Lecture-4: Statistics for Lineage Markers Ranajit Chakraborty, PhD Professor,

Genome-Wide Association Studies (GWAS): Computational Them

Genetic Variation and Genome- Wide Association Studies. Keyan Salari, MD/PhD Candidate Department of Genetics

S G. Design and Analysis of Genetic Association Studies. ection. tatistical. enetics

SNP Selection. Outline of Tutorial. Why Do We Need tagsnps? Concepts of tagsnps. LD and haplotype definitions. Haplotype blocks and definitions

Developing criteria and data to determine best options for expanding the core CODIS loci

The Whole Genome TagSNP Selection and Transferability Among HapMap Populations. Reedik Magi, Lauris Kaplinski, and Maido Remm

An Introduction to Forensic Genetics

Mathematics of Forensic DNA Identification

CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016

EPIB 668 Genetic association studies. Aurélie LABBE - Winter 2011

Familial Breast Cancer

Biology 445K Winter 2007 DNA Fingerprinting

THE HEALTH AND RETIREMENT STUDY: GENETIC DATA UPDATE

The HapMap Project and Haploview

Lecture 23: Causes and Consequences of Linkage Disequilibrium. November 16, 2012

From Forensic Genetics to Genomics Perspectives for an Integrated Approach to the Use of Genetic Evidence in Criminal Investigations

Our motivation for a NGS/MPS SNP panel

B) You can conclude that A 1 is identical by descent. Notice that A2 had to come from the father (and therefore, A1 is maternal in both cases).

Population Genetics II. Bio

What is genetic variation?

Population stratification. Background & PLINK practical

Questions we are addressing. Hardy-Weinberg Theorem

PLINK gplink Haploview

Analysis of genome-wide genotype data

Papers for 11 September

Haplotypes, linkage disequilibrium, and the HapMap

Mutations during meiosis and germ line division lead to genetic variation between individuals

(c) Suppose we add to our analysis another locus with j alleles. How many haplotypes are possible between the two sites?

Why do we need statistics to study genetics and evolution?

Investigating SNPs Flanking the D1S80 Locus in a Tamil Population from India

Algorithms for Genetics: Introduction, and sources of variation

Genome-wide association studies (GWAS) Part 1

A genetic database for DNA-based forensic analysis in Brunei Darussalam

Genetics and Psychiatric Disorders Lecture 1: Introduction

Analyzing Y-STR mixtures and calculating inclusion statistics

An Introduction to Population Genetics

DNA Mixture Interpretation Workshop Karin Crenshaw. Reporting DNA Mixture Results and Statistics

Book chapter appears in:

CMSC423: Bioinformatic Algorithms, Databases and Tools. Some Genetics

Office Hours. We will try to find a time

HISTORICAL LINGUISTICS AND MOLECULAR ANTHROPOLOGY

Wu et al., Determination of genetic identity in therapeutic chimeric states. We used two approaches for identifying potentially suitable deletion loci

FIU/NIJ Forensic Population Genetics The Marco Polo, August 2010 ADAM DNA EVE

Structure, Measurement & Analysis of Genetic Variation

Empirical Analysis of the STR Profiles Resulting from Conceptual Mixtures

Typing of 67 SNP Loci on X Chromosome by PCR and MALDI-TOF MS

DNA Mixture Interpretation Workshop Dr Chris Maguire. Identification and resolution of DNA mixtures

BST227 Introduction to Statistical Genetics. Lecture 3: Introduction to population genetics

Adapting a likelihood ratio model to enable searching DNA databases with complex STR DNA profiles

Nine polymorphic STR loci in the HLA region in the Shaanxi Han population of China

Population and Statistical Genetics including Hardy-Weinberg Equilibrium (HWE) and Genetic Drift

BST227 Introduction to Statistical Genetics. Lecture 3: Introduction to population genetics

Genetic data concepts and tests

DNA Collection. Data Quality Control. Whole Genome Amplification. Whole Genome Amplification. Measure DNA concentrations. Pros

Summary for BIOSTAT/STAT551 Statistical Genetics II: Quantitative Traits

Evaluation of forensic evidence in DNA mixture using RMNE. Creative Commons: Attribution 3.0 Hong Kong License

Park /12. Yudin /19. Li /26. Song /9

Human SNP haplotypes. Statistics 246, Spring 2002 Week 15, Lecture 1

Genome Scanning by Composite Likelihood Prof. Andrew Collins

University of York Department of Biology B. Sc Stage 2 Degree Examinations

Introduction to Mitochondrial DNA Analysis

High-density SNP Genotyping Analysis of Broiler Breeding Lines

Human Genetics 544: Basic Concepts in Population and Statistical Genetics Fall 2016 Syllabus

Allele frequencies in Azuay Population in Ecuador

Association studies (Linkage disequilibrium)

The Real CSI: Using DNA to Identify Criminals and Missing Persons

Constrained Hidden Markov Models for Population-based Haplotyping

Course Announcements

Association Mapping. Mendelian versus Complex Phenotypes. How to Perform an Association Study. Why Association Studies (Can) Work

GeneMarker HID Expert System Human Identification Software with Linked Post-Identification Database Search, Kinship, and DNA Mixture Applications

A Primer of Ecological Genetics

Why can GBS be complicated? Tools for filtering & error correction. Edward Buckler USDA-ARS Cornell University

Crash-course in genomics

JS 190- Population Genetics- Assessing the Strength of the Evidence Pre class activities

Redefine what s possible with the Axiom Genotyping Solution

Book chapter appears in:

"Genetics in geographically structured populations: defining, estimating and interpreting FST."

Human Genetic Variation. Ricardo Lebrón Dpto. Genética UGR

Commonwealth v. Lyons. Cybergenetics How TrueAllele Works (Part 2) Degraded DNA and Allele Dropout. Cybergenetics Webinar November, 2014

PERSPECTIVES. A gene-centric approach to genome-wide association studies

Microsatellite markers

Midterm 1 Results. Midterm 1 Akey/ Fields Median Number of Students. Exam Score

Supplementary Methods Illumina Genome-Wide Genotyping Single SNP and Microsatellite Genotyping. Supplementary Table 4a Supplementary Table 4b

The evolutionary significance of structure. Detecting and describing structure. Implications for genetic variability

Variation Chapter 9 10/6/2014. Some terms. Variation in phenotype can be due to genes AND environment: Is variation genetic, environmental, or both?

Quantitative Genetics for Using Genetic Diversity

Genome-Wide Association Studies. Ryan Collins, Gerissa Fowler, Sean Gamberg, Josselyn Hudasek & Victoria Mackey

Introduc)on to Sta)s)cal Gene)cs: emphasis on Gene)c Associa)on Studies

Genome wide association studies. How do we know there is genetics involved in the disease susceptibility?

Shrunken Dissimilarity Measure for Genome-wide SNP Data Classification

Summary. Introduction

Genomes contain all of the information needed for an organism to grow and survive.

Genetic Identity. Steve Harris SPASH - Biotechnology

Introduction to Add Health GWAS Data Part I. Christy Avery Department of Epidemiology University of North Carolina at Chapel Hill

Genotyping Technology How to Analyze Your Own Genome Fall 2013

Transcription:

Ranajit Chakraborty, Ph.D. Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications

Overview Some brief remarks about SNPs Haploblock structure of SNPs in the human genome Criteria for selection of optimal SNP haploblocks for forensic applications Preliminary results of optimal parameter combinations from HapMap Data (Phase I and Phase II) Feasibility of SNP haploblock selection from human genome Strategies of interpretation of SNP haploblock based forensic evidence Preliminary conclusions and future directions Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 2

Single Nucleotide Polymorphism (SNP) C T G T C A C T C G G G T C T G T C T C T C G G G T Most SNPs are biallelic About three million SNPs in human genome (characterized) Provide more results from low quantity template DNA or degraded samples than STR typing Complete automation feasible Low mutation rates (10 8 /site/generation) Use of SNPs in forensics is not new (e.g., HLA DQα) Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 3

How Many SNPs Would be Needed for Forensic Applications? Answer depends upon allele frequencies at SNP sites, efficiency in different types of applications For example, power of discrimination in identity testing; PE or PI in parentage analyses; LR in kinship assessment, etc. Chakraborty, et al. (1999, Electrophoresis 20: 1682 96) showed nomograms suggesting that the number of SNPs needed to equal the power of the current battery of STR loci would necessitate the use of several sets of syntenic SNPs For example, SNPs residing on the same arm of several chromosomes Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 4

Strategies for Improving Power of SNPs for Forensic Applications Translate sets of SNPs into multiallelic markers Select a panel of SNP sets that satisfy conditions of the product rule For example, statistically independent sets of SNPs Search for genome wide availability of desired SNPs for feasibility of detection of such panels of SNPs Test the robustness of typing selected SNPs in forensic samples of compromised DNA quality Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 5

Haplotype Block (Haploblock) Haplotype structure across 500 kb on 5q31 (Daly, M.J., et al. 2001, Nat. Genet. 29: 229 232) Linkage disequilibrium (LD): allelic association between two loci (for example, SNP sites) Closely linked SNPs with high LD haplotype blocks Human genome is composed of block like structures of low haplotype diversity (strong LD within block) separated by recombination hot spots Complete LD among n linked SNPs (n + 1) haplotypes Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 6

Advantages of Haploblock as Forensic Marker Can be typed in highly degraded samples Where no results from STR analysis may be obtained Improves the limited discrimination power of individual SNPs Haploblock can be considered as pseudo STRs One haploblock one STR locus Different haplotypes different alleles Each haplotype treated as a lineage marker like Y chromosome and mtdna Exception possible transmission from both parents following standard Mendelian principles Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 7

HapMap Project (www.hapmap.org) Three major populations (90 Caucasian, 90 African, 45 Chinese and 45 Japanese) Phase II data: > 3,000,000, SNPs LD information: D', r2 Phase information Genotype information Image courtesy of http://hapmap.ncbi.nlm.nih.gov/ Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 8

Haploblock Selection Criteria Exist in all major populations (Caucasian, East Asian, African) Higher discrimination power (for example, lower match probability) than that of the individual SNPs within the block Hardy Weinberg Equilibrium for each block No significant LD between blocks Sufficient number of candidate haploblocks in the whole genome Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 9

Parameters Used in Selection Maximum match probability reduction per haploblock (mmpr) Minimum LD between SNPs: r 2 Population substructure: maximum F st Minimum heterozygosity (MinHet) Minimum number of haplotypes in each population (MinHap) Minimum number of SNPs per haploblock (MinSNP) Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 10

Best Parameter Set mmpr = 0.85 r 2 = 0.7 No haploblock found with r 2 0.8 F st = 0.06 MinHet = 0.2 MinHap = 3 MinSNP = 3 The best thresholds of parameters other than r 2 found on Chr1 Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 11

Haploblocks with Best Parameter Set Chromo -some Num. blocks with PS Num. blocks with PS & HWE Num. blocks with PS & HWE & LD filters (n) Avg. Cum. MP of blocks (b) Cum. Min. MP of SNPs (s) MP reduction per block (mpr) Num. Of SNPs 1 9 9 0 2 23 14 1 0.3287 0.4050 0.8117 6 3 12 10 2 0.1144 0.1617 0.8412 9 4 21 15 1 0.2926 0.3765 0.7773 6 5 16 12 3 0.02633 0.05480 0.7833 25 6 15 10 0 7 16 9 2 0.1035 0.1465 0.8403 30 8 18 12 2 0.1025 0.1518 0.8215 7 9 8 6 0 10 15 8 1 0.3527 0.4169 0.8460 4 11 14 12 3 0.03872 0.06700 0.8209 13 12 12 5 1 0.3036 0.3890 0.7806 5 13 17 14 3 0.0344 0.06409 0.8123 14 14 10 6 3 0.02339 0.04789 0.7876 11 15 9 4 0 16 7 4 1 0.3310 0.4053 0.8167 3 17 5 4 0 18 8 7 1 0.3123 0.3689 0.8465 5 19 5 4 0 20 6 1 0 21 6 3 0 22 1 1 0 Total 253 170 24 1.059E-12 1.566E-10 0.8121 138 Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 12

Haploblock Example Chr2 Haplotype Frequencies Haplotype CEU JPT+CHB YRI 010011 0 0 0.0417 011000 0.0083 0 0.0417 001000 0.3417 0.5167 0.4833 000111 0 0.0056 0 110010 0 0.0056 0 111000 0 0 0.0167 110111 0.5833 0.4611 0.4167 101000 0.0083 0 0 110110 0.05 0.0056 0 110100 0.0083 0.0056 0 Num. SNPs = 6 Num. haplotypes = 10 Avg. Het. = 0.5499 MP of block = 0.3287 Min. MP of SNPs = 0.4050 MP reduction = 0.8117 F st = 0.024 Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 13

Different Haploblock Structure Among Populations r 2 = 0.7 and MinSNP = 3 11,741 haploblocks in Caucasian 12,456 haploblocks in Chinese 12,237 haploblocks in Japanese 7,318 haploblocks in African Population specific haploblock selection criteria may be necessary to obtain best performing systems Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 14

Evidence Interpretation Based on Haploblocks Transfer evidence Mixture interpretation Kinship analysis One genotype (A/T)(A/T) Two possible haplotype combinations TT+AA TA+AT Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 15

Transfer Evidence Compared a single source profile from crime scene evidence with profile of the suspect Exclusion or inclusion compare the genotypes If inclusion, random match probability is: Pr( G) = Haplotype combination ( Hi, Hj) composes G pp Mixture versus single source sample i j Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 16

Transfer Evidence Example Haplotype Frequency Genotype... (A/T)(A/T)... TT 0.4 TA 0.3 AA 0.2 AT 0.1 TT/AA: 2 0.4 0.2 = 0.16 Match Probability = TA/AT: 2 0.3 0.1 = 0.06 = 0.22 Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 17

Mixture Detection Multiple contributors at least four haplotypes The probability of a genotype (G): Pr( G) = Haplotype combination ( Hk,..., Hl) composes G The probability of a genotype (G) given number of contributors (N) Pr( G N = 1) = pipj Pr( G N = 2) = Pr( G N = 3) = Haplotype combination ( Hi, Hj) composes G Haplotype combination ( Hi, Hj, Hk, Hl) composes G p p p p Haplotype combination ( Hi, Hj, Hk, Hl, Hm, Hn) composes G i j k l l i= k p p p p p p p i j k l m n i Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 18

Exclusion Probability and Likelihood Ratio for Mixture Analysis Probability of exclusion (PE) PE = 1 H i p i 2 Likelihood ratio (LR): S is suspect; V is victim; UN is an unknown contributor LR, = where Σ is over all H i s that are contributors to G Pr( V + S) Pr( V + UN) Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 19

Pairwise Kinship Analysis One genotype (G) has k haplotype combinations; X i = (H i1, H i2 ) is i th combination, with likelihood K P(X i ); w i as the weight of X i w P( X )/ P( X ) person 1 X i1 X (k1)1 R person 2 X i2 X (k2)2 = i i i i= 1 Likelihood of these two persons given relationship (R): k k 1 2 = L w w L( X, X R) Block i1 j2 i1 j2 i= 1 j= 1 Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 20

Conclusions This is the first effort to assess the feasibility of genomewide SNP haploblock structures for human identity testing encompassing all major forensic applications SNP haploblocks provide an alternative approach for forensic investigations, especially for highly degraded samples Haploblock selection depends on multiple criteria Consideration is needed for evidence interpretation based on haploblock results, because of multiple haplotype combinations that are possible for observed genotypes Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 21

Future Directions Portability/universality of efficient haploblocks to be tested with wider sets of genome data Alternatively, population group specific panels of haploblocks have to be determined with validation data from anthropologically defined populations Robustness of genotyping in samples with compromised DNA quality (mimicking forensic samples) has to be tested Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 22

Acknowledgements This work was jointly done with Dr. Jianye Ge, Dr. Huifeng Xi, Dr. Bruce Budowle, and Dr. John Plantz, who are co authors of the manuscript to be submitted for publication Research for the work was partially funded by grants and contracts from the US National Institutes of Health and US National Institute of Justice Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 23

Questions? Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 24

Contact Information Ranajit Chakraborty, Ph.D. Professor, Dept. Environmental Health University of Cincinnati College of Medicine 3223 Eden Avenue, Room K 108 Cincinnati, OH 45267 0056 Tel. (513) 558 4925; Fax (513) 558 4397 E mail: ranajit.chakraborty@uc.edu Forensic SNP Analysis Evaluation of Genome wide SNP Haplotype Blocks for Human Identification Applications 25