Forensics and DNA Statistics. Harry R Erwin, PhD. CIS308, Faculty of Applied Sciences, University of Sunderland

References
- Goodwin, Linacre, and Hadi (2007) An Introduction to Forensic Genetics, Wiley.
- Butler (2005) Forensic DNA Typing, 2nd edition, Elsevier.

Statistics and DNA
- According to Butler, statistical genetic information is often more difficult for DNA analysts to grasp than the technology and biology issues because of its heavy use of mathematics, particularly algebra. The concepts of probabilities can be challenging to forensic scientists schooled in biology rather than mathematics.
- The implication is that you may need to provide the necessary expertise. :(

Lecture Plan
- Review
- STR population database analyses
- Profile frequency estimates, likelihood ratios, and source attribution
- Approaches to statistical analysis of mixtures and degraded DNA
- Kinship and parentage testing

Review: What to Remember
- Probability
  - Laws of probability
  - Likelihood ratios
  - Bayesian statistics
- Statistics
  - Hypothesis testing
  - Chi-square test
  - Confidence intervals
  - Randomization tests

Introduction
There are three possible outcomes of a DNA test:
1. No match
2. Inconclusive
3. Match
Only a match requires statistics to provide meaning. Which statistics to apply is debatable.

Laws of probability
Probability: the number of times an event occurs divided by the number of opportunities for it to occur.
Three laws of probability to remember:
1. Probabilities range between 0.0 and 1.0.
2. If two events are mutually exclusive, the probability of either taking place is the sum of their probabilities.
3. If two events are independent, the probability of both occurring is the product of their individual probabilities.

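Likelihood ratios
- A Likelihood Ratio (LR) is the comparison of the probabilities of the evidence under two alternative (mutually exclusive) hypotheses.
- The two hypotheses are the prosecution hypothesis, $H_p$, and the defence hypothesis, $H_d$; they play the roles of the null hypothesis and the alternative hypothesis.
- These hypotheses should cover all cases.
- $\mathrm{LR} = \Pr(E \mid H_p) / \Pr(E \mid H_d)$, where $E$ is the evidence.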

Bayesian statistics
- Posterior odds = (likelihood ratio) * (prior odds)
- $\Pr(H_p \mid E) / \Pr(H_d \mid E) = \mathrm{LR} \cdot \Pr(H_p) / \Pr(H_d)$

Verbal terminology for likelihood ratios

Likelihood ratio       Verbal equivalent
1 to 10                Limited support for the prosecution hypothesis
10 to 100              Moderate support for the prosecution hypothesis
100 to 1000            Moderately strong support for the prosecution hypothesis
1000 to 10000          Strong support for the prosecution hypothesis
10000 to 100000        Very strong support for the prosecution hypothesis
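The odds form of Bayes' rule and the verbal scale above can be sketched in a few lines of Python. This is only an illustration: the function names and the example numbers are hypothetical, and the handling of band edges (lower edge inclusive, top band open-ended, LR below 1 reported separately) is an assumption, since the slide does not say.

```python
def posterior_odds(likelihood_ratio, prior_odds):
    """Bayes' rule in odds form: posterior odds = LR * prior odds."""
    return likelihood_ratio * prior_odds


def odds_to_probability(odds):
    """Convert odds in favour of a hypothesis into a probability."""
    return odds / (1.0 + odds)


def verbal_equivalent(likelihood_ratio):
    """Map an LR onto the verbal scale from the slide.

    Assumptions (not on the slide): lower band edges are inclusive,
    the top band is open-ended, and LR < 1 gets its own label.
    """
    if likelihood_ratio < 1:
        return "No support for the prosecution hypothesis"
    bands = [
        (10, "Limited"),
        (100, "Moderate"),
        (1000, "Moderately strong"),
        (10000, "Strong"),
    ]
    for upper_edge, label in bands:
        if likelihood_ratio < upper_edge:
            return label + " support for the prosecution hypothesis"
    return "Very strong support for the prosecution hypothesis"


# Hypothetical case: LR of 1000 and prior odds of 1 to 10,000.
odds = posterior_odds(1000, 1 / 10000)
probability = odds_to_probability(odds)
```

With these made-up inputs the posterior odds are 0.1, so a "strong" LR can still leave the prosecution hypothesis less likely than not when the prior odds are small.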

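Fallacies to avoid
- Prosecutor's fallacy: treating the probability of the evidence given innocence (for example, the random match probability) as the probability of innocence given the evidence.
- Defendant's fallacy: arguing that, because many people in a large population could match by chance, the match has no evidential value.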

Statistics
- Statistics measures uncertainty and reliability.
- A population is the set of objects of interest.
- A sample is an observable subset of a population.
- A statistic is some observable property of a sample.

Hypothesis testing
- Choose two alternative hypotheses, $H_0$ and $H_1$.
- Select an appropriate statistical model.
- Specify the level of significance and its critical value, $C$.
- Collect data and calculate the statistic.
- Check the region of rejection for the statistic. Reject? Yes: accept $H_1$. No: accept $H_0$.

Chi-square test
- A goodness-of-fit test.
- Answers the question: how close do the observations come to the expected results?
- The $\chi^2$ statistic is parameterised by degrees of freedom, df, and large values indicate there's a significant deviation from theory.

Confidence intervals
- Usually the sample mean plus and minus two standard deviations.
- For normally distributed data about 95% of observations fall inside that interval, so an observation outside it is expected only about 5% of the time.
- Other confidence intervals can be defined.
- These are used to help visualise measurements against a population.

Randomization tests
- These explore whether collecting the data differently would affect the results.
- Usually starts by treating the collected data as representative of the population, and permuting it, leaving samples out, or randomly resampling it multiple times to see the range of descriptive statistics.
- Get a computational statistician involved if these questions come up.
- The tools are available in R to do these kinds of analyses.
- Keywords: resampling, bootstrap, jackknife
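As a rough illustration of the resampling idea, here is a percentile bootstrap for an allele frequency using only the Python standard library (the slide points to R; Python is used here purely for the sketch). The function name, the sample of 200 alleles, and the choice of 2000 resamples are all hypothetical.

```python
import random


def bootstrap_frequency_interval(alleles, target, n_resamples=2000,
                                 level=0.95, seed=0):
    """Percentile bootstrap interval for the frequency of `target`.

    Treats the observed alleles as representative of the population and
    resamples them with replacement, as described above.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(alleles)
    frequencies = []
    for _ in range(n_resamples):
        resample = rng.choices(alleles, k=n)
        frequencies.append(resample.count(target) / n)
    frequencies.sort()
    tail = (1.0 - level) / 2.0
    low_index = int(tail * n_resamples)
    high_index = int((1.0 - tail) * n_resamples) - 1
    return frequencies[low_index], frequencies[high_index]


# Hypothetical sample: 60 copies of allele "A" among 200 sampled alleles.
sample = ["A"] * 60 + ["a"] * 140
estimate = sample.count("A") / len(sample)
low, high = bootstrap_frequency_interval(sample, "A")
```

The width of the interval (`high - low`) gives a feel for how much the estimate would move if the sample had been collected again.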

Principles of Population Genetics
- Laws of genetics
- Number of alleles and number of possible genotypes

Populations
- What is a population? A group of people sharing common ancestry. Usually defined broadly.
- Hardy-Weinberg Equilibrium (HWE): within a randomly mating population, the genotype frequencies at any single genetic locus will remain constant.
- This allows genotype frequencies to be predicted from allele frequencies. (See Punnett Square.)
- All human populations deviate (mildly) from HWE and your statistics will require (mild) corrections.

Punnett Square

                   Father: A ($p$)    Father: a ($q$)
Mother: A ($p$)    AA ($p^2$)         Aa ($pq$)
Mother: a ($q$)    aA ($qp$)          aa ($q^2$)

Genotype frequencies: AA $p^2$, Aa $2pq$, aa $q^2$

Note the following:
- $p + q = 1.0$
- The fitness of the alleles (A and a) must be equal in the population. This usually is the result of hybrid vigor, where the heterozygote has an advantage over both homozygotes.
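The genotype frequencies read off the Punnett square can be computed directly from one allele frequency. A minimal sketch, assuming a locus with exactly two alleles; the function name and the value 0.7 are illustrative only.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies at a two-allele locus under HWE.

    `p` is the frequency of allele A; the frequency of allele a is q = 1 - p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}


# Hypothetical allele frequency p = 0.7, so q = 0.3.
frequencies = hardy_weinberg(0.7)
total = sum(frequencies.values())  # p^2 + 2pq + q^2 should come to 1
```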

Deviations from HWE in Human Populations
- Finite populations produce random genetic drift; not an issue for populations larger than a small town.
- Non-random mating is not believed to affect the STR loci.
- Migration effects disappear over a period of several generations.
- Natural selection is not believed to affect the STR loci.
- Mutation rates at ~0.2%/generation are not likely to affect allelic frequencies.

STR Population Database Analyses
- Population DNA databases
- Statistical tests on DNA databases
- Practical considerations

Creating a Population DNA Database
- Not for amateurs
- Need >100 samples per local population group
- Often uses anonymous samples from a blood bank; watch for sampling effects
- Analysis: use appropriate STR kits
- Determine allele frequencies at each locus; note sampling bias issues
- Check HWE
- Note the potential existence of non-interbreeding populations
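The "Check HWE" step is commonly done with a chi-square goodness-of-fit test. The sketch below assumes a single locus with two alleles and made-up genotype counts; real STR loci have many alleles and are normally checked with the dedicated programmes. With three genotype classes and one estimated allele frequency there is one degree of freedom, for which the chi-square tail probability can be written with `math.erfc`.

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square test of Hardy-Weinberg proportions at a two-allele locus.

    Returns (chi2, p_value) with df = 1. Assumes both alleles are present
    in the sample, so no expected count is zero.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical database of 100 people: 30 AA, 50 Aa, 20 aa.
chi2, p_value = hwe_chi_square(30, 50, 20)
```

A small chi-square value (large p-value) means the observed genotype counts are close to those expected under HWE; it does not prove the population is in equilibrium.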

Statistical tests on DNA databases
- There are a number of computer programmes available to evaluate the usefulness of a DNA database. Consider using DNATYPE first of all.
- Need to test for independence of alleles at each genetic locus and between loci.
- Unfortunately, independence testing does not validate the product rule :(
- Compare to other population data sets.
- Watch for population substructure.

Programmes Available
- DNATYPE
- PowerStats
- GDA
- GENEPOP
- DNA VIEW
- ARLEQUIN
- PowerMarker
- PopStats
- TFPGA

Practical considerations
- Watch these journals for population data:
  - "For the Record" articles in Journal of Forensic Sciences
  - "Announcements of Population Data" in Forensic Science International
- Understand the numbers reported.
- Understand why the markers in use have been chosen.
- Understand what the most common and rarest genotypes are for the DNA markers in use.

Frequency Estimates, Likelihood Ratios, and Source Attribution
- Frequency estimate calculations
- Likelihood ratio
- Source attribution
- Other topics

Frequency Estimate Calculations
- Work through a frequency estimate calculation. Take a DNA profile and use the allele frequencies in a population database.
- A random match probability is not the probability that someone is guilty or that someone else left the biological material.
- Understand how rare alleles and tri-allelic patterns are handled.
- Understand the product rule.
- Understand the differences between population databases.
- Understand the impact of population structure.
- Understand the impact of relatives.
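A frequency estimate under the product rule can be worked through as follows. This is a bare sketch that assumes HWE within each locus and independence between loci, with no correction for population structure, relatives, or rare alleles. The locus names and allele frequencies are invented for illustration and do not come from any real database.

```python
def genotype_frequency(allele_frequencies):
    """Genotype frequency at one locus under HWE.

    One frequency means a homozygote (p^2); two mean a heterozygote (2pq).
    """
    if len(allele_frequencies) == 1:
        (p,) = allele_frequencies
        return p * p
    if len(allele_frequencies) == 2:
        p, q = allele_frequencies
        return 2.0 * p * q
    raise ValueError("expected one or two allele frequencies per locus")


def random_match_probability(profile):
    """Product rule: multiply the genotype frequencies across loci."""
    rmp = 1.0
    for allele_frequencies in profile.values():
        rmp *= genotype_frequency(allele_frequencies)
    return rmp


# Hypothetical two-locus profile with made-up allele frequencies.
profile = {
    "LocusA": (0.10, 0.20),  # heterozygote: 2 * 0.10 * 0.20 = 0.04
    "LocusB": (0.30,),       # homozygote: 0.30 ** 2 = 0.09
}
rmp = random_match_probability(profile)  # 0.04 * 0.09 = 0.0036
```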

Likelihood ratio
- Practice quantifying the evidentiary value of a match between a reference sample, K, and a questioned sample, Q.
- Explore likelihood ratios.

Source attribution
- When $p_x$ is the random match probability for a profile $X$, $(1 - p_x)^N$ is the probability of not observing the particular profile in a sample of $N$ unrelated individuals.
- When this probability is greater than or equal to a confidence level $1 - \alpha$, then $(1 - p_x)^N \geq 1 - \alpha$, or $p_x \leq 1 - (1 - \alpha)^{1/N}$.
- In the American population, a random match probability (RMP) of $3.35 \times 10^{-11}$ will confer a 99% confidence that the profile is unique in the population. For the UK, the RMP is $2.01 \times 10^{-10}$.
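The source attribution threshold can be checked numerically. The slide does not state the population sizes behind its two figures; the values of 300 million (American) and 50 million (UK) used below are assumptions that happen to reproduce them at 99% confidence. `log1p` and `expm1` are used because a direct evaluation of $1 - (1 - \alpha)^{1/N}$ loses precision when the result is this small.

```python
import math


def source_attribution_threshold(population_size, alpha=0.01):
    """Largest p_x satisfying (1 - p_x)^N >= 1 - alpha.

    Evaluates 1 - (1 - alpha)^(1/N) in a numerically stable way.
    """
    return -math.expm1(math.log1p(-alpha) / population_size)


# Assumed population sizes (not given on the slide).
us_threshold = source_attribution_threshold(300_000_000)  # about 3.35e-11
uk_threshold = source_attribution_threshold(50_000_000)   # about 2.01e-10
```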

Other topics
- DNA database searches: multiply the RMP by the number of persons in the database to adjust for the possibility of matching that many people.
- For lineage markers, use the count of the profile in the database as an estimate of its underlying probability in the population and do a frequency estimate with a confidence interval based on that.
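Both adjustments can be sketched briefly. The slide does not say which confidence interval to use for the counting estimate; the normal-approximation upper bound below is one common choice and is an assumption here, and it is not suitable when the profile has never been seen in the database (a count of zero). Function names and numbers are hypothetical.

```python
import math


def database_match_probability(rmp, database_size):
    """Database search adjustment: N * RMP, capped at 1."""
    return min(1.0, rmp * database_size)


def counting_estimate_upper_bound(count, database_size, z=1.96):
    """Counting-method frequency with a normal-approximation upper bound.

    Assumption: z = 1.96 gives an approximate 95% bound on the frequency
    of a lineage-marker profile seen `count` times in the database.
    """
    p = count / database_size
    standard_error = math.sqrt(p * (1.0 - p) / database_size)
    return p + z * standard_error


# Hypothetical numbers: RMP of 1e-9 searched against 1,000,000 profiles.
adjusted = database_match_probability(1e-9, 1_000_000)
# Hypothetical haplotype seen 5 times in a database of 1000.
upper = counting_estimate_upper_bound(5, 1000)
```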

Statistical Analysis of Mixtures and Degraded DNA
- Mixture interpretation
- Partial DNA profiles

Mixture interpretation
- This is nasty, but any truth is better than indefinite doubt.
- The most conservative approach is to judge whether the suspect might be represented by the mixture found in the sample.
- Sometimes you can pull apart the alleles, one known person at a time. Duplicate alleles among the persons in the mixture are then a problem.
- When contributions of donors are about equal, you have a serious problem.

Exclusion Probabilities
- Use the combined probability of exclusion (CPE).
- This is an estimate of the proportion of the population that has at least one allele not observed in the mixture.
- The CPE assumes independence: multiply the included population proportion at each locus to get the combined probability of inclusion, then subtract the result from 1.
- Vulnerable to non-detection of alleles
- Provides a conservative estimate
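A minimal sketch of the exclusion calculation, assuming HWE at each locus and independence between loci. At one locus the included proportion is the square of the summed frequencies of the alleles seen in the mixture; the combined probability of exclusion is one minus the product of those proportions. Locus names and allele frequencies are invented for illustration.

```python
def combined_probability_of_exclusion(mixture):
    """CPE = 1 - product over loci of (sum of observed allele frequencies)^2."""
    combined_inclusion = 1.0
    for allele_frequencies in mixture.values():
        # Proportion of people whose two alleles are both in the mixture.
        inclusion = sum(allele_frequencies) ** 2
        combined_inclusion *= inclusion
    return 1.0 - combined_inclusion


# Hypothetical mixture: frequencies of the alleles detected at each locus.
mixture = {
    "LocusA": (0.10, 0.20, 0.30),  # inclusion = 0.6 ** 2 = 0.36
    "LocusB": (0.20, 0.20),        # inclusion = 0.4 ** 2 = 0.16
}
cpe = combined_probability_of_exclusion(mixture)  # 1 - 0.36 * 0.16 = 0.9424
```

If an allele drops out and goes undetected, the per-locus sum is too small and the calculation no longer describes the true mixture, which is the vulnerability noted above.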

Likelihood Ratio
- Set up two competing hypotheses.
- The problem is that defining the hypotheses is not straightforward.
- Uses the evidence better than the exclusion method.

Mixtures
- Complicated to interpret
- Basic approach is to identify the alleles from known contributors.
- Any detected alleles outside that set had to come from unknowns (one or more).
- When the mixture results are affected by low-copy-number stochastic limits, degradation, or PCR inhibition, so that alleles are missing, all bets are off.

Partial DNA profiles
- Only loci with results can be interpreted.
- Degraded samples or low-copy-number samples can cause PCR to fail at some loci.
- Interpret only the detected alleles.
- Any data are better than none at all.

Kinship and Parentage Testing
- When DNA samples being compared are from related individuals, the assumption of independence is violated, and different statistical equations must be applied.
- Parentage testing
  - Statistical calculations
  - Impact of mutational events
  - Reference samples
- Reverse parentage testing
  - Data from both parents is often not available

Conclusions
- Unfortunately, you're likely to be the expert.
- If you have the opportunity, study this on your own or do a forensics qualification (postgraduate or subject area).
- You know where to find help:
  - Michael Oakes
  - Peter Dunne
  - Malcolm Farrow
- Be honest about your level of skill.
- More statistics won't hurt you.