Detecting ancient admixture using DNA sequence data

Similar documents
Supporting Information

Inconsistencies in Neanderthal Genomic DNA Sequences

Fossils From Vindija Cave, Croatia (38 44 kya) Admixture between Archaic and Modern Humans

Population Genetics II. Bio

Supplementary Methods 2. Supplementary Table 1: Bottleneck modeling estimates 5

Separating Population Structure from Recent Evolutionary History

Possible Ancestral Structure in Human Populations

Possible Ancestral Structure in Human Populations

SUPPLEMENTARY INFORMATION

Themes. Homo erectus. Jin and Su, Nature Reviews Genetics (2000)

N e =20,000 N e =150,000

CS 262 Lecture 14 Notes Human Genome Diversity, Coalescence and Haplotypes

Linkage Disequilibrium. Adele Crane & Angela Taravella

Supplementary Material online Population genomics in Bacteria: A case study of Staphylococcus aureus

Signatures of a population bottleneck can be localised along a recombining chromosome

Chromosome inversions in human populations Maria Bellet Coll

Summary. Introduction

Supplementary Figures

Nature Genetics: doi: /ng.3254

Questions we are addressing. Hardy-Weinberg Theorem

Learning about human population history from ancient and modern genomes

Further confirmation for unknown archaic ancestry in Andaman and South Asia.

HISTORICAL LINGUISTICS AND MOLECULAR ANTHROPOLOGY

Rethinking Polynesian Origins: Human Settlement of the Pacific

An Introduction to Population Genetics

Linkage disequilibrium extends across putative selected sites in FOXP2

Computational Population Genomics

THE classic model of genetic hitchhiking predicts

REPORT. Complex History of Admixture between Modern Humans and Neandertals. Benjamin Vernot 1, * and Joshua M. Akey 1, *

Computational Workflows for Genome-Wide Association Study: I

More Introduction to Positive Selection

Course: IT and Health

SNP Selection. Outline of Tutorial. Why Do We Need tagsnps? Concepts of tagsnps. LD and haplotype definitions. Haplotype blocks and definitions

Has the Combination of Genetic and Fossil Evidence Solved the Riddle of Modern Human Origins?

Footprints of ancient balanced polymorphisms in genetic variation data. Howard Hughes Medical Institute, University of Chicago

arxiv: v1 [q-bio.pe] 16 Jul 2013

CS262 Computation Genomics Winter 2015 Lecture 15 Human Population Genomics (02/24/2015) Scribed by: Junjie (Jason) Zhu Image Source: Lecture Notes

Analysis of genome-wide genotype data

Lecture 23: Causes and Consequences of Linkage Disequilibrium. November 16, 2012

New Developments in the Genetic Evidence for Modern Human Origins

The date of interbreeding between Neandertals and modern humans

Simple inheritance. Defective Gene. Disease

Human SNP haplotypes. Statistics 246, Spring 2002 Week 15, Lecture 1

Supplementary Online Content

BA, BSc, and MSc Degree Examinations

TITLE: Refugia genomics: Study of the evolution of wild boar through space and time

Inference of Human Population History From Whole Genome Sequence of A Single Individual

Bioinformatic Analysis of SNP Data for Genetic Association Studies EPI573

Pop Gen meets Quant Gen and other open questions

Human Population Differentiation is Strongly Correlated With Local Recombination Rate

Petar Pajic 1 *, Yen Lung Lin 1 *, Duo Xu 1, Omer Gokcumen 1 Department of Biological Sciences, University at Buffalo, Buffalo, NY.

Inference of Population Splits and Mixtures from Genome-Wide Allele Frequency Data

Understanding genetic association studies. Peter Kamerman

1 (1) 4 (2) (3) (4) 10

Quantitative Genetics for Using Genetic Diversity

Molecular markers in plant systematics and population biology

On the Power to Detect SNP/Phenotype Association in Candidate Quantitative Trait Loci Genomic Regions: A Simulation Study

Using Aspects of the Site Frequency Spectrum to Determine Demographic History in Ancient and Modern Populations. Melinda Anna Yang

Supplementary Note: Detecting population structure in rare variant data

Introduction to Population Genetics. Spezielle Statistik in der Biomedizin WS 2014/15

Human Population Differentiation Is Strongly Correlated with Local Recombination Rate

CMSC423: Bioinformatic Algorithms, Databases and Tools. Some Genetics

The Date of Interbreeding between Neandertals and Modern Humans

Algorithms for Genetics: Introduction, and sources of variation

Introduction to Add Health GWAS Data Part I. Christy Avery Department of Epidemiology University of North Carolina at Chapel Hill

Park /12. Yudin /19. Li /26. Song /9

Haplotypes, linkage disequilibrium, and the HapMap

POPULATION GENETICS studies the genetic. It includes the study of forces that induce evolution (the

Model based inference of mutation rates and selection strengths in humans and influenza. Daniel Wegmann University of Fribourg

Identifying the functional bases of trait variation in Brassica napus using Associative Transcriptomics

Evolution of Populations

The Red Queen Model of Recombination Hotspots Evolution in the Light of Archaic and Modern Human Genomes

5/18/2017. Genotypic, phenotypic or allelic frequencies each sum to 1. Changes in allele frequencies determine gene pool composition over generations

Population Genetic Differentiation and Diversity of the Blue Crab in the Gulf of Mexico Inferred with Microsatellites and SNPs

Lecture 21: Association Studies and Signatures of Selection. November 6, 2006

Genome-wide analyses in admixed populations: Challenges and opportunities

Haplotype Identification through DNA Sequencing Processes

recombination rates across human populations Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA

Overview of using molecular markers to detect selection

Genome-Wide Association Studies (GWAS): Computational Them

ARTICLE Contrasting X-Linked and Autosomal Diversity across 14 Human Populations

Units 4: How Does Life Change and Respond to Challenges Over Time?

Natural selection and the distribution of Identity By. Descent in the human genome

The Human Genome Project has always been something of a misnomer, implying the existence of a single human genome

Association studies (Linkage disequilibrium)

Evolutionary Genetics: Part 1 Polymorphism in DNA

By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs

Population Genetics of CAPN10 and GPR35: Implications for the Evolution of Type 2 Diabetes Variants

Genome-Wide Association Studies. Ryan Collins, Gerissa Fowler, Sean Gamberg, Josselyn Hudasek & Victoria Mackey

UHT Sequencing Course Large-scale genotyping. Christian Iseli January 2009

Neandertal toe contains human DNA Analyses show humans and Neandertals mixed and mated far earlier than previously thought

EPIB 668 Genetic association studies. Aurélie LABBE - Winter 2011

Modelling genes: mathematical and statistical challenges in genomics

UNIVERSITY OF YORK BA, BSc, and MSc Degree Examinations

Tracing Your Matrilineal Ancestry: Mitochondrial DNA PCR and Sequencing

Can a Sex-Biased Human Demography Account for the Reduced Effective Population Size of Chromosome X in Non-Africans?

HAPLOTYPE BLOCKS AND LINKAGE DISEQUILIBRIUM IN THE HUMAN GENOME

Natural Selection Reduced Diversity on Human Y Chromosomes

Drupal.behaviors.print = function(context) {window.print();window.close();}>

S1 Text: Supplementary Methods

Transcription:

Detecting ancient admixture using DNA sequence data October 10, 2008 Jeff Wall Institute for Human Genetics UCSF

Background

Origin of genus Homo 2 2.5 Mya

Out of Africa (part I)?? 1.6 1.8 Mya

Further spread of Homo erectus 0.8 1.0 Mya

Origin of modern humans 150 200 Kya

Out of Africa (part II) ~ 100 Kya

Re-colonization of Eurasia 40 60 Kya

Colonization of the New World 15 30 Kya

Modeling human demography Asia Africa Europe time???? present

Neandertal skull

The hobbit (Homo floresiensis)

Modern Human Origins What contribution (if any) did archaic humans (e.g., Neanderthals or Homo floresiensis) make to the modern gene pool? How can we answer this from patterns of DNA sequence variation?

Two approaches Indirect approach Examine patterns of genetic variation in extant humans to look for evidence of ancient admixture Direct approach Compare recently recovered Neandertal (or other archaic human) DNA sequences with orthologous modern human sequences

Two approaches Indirect approach Examine patterns of genetic variation in extant humans to look for evidence of ancient admixture Direct approach Compare recently recovered Neandertal (or other archaic human) DNA sequences with orthologous modern human sequences

Neandertal mtdna Mitochondrial DNA from several different Neandertal fossils have been recovered. All Neandertal mtdna sequences are quite different from (and outside the range of) modern human mtdna variation. These results are consistent with admixture rates (i.e., contribution of Neandertals to the modern European gene pool) of 0 20 %. Serre et al. (2004)

Neandertal nuclear DNA Recent studies have also recovered Neandertal DNA from random nuclear regions (Green et al. 2006; Noonan et al. 2006) and targeted genes (Krause et al. 2007; Lalueza-Fox et al. 2007). There is even a Neandertal genome project, which seeks to obtain a complete (though composite) Neandertal genome sequence.

Neandertal nuclear DNA However, contamination with modern human DNA is a serious concern, and so far 2 out of 4 existing studies have been effectively debunked (Wall and Kim 2007; Coop et al. 2008). Analyses of existing nuclear data are also consistent with Neandertal admixture rates of 0 20 % (Noonan et al. 2006).

Neanderthal sequencing Two recent studies (Green et al. Nature 2006; Noonan et al. Science 2006) published nuclear DNA sequences obtained from a 38,000 year old Neandertal fossil. However, the two studies came to completely different conclusions!

Comparing the two studies We used the methods described in Noonan et al. (2006) to analyze the two data sets. This ensures that the results are directly comparable. Wall & Kim (2007), PLoS Genetics

The data For all sites where the Neandertal and human sequence differ, we tabulate Neandertal allele (ancestral vs. derived) Human reference sequence allele (ancestral vs. derived) Frequency of derived allele in Caucasian HapMap sample (if available)

Simple admixture model time Modern humans Neandertals?? 30 40 Kya present

Methods overview The likelihood of each particular configuration depends on model parameters, including the admixture proportion c. We use a composite likelihood (assuming informative sites are independent) to estimate parameters.

Conflicting results Human-Neandertal DNA sequence divergence time Modern European- Neandertal population split time Neandertal contribution to modern European ancestry Noonan et al. (2006) 706 Kya (466 1028 Kya) 325 Kya (135 557 Kya) 0 % (0 39%) Green et al. (2006) 560 Kya (509 615 Kya) 35 Kya (33 51 Kya) 94% (81 100%)

Human branch divergence What is the human-specific branch divergence for the two data sets? chimpanzee Neandertal Human

Human branch divergence Data Human branch div. (sites / Kb) Noonan 7.43 Green 5.89

Human branch divergence Data Human branch div. (sites / Kb) Noonan 7.43 Green 5.89 Small fragments 7.32 Med. fragments 5.97 Large fragments 5.22

Human branch divergence Data Human branch div. (sites / Kb) Noonan 7.43 Green 5.89 Small fragments 7.32 Med. fragments 5.97 Large fragments 5.22 Human data 3.95 6.10

Neandertal derived alleles (simulations) We also tabulated the fraction of HapMap SNPs for which the Neandertal sequence has the derived allele: Neandertal derived proportion 25 20 15 10 5 0 Split 150 Kya Split 325 Kya Split 550 Kya 0 0.1 0.2 0.3 Neanderthal admixture proportion

Neandertal derived alleles (data) We also tabulated the fraction of HapMap SNPs for which the Neandertal sequence has the derived allele: Data: Proportion Noonan 3.1 Green 32.9 Small fragments 21.8 Med. fragments 32.7 Large fragments 37.2 Human ref. sequence 37.0

Conclusions The most likely explanation is widespread contamination with modern human DNA in the Green et al. (2006) study.

Two approaches Indirect approach Examine patterns of genetic variation in extant humans to look for evidence of ancient admixture Direct approach Compare recently recovered Neanderthal DNA sequences with orthologous modern human sequences

General overview Start with resequencing data from multiple human populations Use frequency spectrum data to estimate demographic parameters (assuming no admixture) Examine model fit using summaries of linkage disequilibrium (LD) Use a specially constructed measure of LD to estimate levels of ancient admixture

Simple admixture model time Modern humans Archaic humans?? 30 40 Kya present

Admixture mapping Modern human DNA Neandertal DNA

Admixture mapping Modern human DNA Neandertal DNA

Admixture mapping Modern human DNA Neandertal DNA

Admixture mapping Modern human DNA Neandertal DNA

Genealogy with archaic ancestry time Modern humans Archaic humans present

Genealogy without archaic ancestry time Modern humans Archaic humans present

Our main questions What pattern does archaic ancestry produce in DNA sequence polymorphism data (from extant humans)? How can we use data to estimate the contribution of archaic humans to the modern gene pool (c)? test whether c > 0?

Genealogy with archaic ancestry (Mutations added) time Modern humans Archaic humans present

Genealogy with archaic ancestry (Mutations added) time Modern humans Archaic humans present

Patterns in DNA sequence data Sequence 1 Sequence 2 Sequence 3 Sequence 4 Sequence 5 Sequence 6 Sequence 7 A T C C A C A G C T G A G C C A C G G C T G T G C G G T A A C C T A G C C A C A G C T G T G T G G T A A C C T A G C C A T A G A T G A G C C A T A G A T G

Patterns in DNA sequence data Sequence 1 Sequence 2 Sequence 3 Sequence 4 Sequence 5 Sequence 6 Sequence 7 A T C C A C A G C T G A G C C A C G G C T G T G C G G T A A C C T A G C C A C A G C T G T G T G G T A A C C T A G C C A T A G A T G A G C C A T A G A T G We call the sites in red congruent sites these are sites inferred to be on the same branch of an unrooted tree.

Measuring congruence To measure the level of congruence in SNP data from larger regions we define a score function S* = max I {1,2,... n} S( I) where S (i 1,... i k ) = k 1 j = 1 S ( i j, i ) j + 1 and S (i j, i j+1 ) is a function of both congruence (or near congruence) and physical distance between i j and i j+1.

An example

An example

S* is sensitive to ancient admixture

A more realistic null model

Existing data To test our methods, we analyzed data from the Environmental Genome Project (from D. Nickerson s laboratory at the Univ. of Washington). We analyzed data from 222 genes in 12 Yoruba and 22 European-Americans. In all there was over 5 Mb of sequence data and over 20,000 SNPs. Plagnol and Wall 2006, PLoS Genetics; Plagnol et al., unpublished results

Demographic null model We used a summary likelihood approach to estimate basic demographic parameters, including Time of split between African and European populations Timing and strength of bottleneck in European populations Timing of recent population growth in both populations Migration rate between the two populations These parameters were then used to construct a demographic null model (without ancient admixture).

Demographic null model Neanderthal split 400 Kya Divergence 100 Kya 20 Kya 160-fold bottleneck 25 Kya Admixture 50 Kya Migration rate 7 * 10-5

Fit of the null model How well does the demographic null model fit the patterns of genetic variation found in the actual data?

Fit of the null model How well does the demographic null model fit the patterns of genetic variation found in the actual data? Quite well. The model accurately reproduces both parameters used in the original fitting (e.g., Tajima s D in each population) as well as other aspects of the data (e.g., estimates of ρ = 4Nr)

General approach Is our null model sufficient to explain the patterns of LD in the data? We test this by comparing the observed S* values with the distribution of S* values calculated from data simulated under the null model.

An example (CHRNA4) How often is S* from simulations greater than or equal to the S* value from the actual data?

An example (CHRNA4) How often is S* from simulations greater than or equal to the S* value from the actual data? p = 0.025

S* for the NIEHS data (European) Q Q plot p <5. * 10-3

S* for the NIEHS data (African) Q Q plot p <1. * 10-7

Source populations We have found some evidence (p < 5 * 10-3 ) for ancient admixture in European, East Asian and West African populations. While Neandertals form an obvious archaic source population candidate in Europe, there is not yet a clear source population candidate in West Africa.

Admixture levels We examine profile likelihood curves for c using the summary likelihood method described before.

Admixture levels in Europe We examine profile likelihood curves for c using the summary likelihood method described before. c relative log-likelihood 0-2 -4-6 0 0.05 0.1 0.15 0.2

Admixture levels in Asia The same type of analysis can be done for putative admixture with Homo erectus in Asia. c relative log-likelihood 0-2 -4-6 0 0.01 0.02 0.03

Conclusions Using patterns of LD, there seems to be evidence of ancient admixture in all populations studied Note that this cannot necessarily be tied to specific fossil groups, such as Neandertals These results may reflect ancestral structure predating modern humans expansion out of Africa

Acknowledgments People: Vincent Plagnol Kirk Lohmuller Sung Kim Data: Environmental Genome Project Rubin lab Pääbo lab Funding: National Science Foundation (HOMINID) Alfred Sloan Foundation