SUPPLEMENTARY INFORMATION

Similar documents
An early modern human from Romania with a recent Neanderthal ancestor

Fossils From Vindija Cave, Croatia (38 44 kya) Admixture between Archaic and Modern Humans

Detecting ancient admixture using DNA sequence data

Inconsistencies in Neanderthal Genomic DNA Sequences

LETTER. An early modern human from Romania with a recent Neanderthal ancestor

Supplementary information ATLAS

Overview One of the promises of studies of human genetic variation is to learn about human history and also to learn about natural selection.

A mitochondrial genome sequence of a hominin from Sima de los Huesos

Supporting Information

File S1 Technical details of a SNP array optimized for population genetics

Learning about human population history from ancient and modern genomes

SUPPLEMENTARY INFORMATION

Quantifying and reducing spurious alignments for the analysis of ultra-short ancient DNA sequences

Genome 373: Mapping Short Sequence Reads II. Doug Fowler

The Red Queen Model of Recombination Hotspots Evolution in the Light of Archaic and Modern Human Genomes

Letter. The genome of the offspring of a Neanderthal mother and a Denisovan father

Supplementary Figures

Addressing Challenges of Ancient DNA Sequence Data Obtained with Next Generation Methods

Supplementary Information for The ratio of human X chromosome to autosome diversity

Supplementary Figures

Variant calling workflow for the Oncomine Comprehensive Assay using Ion Reporter Software v4.4

Course summary. Today. PCR Polymerase chain reaction. Obtaining molecular data. Sequencing. DNA sequencing. Genome Projects.

Supplemental Figures Supplemental Figure 1.

After cell death, DNA steadily decays. If microbes do

Supplement to: The Genomic Sequence of the Chinese Hamster Ovary (CHO)-K1 cell line

Supplementary Material online Population genomics in Bacteria: A case study of Staphylococcus aureus

SUPPLEMENTARY INFORMATION

Supplementary Figure 1 Schematic view of phasing approach. A sequence-based schematic view of the serial compartmentalization approach.

Chang Xu Mohammad R Nezami Ranjbar Zhong Wu John DiCarlo Yexun Wang

REPORT. Complex History of Admixture between Modern Humans and Neandertals. Benjamin Vernot 1, * and Joshua M. Akey 1, *

C3BI. VARIANTS CALLING November Pierre Lechat Stéphane Descorps-Declère

ESTIMATING GENETIC VARIABILITY WITH RESTRICTION ENDONUCLEASES RICHARD R. HUDSON1

CS262 Computation Genomics Winter 2015 Lecture 15 Human Population Genomics (02/24/2015) Scribed by: Junjie (Jason) Zhu Image Source: Lecture Notes

HISTORICAL LINGUISTICS AND MOLECULAR ANTHROPOLOGY

Genome-Wide Survey of MicroRNA - Transcription Factor Feed-Forward Regulatory Circuits in Human. Supporting Information

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Read Complexity

Whole Human Genome Sequencing Report This is a technical summary report for PG DNA

Lecture 12. Genomics. Mapping. Definition Species sequencing ESTs. Why? Types of mapping Markers p & Types

Understanding Accuracy in SMRT Sequencing

Nature Genetics: doi: /ng.3254

Course Information. Introduction to Algorithms in Computational Biology Lecture 1. Relations to Some Other Courses

Genetics 101. Prepared by: James J. Messina, Ph.D., CCMHC, NCC, DCMHS Assistant Professor, Troy University, Tampa Bay Site

2. True or False? The sequence of nucleotides in the human genome is 90.9% identical from one person to the next.

Mate-pair library data improves genome assembly

Population Genetics, Systematics and Conservation of Endangered Species

Single Nucleotide Variant Analysis. H3ABioNet May 14, 2014

BST227 Introduction to Statistical Genetics. Lecture 8: Variant calling from high-throughput sequencing data

Introduction to Algorithms in Computational Biology Lecture 1

Separating Population Structure from Recent Evolutionary History

Rev. Cell Biol. Mol. Medicine Paleogenomics 243

Supporting Online Material for

Human SNP haplotypes. Statistics 246, Spring 2002 Week 15, Lecture 1

SUPPLEMENTARY INFORMATION

Proceedings of the World Congress on Genetics Applied to Livestock Production,

1 (1) 4 (2) (3) (4) 10

Deleterious mutations

Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA

Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA

GENETICS - CLUTCH CH.15 GENOMES AND GENOMICS.

Genomic resources. for non-model systems

Lesson Overview DNA Replication

Petar Pajic 1 *, Yen Lung Lin 1 *, Duo Xu 1, Omer Gokcumen 1 Department of Biological Sciences, University at Buffalo, Buffalo, NY.

SUPPLEMENTARY FIGURES AND TABLES. Exploration of mirna families for hypotheses generation

CS 262 Lecture 14 Notes Human Genome Diversity, Coalescence and Haplotypes

Obtaining DNA from degraded samples for NGS sequencing

ChIP-seq and RNA-seq. Farhat Habib

Supporting Online Material for

Supplementary Figure 1

More often heard about on television dramas than on the news, DNA is the key to solving crimes the scientific way. Although it has only been

Thecompletegenomesequenceofa Neanderthal from the Altai Mountains

BIOINFORMATICS ORIGINAL PAPER

Title: A high-coverage Neandertal genome from Vindija Cave in Croatia

Ancient DNA from pre-columbian South America

Chapter 17. PCR the polymerase chain reaction and its many uses. Prepared by Woojoo Choi

Toward high-resolution population genomics using archaeological samples

Outline General NGS background and terms 11/14/2016 CONFLICT OF INTEREST. HLA region targeted enrichment. NGS library preparation methodologies

SUPPLEMENTARY INFORMATION

APPLICATION NOTE

What Are the Chemical Structures and Functions of Nucleic Acids?

The complete genome sequence of a Neandertal from the Altai Mountains

Parts of a standard FastQC report

Whole genome sequencing in the UK Biobank

Genetic Identification of Ancient Korean Remains

Single-cell genome sequencing at ultra-high-throughput with microfluidic droplet barcoding

12/8/09 Comp 590/Comp Fall

Incorporating Molecular ID Technology. Accel-NGS 2S MID Indexing Kits

ANTH 6491: Anthropological Genetics Wednesdays pm, Purple Seminar Room, 6 th floor Science & Engineering Hall Fall 2015

Comparison and Evaluation of Cotton SNPs Developed by Transcriptome, Genome Reduction on Restriction Site Conservation and RAD-based Sequencing

Systematic evaluation of spliced alignment programs for RNA- seq data

Question 2: There are 5 retroelements (2 LINEs and 3 LTRs), 6 unclassified elements (XDMR and XDMR_DM), and 7 satellite sequences.

DNA METHYLATION RESEARCH TOOLS

UHT Sequencing Course Large-scale genotyping. Christian Iseli January 2009

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow

SUPPLEMENTARY INFORMATION

Introduction to Short Read Alignment. UCD Genome Center Bioinformatics Core Tuesday 14 June 2016

Human Genetic Variation. Ricardo Lebrón Dpto. Genética UGR

Supplementary Methods 2. Supplementary Table 1: Bottleneck modeling estimates 5

SUPPLEMENTARY INFORMATION

Unit 2: Biological basis of life, heredity, and genetics

ENGR 213 Bioengineering Fundamentals April 25, A very coarse introduction to bioinformatics

Transcription:

doi:10.1038/nature17405 Supplementary Information 1 Determining a suitable lower size-cutoff for sequence alignments to the nuclear genome Analyses of nuclear DNA sequences from archaic genomes have until now been restricted to fragments of length 35bp or longer. However, to maximize the yield of DNA fragments we have used fragments as short as 30bp when reconstructing mtdna genomes from the SH specimens. Lowering the size-cutoff for the analysis of the nuclear genome of the SH specimens increases the number of informative sequences available for analysis, but also increases the risk of including incorrectly aligned microbial sequences that have spurious similarity to the much larger nuclear genome. The problem of spurious alignments becomes apparent when plotting the percentage of sequences that map to the human genome as a function of their length (Supplementary Figure 1). With the alignment software and parameters chosen here and in previous studies 1,2 (BWA 3 with options -n 0.01 o 2 l 16500 ), we observe a steep increase in the success of mapping for sequences shorter than 35bp. However, this is driven largely by alignments that include mismatches. The fraction of alignments without mismatches only increases at a much shorter length of around 25bps. Since neither evolutionary differences (approximately 1 in 700bp 1 ), nor deamination, nor sequencing error would reduce the fraction of perfectly aligning sequences in a manner that is dependent on sequence length to the extent observed here, we deem spurious alignments the best explanation for the difference between perfect matches and alignments allowing mismatches. To quantify the proportion of spurious alignments with size cutoffs of 30 and 35bp we identified 11,299 sites where the human reference genome differs from all present-day humans in the 1000 genomes project phase I data 4 as well as from the chimpanzee genome (pantro2) 5. We expect that nearly 100% of endogenous hominin sequences overlapping these sites, which represent errors in the human reference genome sequence or rare polymorphisms observed with ~0.1% frequency in presentday humans, will carry the state of other humans and the chimpanzee (i.e. the ancestral state). In contrast, less than ~3% of evolutionarily unrelated microbial sequences are expected to do so due to random similarity to the human reference genome (Supplementary Figure 2). By counting how often DNA fragments overlapping these positions share the derived state we obtained an approximate estimate of the proportion of spurious alignments. To minimize the impact of cytosine deamination in this analysis, we disregarded alignments if either the ancestral or the derived state at a given position was C in the orientation sequenced, or G if the complementary strand was sequenced. Using this WWW.NATURE.COM/NATURE 1

approach (Supplementary Figure 2), we estimate that at a fragment length cut-off of 30bp between 9% and 67% of all aligned DNA fragments from the five specimens are likely of microbial origin. When we increase the fragment length cut-off to 35bp we find no evidence for spurious microbial alignments although the numbers are admittedly small (Supplementary Table 1). Supplementary Table 1: The proportion of sequences aligned randomly to the human genome, estimated as 1 minus the frequency of sharing the ancestral state, using lower size cut-offs of 30 and 35bp. 95% binomial confidence intervals are provided in brackets. All sequences used in this analysis aligned to the genome with a map quality score 30. WWW.NATURE.COM/NATURE 2

Supplementary Figure 1: Percentage of sequences from SH femur XIII that map to the human reference genome when mismatches are allowed (solid line) or perfect matches required (dashed line) as a function of their size. The alignment parameters chosen here decrease the number of allowed mismatch by one for reads shorter than 22bp, leading to the observed dip in the percentage of mapped sequences. WWW.NATURE.COM/NATURE 3

Supplementary Figure 2: Probability of sharing rare variants with the human reference genome. These variants (green) are identified by requiring that all of the present-day human genomes sequenced in phase I of the 1000 genomes project as well as the chimpanzee differ from the reference genome, i.e. share the ancestral state (red). With the parameters chosen for mapping, the number of mismatches between a DNA fragment and the reference is limited to 3 for fragments 41bp, to 4 for fragments >=42bp, and to 5 for fragments >=64bp, etc., i.e. less than 10% of the bases are allowed differ from the reference genome in any given alignment. Given that differences to the reference genome can be due to three different bases, less than 3.3% of spuriously aligned microbial DNA fragments are expected to share the ancestral state at these positions. In contrast, nearly all (>99.9%) hominin sequences are expected to be ancestral at these positions. WWW.NATURE.COM/NATURE 4

Supplementary Information 2 Using conditional substitution frequencies for detecting and quantifying ancient DNA in mixtures with recent contamination Library-based sample preparation techniques in combination with high-throughput sequencing have revealed that authentic ancient DNA sequences show a distinct elevation of C to T substitutions close to their ends (or G to A substitutions depending on the method used for library preparation) 6. These substitutions, which arise due to cytosine deamination, accumulate predominantly at molecule ends and are much less frequent in contaminant DNA that was introduced during or after excavation of fossils 7. Elevated frequencies of damage-induced substitutions can thus be used to provide evidence for the presence of endogenous ancient DNA. Although there is no strict linear relationship between the strength of deamination signals and the age of specimens 8, nearly all samples that are thousands of years in age have been reported to show terminal C to T substitution frequencies exceeding 10% (see for example ref. 9 and 10). Some samples in which authentic ancient DNA may still be present, including some of the SH specimens described here, exhibit lower frequencies of C to T substitutions, not compatible with the presumed age of the sequences due to an overwhelming background of presentday contamination that dilutes the deamination signal. To increase our ability to detect ancient DNA in highly contaminated samples based on patterns of DNA damage, we have recently introduced the concept of conditional substitution frequencies in the analysis of mtdna sequences of SH femur XIII 11. Conditional substitution frequencies are obtained by isolating sequences that show a C to T difference to the reference at their 5 ends and determining the frequency of C to T substitutions at their 3 ends, and vice versa. In the case of femur XIII, C to T substitution frequencies increased from 12 and 17% at the 5 and 3 ends of all sequences to 55 and 62% in the population of sequences with a C to T substitution at the other end. The latter numbers are close to the deamination signals detected in a similar-age cave bear from the same site, suggesting that they might be a good proxy for the deamination frequencies of the fraction of endogenous DNA fragments in the specimen. However, it is also conceivable that decay processes leading to deamination at one end of a molecule may increase the likelihood of deamination at the other end, thereby reducing the power to detect and quantify mixtures of ancient and contaminant DNA based on conditional substitution frequencies. WWW.NATURE.COM/NATURE 5

To test whether a population of highly deaminated molecules may also be present among contaminant DNA fragments, we analyzed published mtdna sequences from two ancient hominin specimens, SH femur XIII 11 and a Late Pleistocene Neanderthal (Mezmaiskaya 1) 12, that show high levels of modern human contamination. Using diagnostic positions that differentiate the mtdna genomes of 311 present-day humans from the two archaic individuals (132 positions for femur XIII and 41 for Mezmaiskaya 1), we separated contaminant and endogenous sequences based on their sharing of either the human-derived or the ancestral state. To minimize erroneous assignments due to deamination we disregarded sequences that overlap a diagnostic site with their first or last 3 positions. We then determined the frequencies of C to T substitutions in both sets of sequences, both with and without conditioning on the presence of a C to T substitution on the other end (Supplementary Table 2). As expected for mixtures of ancient and contaminant sequences, conditional substitution frequencies increase compared to regular substitution frequencies prior to separation of the sequences. In the subset of contaminant sequences, conditional substitution frequencies increase slightly in femur XIII but remain well below 10%, providing little evidence for the presence of a highly deaminated population of DNA fragments among the contaminant DNA in the two specimens. To test whether deamination occurs independently on both ends of molecules we determined regular and conditional substitution frequencies using larger data sets of nuclear sequences from six archaic individuals 1,2,13 that have been estimated to contain negligible levels of modern human contamination (1% or less). Regular and conditional substitution frequencies are expected to be equal if deamination occurs independently at either end of the molecules. However, conditional substitution frequencies are between 10% lower and 14% higher than regular substitution frequencies, indicating that deamination frequencies at the ends are not entirely independent (Supplementary Table 3). A reduction in conditional substitution frequencies in same samples may be explained by mapping bias, i.e. the difficulty of mapping sequences with 2 terminal C to T substitutions and possibly other differences to the reference genome. Comparisons of regular and conditional substitution frequencies therefore do not allow for accurate estimation of modern human contamination, but they can be used to detect the presence of small amounts of ancient sequences in a background of human contamination, and provide rough estimates of contamination in heavily contaminated samples. When applied to the nuclear DNA sequences from the Sima de los Huesos specimens, we estimate that modern human DNA contamination is present in at least a 2.7-fold excess in all libraries, contributing 63% or more of all sequences (Supplementary Table 3). To minimize the impact of contamination on our analyses, we therefore restricted down-stream analyses to only those fragments showing evidence of deamination. WWW.NATURE.COM/NATURE 6

Supplementary Table 2: C to T substitution frequencies and conditional substitution frequencies in mtdna sequences from two archaic individuals. Diagnostic sites that differentiate the respective archaic individual from 311 present-day humans were used to identify subsets of sequences that are putatively of endogenous origin or from recent human contamination. The percentage of sequences identified as contamination from diagnostic sites is provided in the second column. Specimen SH femur XIII 714,601 sequences Human contamination based on diagnostic Population of sites [%] 94.9 All sequences 6.2 (9,406/151,703) C to T substitution frequency [%] (observations) ------------ Conditional ---------- sequences 5 end 3 end 5 end 3 end 9.0 46.3 (10,099/112,213) (1081/2,335) Human contamination 1.6 (532/33,269) 2.3 (539/23,453) 3.8 (5/132) 54.7 (1,082/1,978) 5.2 (5/96) Endogenous 59.8 (1,354/2,264) 54.2 (1,004/1,852) 56.5 (152/269) 67.0 (152/227) Mezmaiskaya 1 (library B9687) 43.3% All sequences 8.1 (512/6,320) 13.7 (723/5,277) 15.5 (28/181) 25.9 (28/108) 28,955 sequences Human contamination 0.9 (5/572) 0.7 (4/550) 0.0 (0/1) 0.0 (0/1) Endogenous 13.5 (93/690) 18.3 (104/568) 15.6 (5/32) 16.7 (5/30) WWW.NATURE.COM/NATURE 7

Supplementary Table 3: Damage-induced substitution frequencies and conditional substitution frequencies in 6 archaic genomes and the SH libraries sequenced in the present study. Substitution frequencies at the 3 ends of sequences in the Vindija and Mezmaiskaya samples correspond to G to A substitutions, as the respective libraries were generated using a double-stranded method. Rough contamination estimates based on the comparison of the two substitution frequencies are provided in the final column. Specimen C to T (or G to A) substitution frequency [%] Conditional 5 end 3 end 5 end 3 end Conditional / regular substitution frequencies Contamination estimate based on deamination Altai Neanderthal (chromosome 21 only) 6.9 26.4 8.0 30.0 1.14 12.4 Denisova manual phalanx (chromosome 21 only) 8.2 33.5 8.6 34.4 1.03 3.0 Vindija 33.16 (all sequences) 41.9 41.5 37.6 37.2 0.90-11.5 Vindija 33.25 (all sequences) 38.7 38.4 35.0 34.7 0.90-10.6 Vindija 33.26 (all sequences) 37.2 37.6 34.0 34.4 0.91-9.4 Mezmaiskaya 1 (all sequences) 3.7 3.8 4.0 4.0 1.07 6.3 SH Femur XIII (AT2944) 5.3 8.2 32.6 41.9 5.52 81.9 SH Incisor (AT-5482) 12.9 15.9 60.4 58.5 4.13 75.8 SH Femur frag. (AT-5431) 21.5 22.3 63.6 55.8 2.73 63.3 SH Molar (AT-5444) 10.6 11.8 63.1 53.4 5.20 80.8 SH Scapula (AT-6672) 0.6 1 4.4 10 9.00 88.9 WWW.NATURE.COM/NATURE 8

Supplementary Information 3 Estimating present-day human contamination based on the sharing of alleles that have risen to high frequency in modern humans since the split from the archaic hominins The use of diagnostic sites, i.e. sites that show fixed differences between the hominin group from which the sequence comes and the potential contaminant group, is well established for ancient mtdna analysis. Here, we extend this strategy to the autosomes. We identified 28,552 diagnostic positions where all present-day human genomes sequenced in phase I of the 1000 genomes project differ from the high-coverage genome sequences of the Denisovan individual, the Altai Neanderthal and the chimpanzee, bonobo, gorilla, orangutan and rhesus macaque. The states of the outgroups were retrieved from the Ensembl Compara EPO 6 primate whole genome alignments 14,15. All sites were required to be located within unique regions of the human genome as defined by the map35_50% criteria described in Prüfer et al. 2014 (Supplementary Section 5b) 2. Sequences from the SH specimens showing the derived, i.e. the human state at such sites are either due to present-day human contamination, variants segregating in the archaic populations that are not present in any of the four archaic chromosomes used in this analysis, or sequencing error. The percentage of sequences sharing the derived state can therefore be used to estimate an upper limit on the present-day human contamination present in the libraries. To test the suitability of this approach we first performed the analysis using low coverage sequence data from four Neanderthal individuals that were previously determined to contain less than 1% human contamination 2,13. To reduce errors due to cytosine-deamination, we masked out thymines in the first three positions and adenines in the last three to account for the occurrence of damage-derived C to T substitutions at the 5 end and G to A substitutions at the 3 end. Based on the percentage of sequences sharing the derived state our contamination estimate for these samples is between 6.3 and 8.6% (Supplementary Table 4). Because this approach measures not only contamination, but also unknown variation in the archaic hominins and sequencing error, these estimates are higher than the actual contamination estimates but allow us to exclude contamination above the 10% level. We thus performed the same analysis for the SH sequences, this time masking out only thymines within the first and last three sequence positions as single-stranded library preparation fully retains the strand information of the sequenced molecules and does not induce artificial G to A substitutions. Estimates of contamination are high in all SH libraries (78% or more), but drop to 20% for incisor AT-5482 and 0% for WWW.NATURE.COM/NATURE 9

both femur AT-5431 and molar AT-5444 after filtering for putatively deaminated sequences by requiring a C to T substitution at the 5 or 3 terminus. Since the confidence intervals of our estimates are extremely wide due to the limited data, we tried to further increase the power of the analysis by considering sites (totaling 128,004) where the derived allele has risen to 90% or more frequency in modern humans. While providing more power, this analysis may slightly under-estimate contamination as some contaminant sequences may share the archaic state. Overall, contamination estimates are similar to the first estimates but with narrower confidence intervals (Supplementary Table 4). WWW.NATURE.COM/NATURE 10

Supplementary Table 4: Nuclear contamination estimates based on sites that are derived in all, or more than 90%, of present-day humans and ancestral in two Denisovan and two Neanderthal chromosomes. 95% binomial confidence intervals are provided in brackets. Specimen Femur XIII Library A2021 Incisor AT-5482 Femur fragment AT-5431 Molar AT-5444 Scapula AT-6672 Filtered for sequences with terminal C to T substitution Derived allele frequency in present-day humans ---------------------------1.0--------------------------- ---------------------------0.90------------------------ #sites #presentday human state - 90 73 + 3 3-417 350 + 5 1-116 79 + 5 0-30 23 + 1 0-82 76 Present-day human contamination [%] (95% C.I.) 81.1 (71.5-88.6) 100.0 (29.2-100.0) 83.9 (80.0 87.3) 20.0 (0.5-71.6) 68.1 (58.8 76.4) 0.0 (0.0-52.2) 76.7 (57.7-90.1) 0.0 (0.0-97.5) 92.7 (84.8-97.3) #sites #present-day human state 359 314 8 5 1,811 1,552 24 5 527 401 22 4 166 146 5 0 331 323 + 0 0 N/A 1 1 Vindija 33.16-7,838 566 Vindija 33.25-8,237 497 Vindija 33.26-7,274 459 Mezmaiskaya 1-13,068 1,118 7.2 (6.7-7.8) 6.0 (5.5-6.6) 6.3 (5.8-6.9) 8.6 (8.1-9.0) 36,470 1,931 38,037 1,676 33,972 1,528 59,783 4,100 Present-day human contamination [%] (95% C.I.) 87.5 (83.6-90.7) 62.5 (24.5-91.5) 85.7 (84.0-87.3) 20.8 (7.1-42.2) 76.1 (72.2-79.7) 18.2 (5.2-40.3) 88.0 (82.0-92.5) 0.0 (0.0-52.2) 97.6 (95.3-99.0) 100.0 (2.5-100.0) 5.3 (5.1-5.5) 4.4 (4.2-4.6) 4.5 (4.3-4.7) 6.9 (6.7-7.1) WWW.NATURE.COM/NATURE 11

References 1. Meyer, M. et al. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222-6 (2012). 2. Prüfer, K. et al. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505, 43-9 (2014). 3. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754-60 (2009). 4. 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061-73 (2010). 5. The Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437, 69-87 (2005). 6. Briggs, A.W. et al. Patterns of damage in genomic DNA sequences from a Neandertal. Proc Natl Acad Sci U S A 104, 14616-21 (2007). 7. Krause, J. et al. A complete mtdna genome of an early modern human from Kostenki, Russia. Curr Biol 20, 231-6 (2010). 8. Sawyer, S., Krause, J., Guschanski, K., Savolainen, V. & Paabo, S. Temporal patterns of nucleotide misincorporations and DNA fragmentation in ancient DNA. PLoS One 7, e34131 (2012). 9. Gamba, C. et al. Genome flux and stasis in a five millennium transect of European prehistory. Nat Commun 5, 5257 (2014). 10. Allentoft, M.E. et al. Population genomics of Bronze Age Eurasia. Nature 522, 167-72 (2015). 11. Meyer, M. et al. A mitochondrial genome sequence of a hominin from Sima de los Huesos. Nature 505, 403-6 (2014). 12. Gansauge, M.T. & Meyer, M. Selective enrichment of damaged DNA molecules for ancient genome sequencing. Genome Res 24, 1543-9 (2014). 13. Green, R.E. et al. A draft sequence of the Neandertal genome. Science 328, 710-22 (2010). 14. Paten, B., Herrero, J., Beal, K., Fitzgerald, S. & Birney, E. Enredo and Pecan: genome-wide mammalian consistency-based multiple alignment with paralogs. Genome Res 18, 1814-28 (2008). 15. Paten, B. et al. Genome-wide nucleotide-level mammalian ancestor reconstruction. Genome Res 18, 1829-43 (2008). WWW.NATURE.COM/NATURE 12