SUPPLEMENTARY INFORMATION
|
|
- Ezra Nicholson
- 6 years ago
- Views:
Transcription
1 doi: /nature17405 Supplementary Information 1 Determining a suitable lower size-cutoff for sequence alignments to the nuclear genome Analyses of nuclear DNA sequences from archaic genomes have until now been restricted to fragments of length 35bp or longer. However, to maximize the yield of DNA fragments we have used fragments as short as 30bp when reconstructing mtdna genomes from the SH specimens. Lowering the size-cutoff for the analysis of the nuclear genome of the SH specimens increases the number of informative sequences available for analysis, but also increases the risk of including incorrectly aligned microbial sequences that have spurious similarity to the much larger nuclear genome. The problem of spurious alignments becomes apparent when plotting the percentage of sequences that map to the human genome as a function of their length (Supplementary Figure 1). With the alignment software and parameters chosen here and in previous studies 1,2 (BWA 3 with options -n 0.01 o 2 l ), we observe a steep increase in the success of mapping for sequences shorter than 35bp. However, this is driven largely by alignments that include mismatches. The fraction of alignments without mismatches only increases at a much shorter length of around 25bps. Since neither evolutionary differences (approximately 1 in 700bp 1 ), nor deamination, nor sequencing error would reduce the fraction of perfectly aligning sequences in a manner that is dependent on sequence length to the extent observed here, we deem spurious alignments the best explanation for the difference between perfect matches and alignments allowing mismatches. To quantify the proportion of spurious alignments with size cutoffs of 30 and 35bp we identified 11,299 sites where the human reference genome differs from all present-day humans in the 1000 genomes project phase I data 4 as well as from the chimpanzee genome (pantro2) 5. We expect that nearly 100% of endogenous hominin sequences overlapping these sites, which represent errors in the human reference genome sequence or rare polymorphisms observed with ~0.1% frequency in presentday humans, will carry the state of other humans and the chimpanzee (i.e. the ancestral state). In contrast, less than ~3% of evolutionarily unrelated microbial sequences are expected to do so due to random similarity to the human reference genome (Supplementary Figure 2). By counting how often DNA fragments overlapping these positions share the derived state we obtained an approximate estimate of the proportion of spurious alignments. To minimize the impact of cytosine deamination in this analysis, we disregarded alignments if either the ancestral or the derived state at a given position was C in the orientation sequenced, or G if the complementary strand was sequenced. Using this 1
2 approach (Supplementary Figure 2), we estimate that at a fragment length cut-off of 30bp between 9% and 67% of all aligned DNA fragments from the five specimens are likely of microbial origin. When we increase the fragment length cut-off to 35bp we find no evidence for spurious microbial alignments although the numbers are admittedly small (Supplementary Table 1). Supplementary Table 1: The proportion of sequences aligned randomly to the human genome, estimated as 1 minus the frequency of sharing the ancestral state, using lower size cut-offs of 30 and 35bp. 95% binomial confidence intervals are provided in brackets. All sequences used in this analysis aligned to the genome with a map quality score
3 Supplementary Figure 1: Percentage of sequences from SH femur XIII that map to the human reference genome when mismatches are allowed (solid line) or perfect matches required (dashed line) as a function of their size. The alignment parameters chosen here decrease the number of allowed mismatch by one for reads shorter than 22bp, leading to the observed dip in the percentage of mapped sequences. 3
4 Supplementary Figure 2: Probability of sharing rare variants with the human reference genome. These variants (green) are identified by requiring that all of the present-day human genomes sequenced in phase I of the 1000 genomes project as well as the chimpanzee differ from the reference genome, i.e. share the ancestral state (red). With the parameters chosen for mapping, the number of mismatches between a DNA fragment and the reference is limited to 3 for fragments 41bp, to 4 for fragments >=42bp, and to 5 for fragments >=64bp, etc., i.e. less than 10% of the bases are allowed differ from the reference genome in any given alignment. Given that differences to the reference genome can be due to three different bases, less than 3.3% of spuriously aligned microbial DNA fragments are expected to share the ancestral state at these positions. In contrast, nearly all (>99.9%) hominin sequences are expected to be ancestral at these positions. 4
5 Supplementary Information 2 Using conditional substitution frequencies for detecting and quantifying ancient DNA in mixtures with recent contamination Library-based sample preparation techniques in combination with high-throughput sequencing have revealed that authentic ancient DNA sequences show a distinct elevation of C to T substitutions close to their ends (or G to A substitutions depending on the method used for library preparation) 6. These substitutions, which arise due to cytosine deamination, accumulate predominantly at molecule ends and are much less frequent in contaminant DNA that was introduced during or after excavation of fossils 7. Elevated frequencies of damage-induced substitutions can thus be used to provide evidence for the presence of endogenous ancient DNA. Although there is no strict linear relationship between the strength of deamination signals and the age of specimens 8, nearly all samples that are thousands of years in age have been reported to show terminal C to T substitution frequencies exceeding 10% (see for example ref. 9 and 10). Some samples in which authentic ancient DNA may still be present, including some of the SH specimens described here, exhibit lower frequencies of C to T substitutions, not compatible with the presumed age of the sequences due to an overwhelming background of presentday contamination that dilutes the deamination signal. To increase our ability to detect ancient DNA in highly contaminated samples based on patterns of DNA damage, we have recently introduced the concept of conditional substitution frequencies in the analysis of mtdna sequences of SH femur XIII 11. Conditional substitution frequencies are obtained by isolating sequences that show a C to T difference to the reference at their 5 ends and determining the frequency of C to T substitutions at their 3 ends, and vice versa. In the case of femur XIII, C to T substitution frequencies increased from 12 and 17% at the 5 and 3 ends of all sequences to 55 and 62% in the population of sequences with a C to T substitution at the other end. The latter numbers are close to the deamination signals detected in a similar-age cave bear from the same site, suggesting that they might be a good proxy for the deamination frequencies of the fraction of endogenous DNA fragments in the specimen. However, it is also conceivable that decay processes leading to deamination at one end of a molecule may increase the likelihood of deamination at the other end, thereby reducing the power to detect and quantify mixtures of ancient and contaminant DNA based on conditional substitution frequencies. 5
6 To test whether a population of highly deaminated molecules may also be present among contaminant DNA fragments, we analyzed published mtdna sequences from two ancient hominin specimens, SH femur XIII 11 and a Late Pleistocene Neanderthal (Mezmaiskaya 1) 12, that show high levels of modern human contamination. Using diagnostic positions that differentiate the mtdna genomes of 311 present-day humans from the two archaic individuals (132 positions for femur XIII and 41 for Mezmaiskaya 1), we separated contaminant and endogenous sequences based on their sharing of either the human-derived or the ancestral state. To minimize erroneous assignments due to deamination we disregarded sequences that overlap a diagnostic site with their first or last 3 positions. We then determined the frequencies of C to T substitutions in both sets of sequences, both with and without conditioning on the presence of a C to T substitution on the other end (Supplementary Table 2). As expected for mixtures of ancient and contaminant sequences, conditional substitution frequencies increase compared to regular substitution frequencies prior to separation of the sequences. In the subset of contaminant sequences, conditional substitution frequencies increase slightly in femur XIII but remain well below 10%, providing little evidence for the presence of a highly deaminated population of DNA fragments among the contaminant DNA in the two specimens. To test whether deamination occurs independently on both ends of molecules we determined regular and conditional substitution frequencies using larger data sets of nuclear sequences from six archaic individuals 1,2,13 that have been estimated to contain negligible levels of modern human contamination (1% or less). Regular and conditional substitution frequencies are expected to be equal if deamination occurs independently at either end of the molecules. However, conditional substitution frequencies are between 10% lower and 14% higher than regular substitution frequencies, indicating that deamination frequencies at the ends are not entirely independent (Supplementary Table 3). A reduction in conditional substitution frequencies in same samples may be explained by mapping bias, i.e. the difficulty of mapping sequences with 2 terminal C to T substitutions and possibly other differences to the reference genome. Comparisons of regular and conditional substitution frequencies therefore do not allow for accurate estimation of modern human contamination, but they can be used to detect the presence of small amounts of ancient sequences in a background of human contamination, and provide rough estimates of contamination in heavily contaminated samples. When applied to the nuclear DNA sequences from the Sima de los Huesos specimens, we estimate that modern human DNA contamination is present in at least a 2.7-fold excess in all libraries, contributing 63% or more of all sequences (Supplementary Table 3). To minimize the impact of contamination on our analyses, we therefore restricted down-stream analyses to only those fragments showing evidence of deamination. 6
7 Supplementary Table 2: C to T substitution frequencies and conditional substitution frequencies in mtdna sequences from two archaic individuals. Diagnostic sites that differentiate the respective archaic individual from 311 present-day humans were used to identify subsets of sequences that are putatively of endogenous origin or from recent human contamination. The percentage of sequences identified as contamination from diagnostic sites is provided in the second column. Specimen SH femur XIII 714,601 sequences Human contamination based on diagnostic Population of sites [%] 94.9 All sequences 6.2 (9,406/151,703) C to T substitution frequency [%] (observations) Conditional sequences 5 end 3 end 5 end 3 end (10,099/112,213) (1081/2,335) Human contamination 1.6 (532/33,269) 2.3 (539/23,453) 3.8 (5/132) 54.7 (1,082/1,978) 5.2 (5/96) Endogenous 59.8 (1,354/2,264) 54.2 (1,004/1,852) 56.5 (152/269) 67.0 (152/227) Mezmaiskaya 1 (library B9687) 43.3% All sequences 8.1 (512/6,320) 13.7 (723/5,277) 15.5 (28/181) 25.9 (28/108) 28,955 sequences Human contamination 0.9 (5/572) 0.7 (4/550) 0.0 (0/1) 0.0 (0/1) Endogenous 13.5 (93/690) 18.3 (104/568) 15.6 (5/32) 16.7 (5/30) 7
8 Supplementary Table 3: Damage-induced substitution frequencies and conditional substitution frequencies in 6 archaic genomes and the SH libraries sequenced in the present study. Substitution frequencies at the 3 ends of sequences in the Vindija and Mezmaiskaya samples correspond to G to A substitutions, as the respective libraries were generated using a double-stranded method. Rough contamination estimates based on the comparison of the two substitution frequencies are provided in the final column. Specimen C to T (or G to A) substitution frequency [%] Conditional 5 end 3 end 5 end 3 end Conditional / regular substitution frequencies Contamination estimate based on deamination Altai Neanderthal (chromosome 21 only) Denisova manual phalanx (chromosome 21 only) Vindija (all sequences) Vindija (all sequences) Vindija (all sequences) Mezmaiskaya 1 (all sequences) SH Femur XIII (AT2944) SH Incisor (AT-5482) SH Femur frag. (AT-5431) SH Molar (AT-5444) SH Scapula (AT-6672)
9 Supplementary Information 3 Estimating present-day human contamination based on the sharing of alleles that have risen to high frequency in modern humans since the split from the archaic hominins The use of diagnostic sites, i.e. sites that show fixed differences between the hominin group from which the sequence comes and the potential contaminant group, is well established for ancient mtdna analysis. Here, we extend this strategy to the autosomes. We identified 28,552 diagnostic positions where all present-day human genomes sequenced in phase I of the 1000 genomes project differ from the high-coverage genome sequences of the Denisovan individual, the Altai Neanderthal and the chimpanzee, bonobo, gorilla, orangutan and rhesus macaque. The states of the outgroups were retrieved from the Ensembl Compara EPO 6 primate whole genome alignments 14,15. All sites were required to be located within unique regions of the human genome as defined by the map35_50% criteria described in Prüfer et al (Supplementary Section 5b) 2. Sequences from the SH specimens showing the derived, i.e. the human state at such sites are either due to present-day human contamination, variants segregating in the archaic populations that are not present in any of the four archaic chromosomes used in this analysis, or sequencing error. The percentage of sequences sharing the derived state can therefore be used to estimate an upper limit on the present-day human contamination present in the libraries. To test the suitability of this approach we first performed the analysis using low coverage sequence data from four Neanderthal individuals that were previously determined to contain less than 1% human contamination 2,13. To reduce errors due to cytosine-deamination, we masked out thymines in the first three positions and adenines in the last three to account for the occurrence of damage-derived C to T substitutions at the 5 end and G to A substitutions at the 3 end. Based on the percentage of sequences sharing the derived state our contamination estimate for these samples is between 6.3 and 8.6% (Supplementary Table 4). Because this approach measures not only contamination, but also unknown variation in the archaic hominins and sequencing error, these estimates are higher than the actual contamination estimates but allow us to exclude contamination above the 10% level. We thus performed the same analysis for the SH sequences, this time masking out only thymines within the first and last three sequence positions as single-stranded library preparation fully retains the strand information of the sequenced molecules and does not induce artificial G to A substitutions. Estimates of contamination are high in all SH libraries (78% or more), but drop to 20% for incisor AT-5482 and 0% for 9
10 both femur AT-5431 and molar AT-5444 after filtering for putatively deaminated sequences by requiring a C to T substitution at the 5 or 3 terminus. Since the confidence intervals of our estimates are extremely wide due to the limited data, we tried to further increase the power of the analysis by considering sites (totaling 128,004) where the derived allele has risen to 90% or more frequency in modern humans. While providing more power, this analysis may slightly under-estimate contamination as some contaminant sequences may share the archaic state. Overall, contamination estimates are similar to the first estimates but with narrower confidence intervals (Supplementary Table 4). 10
11 Supplementary Table 4: Nuclear contamination estimates based on sites that are derived in all, or more than 90%, of present-day humans and ancestral in two Denisovan and two Neanderthal chromosomes. 95% binomial confidence intervals are provided in brackets. Specimen Femur XIII Library A2021 Incisor AT-5482 Femur fragment AT-5431 Molar AT-5444 Scapula AT-6672 Filtered for sequences with terminal C to T substitution Derived allele frequency in present-day humans #sites #presentday human state Present-day human contamination [%] (95% C.I.) 81.1 ( ) ( ) 83.9 ( ) 20.0 ( ) 68.1 ( ) 0.0 ( ) 76.7 ( ) 0.0 ( ) 92.7 ( ) #sites #present-day human state ,811 1, N/A 1 1 Vindija , Vindija , Vindija , Mezmaiskaya 1-13,068 1, ( ) 6.0 ( ) 6.3 ( ) 8.6 ( ) 36,470 1,931 38,037 1,676 33,972 1,528 59,783 4,100 Present-day human contamination [%] (95% C.I.) 87.5 ( ) 62.5 ( ) 85.7 ( ) 20.8 ( ) 76.1 ( ) 18.2 ( ) 88.0 ( ) 0.0 ( ) 97.6 ( ) ( ) 5.3 ( ) 4.4 ( ) 4.5 ( ) 6.9 ( ) 11
12 References 1. Meyer, M. et al. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, (2012). 2. Prüfer, K. et al. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505, 43-9 (2014). 3. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, (2009) Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, (2010). 5. The Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437, (2005). 6. Briggs, A.W. et al. Patterns of damage in genomic DNA sequences from a Neandertal. Proc Natl Acad Sci U S A 104, (2007). 7. Krause, J. et al. A complete mtdna genome of an early modern human from Kostenki, Russia. Curr Biol 20, (2010). 8. Sawyer, S., Krause, J., Guschanski, K., Savolainen, V. & Paabo, S. Temporal patterns of nucleotide misincorporations and DNA fragmentation in ancient DNA. PLoS One 7, e34131 (2012). 9. Gamba, C. et al. Genome flux and stasis in a five millennium transect of European prehistory. Nat Commun 5, 5257 (2014). 10. Allentoft, M.E. et al. Population genomics of Bronze Age Eurasia. Nature 522, (2015). 11. Meyer, M. et al. A mitochondrial genome sequence of a hominin from Sima de los Huesos. Nature 505, (2014). 12. Gansauge, M.T. & Meyer, M. Selective enrichment of damaged DNA molecules for ancient genome sequencing. Genome Res 24, (2014). 13. Green, R.E. et al. A draft sequence of the Neandertal genome. Science 328, (2010). 14. Paten, B., Herrero, J., Beal, K., Fitzgerald, S. & Birney, E. Enredo and Pecan: genome-wide mammalian consistency-based multiple alignment with paralogs. Genome Res 18, (2008). 15. Paten, B. et al. Genome-wide nucleotide-level mammalian ancestor reconstruction. Genome Res 18, (2008). 12
An early modern human from Romania with a recent Neanderthal ancestor
An early modern human from Romania with a recent Neanderthal ancestor The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation
More informationFossils From Vindija Cave, Croatia (38 44 kya) Admixture between Archaic and Modern Humans
Fossils From Vindija Cave, Croatia (38 44 kya) Admixture between Archaic and Modern Humans Alan R Rogers February 12, 2018 1 / 63 2 / 63 Hominin tooth from Denisova Cave, Altai Mtns, southern Siberia (41
More informationDetecting ancient admixture using DNA sequence data
Detecting ancient admixture using DNA sequence data October 10, 2008 Jeff Wall Institute for Human Genetics UCSF Background Origin of genus Homo 2 2.5 Mya Out of Africa (part I)?? 1.6 1.8 Mya Further spread
More informationInconsistencies in Neanderthal Genomic DNA Sequences
Inconsistencies in Neanderthal Genomic DNA Sequences Jeffrey D. Wall *, Sung K. Kim Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America
More informationLETTER. An early modern human from Romania with a recent Neanderthal ancestor
doi:10.1038/nature14558 An early modern human from Romania with a recent Neanderthal ancestor Qiaomei Fu 1,2,3 *, Mateja ajdinjak 3 *, ana Teodora Moldovan 4, Silviu Constantin 5, Swapan Mallick 2,6,7,
More informationSupplementary information ATLAS
Supplementary information ATLAS Vivian Link, Athanasios Kousathanas, Krishna Veeramah, Christian Sell, Amelie Scheu and Daniel Wegmann Section 1: Complete list of functionalities Sequence data processing
More informationOverview One of the promises of studies of human genetic variation is to learn about human history and also to learn about natural selection.
Technical design document for a SNP array that is optimized for population genetics Yontao Lu, Nick Patterson, Yiping Zhan, Swapan Mallick and David Reich Overview One of the promises of studies of human
More informationA mitochondrial genome sequence of a hominin from Sima de los Huesos
LETTER doi:10.1038/nature12788 A mitochondrial genome sequence of a hominin from Sima de los Huesos Matthias Meyer 1, Qiaomei Fu 1,2, Ayinuer Aximu-Petri 1, Isabelle Glocke 1, Birgit Nickel 1, Juan-Luis
More informationSupporting Information
Supporting Information Eriksson and Manica 10.1073/pnas.1200567109 SI Text Analyses of Candidate Regions for Gene Flow from Neanderthals. The original publication of the draft Neanderthal genome (1) included
More informationFile S1 Technical details of a SNP array optimized for population genetics
File S1 Technical details of a SNP array optimized for population genetics Yontao Lu, Nick Patterson, Yiping Zhan, Swapan Mallick and David Reich Overview One of the promises of studies of human genetic
More informationLearning about human population history from ancient and modern genomes
APPLICATIONS OF NEXT-GENERATION SEQUENCING Learning about human population history from ancient and modern genomes Mark Stoneking* and Johannes Krause Abstract Genome-wide data, both from SNP arrays and
More informationSUPPLEMENTARY INFORMATION
doi:138/nature10532 a Human b Platypus Density 0.0 0.2 0.4 0.6 0.8 Ensembl protein coding Ensembl lincrna New exons (protein coding) Intergenic multi exonic loci Density 0.0 0.1 0.2 0.3 0.4 0.5 0 5 10
More informationQuantifying and reducing spurious alignments for the analysis of ultra-short ancient DNA sequences
de Filippo et al. BMC Biology (2018) 16:121 https://doi.org/10.1186/s12915-018-0581-9 RESEARCH ARTICLE Open Access Quantifying and reducing spurious alignments for the analysis of ultra-short ancient DNA
More informationGenome 373: Mapping Short Sequence Reads II. Doug Fowler
Genome 373: Mapping Short Sequence Reads II Doug Fowler The final Will be in this room on June 6 th at 8:30a Will be focused on the second half of the course, but will include material from the first half
More informationThe Red Queen Model of Recombination Hotspots Evolution in the Light of Archaic and Modern Human Genomes
The Red Queen Model of Recombination Hotspots Evolution in the Light of Archaic and Modern Human Genomes Yann Lesecque 1, Sylvain Glémin 2, Nicolas Lartillot 1, Dominique Mouchiroud 1, Laurent Duret 1
More informationLetter. The genome of the offspring of a Neanderthal mother and a Denisovan father
Letter https://doi.org/10.1038/s41586-018-0455-x The genome of the offspring of a Neanderthal mother and a Denisovan father Viviane Slon 1,7 *, Fabrizio Mafessoni 1,7, Benjamin Vernot 1,7, Cesare de Filippo
More informationSupplementary Figures
Supplementary Figures 1 Supplementary Figure 1. Analyses of present-day population differentiation. (A, B) Enrichment of strongly differentiated genic alleles for all present-day population comparisons
More informationAddressing Challenges of Ancient DNA Sequence Data Obtained with Next Generation Methods
DISSERTATION Addressing Challenges of Ancient DNA Sequence Data Obtained with Next Generation Methods submitted in fulfillment of the requirements for the degree Doctorate of natural science doctor rerum
More informationSupplementary Information for The ratio of human X chromosome to autosome diversity
Supplementary Information for The ratio of human X chromosome to autosome diversity is positively correlated with genetic distance from genes Michael F. Hammer, August E. Woerner, Fernando L. Mendez, Joseph
More informationSupplementary Figures
Supplementary Figures A B Supplementary Figure 1. Examples of discrepancies in predicted and validated breakpoint coordinates. A) Most frequently, predicted breakpoints were shifted relative to those derived
More informationVariant calling workflow for the Oncomine Comprehensive Assay using Ion Reporter Software v4.4
WHITE PAPER Oncomine Comprehensive Assay Variant calling workflow for the Oncomine Comprehensive Assay using Ion Reporter Software v4.4 Contents Scope and purpose of document...2 Content...2 How Torrent
More informationCourse summary. Today. PCR Polymerase chain reaction. Obtaining molecular data. Sequencing. DNA sequencing. Genome Projects.
Goals Organization Labs Project Reading Course summary DNA sequencing. Genome Projects. Today New DNA sequencing technologies. Obtaining molecular data PCR Typically used in empirical molecular evolution
More informationSupplemental Figures Supplemental Figure 1.
Supplemental Material: Annu. Rev. Genom. Hum. Genet. 2017. 18:321-356 https://doi.org/10.1146/annurev-genom-091416-035526 A Robust Framework for Microbial Archaeology Warinner et al. Supplemental Figures
More informationAfter cell death, DNA steadily decays. If microbes do
How reliable are genomes from ancient DNA? Brian Thomas and Jeffrey Tomkins Many reports of ancient DNA (adna) assert recovery from specimens with age assignments that greatly exceed Scripture s age of
More informationSupplement to: The Genomic Sequence of the Chinese Hamster Ovary (CHO)-K1 cell line
Supplement to: The Genomic Sequence of the Chinese Hamster Ovary (CHO)-K1 cell line Table of Contents SUPPLEMENTARY TEXT:... 2 FILTERING OF RAW READS PRIOR TO ASSEMBLY:... 2 COMPARATIVE ANALYSIS... 2 IMMUNOGENIC
More informationSupplementary Material online Population genomics in Bacteria: A case study of Staphylococcus aureus
Supplementary Material online Population genomics in acteria: case study of Staphylococcus aureus Shohei Takuno, Tomoyuki Kado, Ryuichi P. Sugino, Luay Nakhleh & Hideki Innan Contents Estimating recombination
More informationSUPPLEMENTARY INFORMATION
doi:10.1038/nature26136 We reexamined the available whole data from different cave and surface populations (McGaugh et al, unpublished) to investigate whether insra exhibited any indication that it has
More informationSupplementary Figure 1 Schematic view of phasing approach. A sequence-based schematic view of the serial compartmentalization approach.
Supplementary Figure 1 Schematic view of phasing approach. A sequence-based schematic view of the serial compartmentalization approach. First, barcoded primer sequences are attached to the bead surface
More informationChang Xu Mohammad R Nezami Ranjbar Zhong Wu John DiCarlo Yexun Wang
Supplementary Materials for: Detecting very low allele fraction variants using targeted DNA sequencing and a novel molecular barcode-aware variant caller Chang Xu Mohammad R Nezami Ranjbar Zhong Wu John
More informationREPORT. Complex History of Admixture between Modern Humans and Neandertals. Benjamin Vernot 1, * and Joshua M. Akey 1, *
REPORT Complex History of Admixture between Modern Humans and Neandertals Benjamin Vernot 1, * and Joshua M. Akey 1, * Recent analyses have found that a substantial amount of the Neandertal genome persists
More informationC3BI. VARIANTS CALLING November Pierre Lechat Stéphane Descorps-Declère
C3BI VARIANTS CALLING November 2016 Pierre Lechat Stéphane Descorps-Declère General Workflow (GATK) software websites software bwa picard samtools GATK IGV tablet vcftools website http://bio-bwa.sourceforge.net/
More informationESTIMATING GENETIC VARIABILITY WITH RESTRICTION ENDONUCLEASES RICHARD R. HUDSON1
ESTIMATING GENETIC VARIABILITY WITH RESTRICTION ENDONUCLEASES RICHARD R. HUDSON1 Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104 Manuscript received September 8, 1981
More informationCS262 Computation Genomics Winter 2015 Lecture 15 Human Population Genomics (02/24/2015) Scribed by: Junjie (Jason) Zhu Image Source: Lecture Notes
CS262 Computation Genomics Winter 2015 Lecture 15 Human Population Genomics (02/24/2015) Scribed by: Junjie (Jason) Zhu Image Source: Lecture Notes Introduction As the cost of sequencing individuals is
More informationHISTORICAL LINGUISTICS AND MOLECULAR ANTHROPOLOGY
Third Pavia International Summer School for Indo-European Linguistics, 7-12 September 2015 HISTORICAL LINGUISTICS AND MOLECULAR ANTHROPOLOGY Brigitte Pakendorf, Dynamique du Langage, CNRS & Université
More informationGenome-Wide Survey of MicroRNA - Transcription Factor Feed-Forward Regulatory Circuits in Human. Supporting Information
Genome-Wide Survey of MicroRNA - Transcription Factor Feed-Forward Regulatory Circuits in Human Angela Re #, Davide Corá #, Daniela Taverna and Michele Caselle # equal contribution * corresponding author,
More informationNature Biotechnology: doi: /nbt Supplementary Figure 1. Read Complexity
Supplementary Figure 1 Read Complexity A) Density plot showing the percentage of read length masked by the dust program, which identifies low-complexity sequence (simple repeats). Scrappie outputs a significantly
More informationWhole Human Genome Sequencing Report This is a technical summary report for PG DNA
Whole Human Genome Sequencing Report This is a technical summary report for PG0002601-DNA Physician and Patient Information Physician name: Vinodh Naraynan Address: Suite 406 222 West Thomas Road Phoenix
More informationLecture 12. Genomics. Mapping. Definition Species sequencing ESTs. Why? Types of mapping Markers p & Types
Lecture 12 Reading Lecture 12: p. 335-338, 346-353 Lecture 13: p. 358-371 Genomics Definition Species sequencing ESTs Mapping Why? Types of mapping Markers p.335-338 & 346-353 Types 222 omics Interpreting
More informationUnderstanding Accuracy in SMRT Sequencing
Understanding Accuracy in SMRT Sequencing Jonas Korlach, Chief Scientific Officer, Pacific Biosciences Introduction Single Molecule, Real-Time (SMRT ) DNA sequencing achieves highly accurate sequencing
More informationNature Genetics: doi: /ng.3254
Supplementary Figure 1 Comparing the inferred histories of the stairway plot and the PSMC method using simulated samples based on five models. (a) PSMC sim-1 model. (b) PSMC sim-2 model. (c) PSMC sim-3
More informationCourse Information. Introduction to Algorithms in Computational Biology Lecture 1. Relations to Some Other Courses
Course Information Introduction to Algorithms in Computational Biology Lecture 1 Meetings: Lecture, by Dan Geiger: Mondays 16:30 18:30, Taub 4. Tutorial, by Ydo Wexler: Tuesdays 10:30 11:30, Taub 2. Grade:
More informationGenetics 101. Prepared by: James J. Messina, Ph.D., CCMHC, NCC, DCMHS Assistant Professor, Troy University, Tampa Bay Site
Genetics 101 Prepared by: James J. Messina, Ph.D., CCMHC, NCC, DCMHS Assistant Professor, Troy University, Tampa Bay Site Before we get started! Genetics 101 Additional Resources http://www.genetichealth.com/
More information2. True or False? The sequence of nucleotides in the human genome is 90.9% identical from one person to the next.
1. True or False? A typical chromosome can contain several hundred to several thousand genes, arranged in linear order along the DNA molecule present in the chromosome. 2. True or False? The sequence of
More informationMate-pair library data improves genome assembly
De Novo Sequencing on the Ion Torrent PGM APPLICATION NOTE Mate-pair library data improves genome assembly Highly accurate PGM data allows for de Novo Sequencing and Assembly For a draft assembly, generate
More informationPopulation Genetics, Systematics and Conservation of Endangered Species
Population Genetics, Systematics and Conservation of Endangered Species Discuss Population Genetics and Systematics Describe how DNA is used in species management Wild vs. Captive populations Data Generation:
More informationSingle Nucleotide Variant Analysis. H3ABioNet May 14, 2014
Single Nucleotide Variant Analysis H3ABioNet May 14, 2014 Outline What are SNPs and SNVs? How do we identify them? How do we call them? SAMTools GATK VCF File Format Let s call variants! Single Nucleotide
More informationBST227 Introduction to Statistical Genetics. Lecture 8: Variant calling from high-throughput sequencing data
BST227 Introduction to Statistical Genetics Lecture 8: Variant calling from high-throughput sequencing data 1 PC recap typical genome Differs from the reference genome at 4-5 million sites ~85% SNPs ~15%
More informationIntroduction to Algorithms in Computational Biology Lecture 1
Introduction to Algorithms in Computational Biology Lecture 1 Background Readings: The first three chapters (pages 1-31) in Genetics in Medicine, Nussbaum et al., 2001. This class has been edited from
More informationSeparating Population Structure from Recent Evolutionary History
Separating Population Structure from Recent Evolutionary History Problem: Spatial Patterns Inferred Earlier Represent An Equilibrium Between Recurrent Evolutionary Forces Such as Gene Flow and Drift. E.g.,
More informationRev. Cell Biol. Mol. Medicine Paleogenomics 243
Rev. Cell Biol. Mol. Medicine Paleogenomics 243 Paleogenomics Peter D. Heintzman *, André E. R. Soares *, Dan Chang, and Beth Shapiro Department of Ecology and Evolutionary Biology, University of California
More informationSupporting Online Material for
www.sciencemag.org/cgi/content/full/314/5801/960/dc1 Supporting Online Material for The Transcriptome of the Sea Urchin Embryo Manoj P. Samanta, Waraporn Tongprasit, Sorin Istrail, R. Andrew Cameron, Qiang
More informationHuman SNP haplotypes. Statistics 246, Spring 2002 Week 15, Lecture 1
Human SNP haplotypes Statistics 246, Spring 2002 Week 15, Lecture 1 Human single nucleotide polymorphisms The majority of human sequence variation is due to substitutions that have occurred once in the
More informationSUPPLEMENTARY INFORMATION
doi:10.1038/nature12886 Contents SI 1: Sampling, Library Preparation and Sequencing............ 1 SI 2a: Processing and Mapping........................ 6 SI 2b: Altai Neandertal Mitochondrial Genome Sequence.........
More informationProceedings of the World Congress on Genetics Applied to Livestock Production,
Genomics using the Assembly of the Mink Genome B. Guldbrandtsen, Z. Cai, G. Sahana, T.M. Villumsen, T. Asp, B. Thomsen, M.S. Lund Dept. of Molecular Biology and Genetics, Research Center Foulum, Aarhus
More information1 (1) 4 (2) (3) (4) 10
1 (1) 4 (2) 2011 3 11 (3) (4) 10 (5) 24 (6) 2013 4 X-Center X-Event 2013 John Casti 5 2 (1) (2) 25 26 27 3 Legaspi Robert Sebastian Patricia Longstaff Günter Mueller Nicolas Schwind Maxime Clement Nararatwong
More informationDeleterious mutations
Deleterious mutations Mutation is the basic evolutionary factor which generates new versions of sequences. Some versions (e.g. those concerning genes, lets call them here alleles) can be advantageous,
More informationRemoval of deaminated cytosines and detection of in vivo methylation in ancient DNA
Published online 22 December 2009 Nucleic Acids Research, 2010, Vol. 38, No. 6 e87 doi:10.1093/nar/gkp1163 Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA Adrian W.
More informationRemoval of deaminated cytosines and detection of in vivo methylation in ancient DNA
Published online 22 December 2009 Nucleic Acids Research, 2010, Vol. 38, No. 6 e87 doi:10.1093/nar/gkp1163 Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA Adrian W.
More informationGENETICS - CLUTCH CH.15 GENOMES AND GENOMICS.
!! www.clutchprep.com CONCEPT: OVERVIEW OF GENOMICS Genomics is the study of genomes in their entirety Bioinformatics is the analysis of the information content of genomes - Genes, regulatory sequences,
More informationGenomic resources. for non-model systems
Genomic resources for non-model systems 1 Genomic resources Whole genome sequencing reference genome sequence comparisons across species identify signatures of natural selection population-level resequencing
More informationLesson Overview DNA Replication
12.3 THINK ABOUT IT Before a cell divides, its DNA must first be copied. How might the double-helix structure of DNA make that possible? Copying the Code What role does DNA polymerase play in copying DNA?
More informationPetar Pajic 1 *, Yen Lung Lin 1 *, Duo Xu 1, Omer Gokcumen 1 Department of Biological Sciences, University at Buffalo, Buffalo, NY.
The psoriasis associated deletion of late cornified envelope genes LCE3B and LCE3C has been maintained under balancing selection since Human Denisovan divergence Petar Pajic 1 *, Yen Lung Lin 1 *, Duo
More informationSUPPLEMENTARY FIGURES AND TABLES. Exploration of mirna families for hypotheses generation
SUPPLEMENTARY FIGURES AND TABLES Exploration of mirna families for hypotheses generation Timothy K. K. Kamanu, Aleksandar Radovanovic, John A. C. Archer, and Vladimir B. Bajic King Abdullah University
More informationCS 262 Lecture 14 Notes Human Genome Diversity, Coalescence and Haplotypes
CS 262 Lecture 14 Notes Human Genome Diversity, Coalescence and Haplotypes Coalescence Scribe: Alex Wells 2/18/16 Whenever you observe two sequences that are similar, there is actually a single individual
More informationObtaining DNA from degraded samples for NGS sequencing
Obtaining DNA from degraded samples for NGS sequencing A brief overview of Alexander (Sasha) Mikheyev s lecture at USC 03/13/14 Presented by Jacqueline Robinson 04/23/2014 NGS is great, but Standard protocols
More informationChIP-seq and RNA-seq. Farhat Habib
ChIP-seq and RNA-seq Farhat Habib fhabib@iiserpune.ac.in Biological Goals Learn how genomes encode the diverse patterns of gene expression that define each cell type and state. Protein-DNA interactions
More informationSupporting Online Material for
www.sciencemag.org/cgi/content/full/328/5979/710/dc1 Supporting Online Material for A Draft Sequence of the Neandertal Genome Richard E. Green,* Johannes Krause, Adrian W. Briggs, Tomislav Maricic, Udo
More informationSupplementary Figure 1
Nucleotide Content E. coli End. Neb. Tr. End. Neb. Tr. Supplementary Figure 1 Fragmentation Site Profiles CRW1 End. Son. Tr. End. Son. Tr. Human PA1 Position Fragmentation site profiles. Nucleotide content
More informationMore often heard about on television dramas than on the news, DNA is the key to solving crimes the scientific way. Although it has only been
DNA Matching More often heard about on television dramas than on the news, DNA is the key to solving crimes the scientific way. Although it has only been relatively recent (compared the course of forensic
More informationThecompletegenomesequenceofa Neanderthal from the Altai Mountains
ARTICLE doi:10.1038/nature12886 Thecompletegenomesequenceofa Neanderthal from the Altai Mountains Kay Prüfer 1, Fernando Racimo 2, Nick Patterson 3, Flora Jay 2, Sriram Sankararaman 3,4, Susanna Sawyer
More informationBIOINFORMATICS ORIGINAL PAPER
BIOINFORMATICS ORIGINAL PAPER Vol. 27 no. 21 2011, pages 2957 2963 doi:10.1093/bioinformatics/btr507 Genome analysis Advance Access publication September 7, 2011 : fast length adjustment of short reads
More informationTitle: A high-coverage Neandertal genome from Vindija Cave in Croatia
Title: A high-coverage Neandertal genome from Vindija Cave in Croatia Authors: Kay Prüfer 1,*, Cesare de Filippo 1,, Steffi Grote 1,, Fabrizio Mafessoni 1,, Petra Korlević 1, Mateja Hajdinjak 1, Benjamin
More informationAncient DNA from pre-columbian South America
Ancient DNA from pre-columbian South America Guido Marcelo Valverde Garnica Australian Centre for Ancient DNA Department of Genetics and Evolution School of Biological Sciences The University of Adelaide
More informationChapter 17. PCR the polymerase chain reaction and its many uses. Prepared by Woojoo Choi
Chapter 17. PCR the polymerase chain reaction and its many uses Prepared by Woojoo Choi Polymerase chain reaction 1) Polymerase chain reaction (PCR): artificial amplification of a DNA sequence by repeated
More informationToward high-resolution population genomics using archaeological samples
DNA Research, 2016, 23(4), 295 310 doi: 10.1093/dnares/dsw029 Advance Access Publication Date: 19 July 2016 Invited Review Invited Review Toward high-resolution population genomics using archaeological
More informationOutline General NGS background and terms 11/14/2016 CONFLICT OF INTEREST. HLA region targeted enrichment. NGS library preparation methodologies
Eric T. Weimer, PhD, D(ABMLI) Assistant Professor, Pathology & Laboratory Medicine, UNC School of Medicine Director, Molecular Immunology Associate Director, Clinical Flow Cytometry, HLA, and Immunology
More informationSUPPLEMENTARY INFORMATION
Contents De novo assembly... 2 Assembly statistics for all 150 individuals... 2 HHV6b integration... 2 Comparison of assemblers... 4 Variant calling and genotyping... 4 Protein truncating variants (PTV)...
More informationAPPLICATION NOTE
APPLICATION NOTE www.swiftbiosci.com Approaching Single-Cell Sequencing by Understanding NGS Library Complexity and Bias Abstract Demands are growing on genomics to deliver higher quality sequencing data
More informationWhat Are the Chemical Structures and Functions of Nucleic Acids?
THE NUCLEIC ACIDS What Are the Chemical Structures and Functions of Nucleic Acids? Nucleic acids are polymers specialized for the storage, transmission, and use of genetic information. DNA = deoxyribonucleic
More informationThe complete genome sequence of a Neandertal from the Altai Mountains
The complete genome sequence of a Neandertal from the Altai Mountains The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation
More informationParts of a standard FastQC report
FastQC FastQC, written by Simon Andrews of Babraham Bioinformatics, is a very popular tool used to provide an overview of basic quality control metrics for raw next generation sequencing data. There are
More informationWhole genome sequencing in the UK Biobank
Whole genome sequencing in the UK Biobank Part of the UK Government s Industrial Strategy Challenge Fund (ISCF) for the Data to Early Diagnosis and Precision Medicine initiative Aim to produce deep characterisation
More informationGenetic Identification of Ancient Korean Remains
Genetic Identification of Ancient Korean Remains Kyoung-Jin Shin, D.D.S., Ph.D. Department of Forensic Medicine Yonsei University College of Medicine, Seoul, Korea Merits and difficulties of ancient DNA
More informationSingle-cell genome sequencing at ultra-high-throughput with microfluidic droplet barcoding
CORRECTION NOTICE Nat. Biotechnol. doi:10.1038/nbt.3880 Single-cell genome sequencing at ultra-high-throughput with microfluidic droplet barcoding Freeman Lan, Benjamin Demaree, Noorsher Ahmed & Adam R
More information12/8/09 Comp 590/Comp Fall
12/8/09 Comp 590/Comp 790-90 Fall 2009 1 One of the first, and simplest models of population genealogies was introduced by Wright (1931) and Fisher (1930). Model emphasizes transmission of genes from one
More informationIncorporating Molecular ID Technology. Accel-NGS 2S MID Indexing Kits
Incorporating Molecular ID Technology Accel-NGS 2S MID Indexing Kits Molecular Identifiers (MIDs) MIDs are indices used to label unique library molecules MIDs can assess duplicate molecules in sequencing
More informationANTH 6491: Anthropological Genetics Wednesdays pm, Purple Seminar Room, 6 th floor Science & Engineering Hall Fall 2015
ANTH 6491: Anthropological Genetics Wednesdays 6.10-8 pm, Purple Seminar Room, 6 th floor Science & Engineering Hall Fall 2015 Brief Summary: A detailed examination of molecular approaches to understanding
More informationComparison and Evaluation of Cotton SNPs Developed by Transcriptome, Genome Reduction on Restriction Site Conservation and RAD-based Sequencing
Comparison and Evaluation of Cotton SNPs Developed by Transcriptome, Genome Reduction on Restriction Site Conservation and RAD-based Sequencing Hamid Ashrafi Amanda M. Hulse, Kevin Hoegenauer, Fei Wang,
More informationSystematic evaluation of spliced alignment programs for RNA- seq data
Systematic evaluation of spliced alignment programs for RNA- seq data Pär G. Engström, Tamara Steijger, Botond Sipos, Gregory R. Grant, André Kahles, RGASP Consortium, Gunnar Rätsch, Nick Goldman, Tim
More informationQuestion 2: There are 5 retroelements (2 LINEs and 3 LTRs), 6 unclassified elements (XDMR and XDMR_DM), and 7 satellite sequences.
Bio4342 Exercise 1 Answers: Detecting and Interpreting Genetic Homology (Answers prepared by Wilson Leung) Question 1: Low complexity DNA can be described as sequences that consist primarily of one or
More informationDNA METHYLATION RESEARCH TOOLS
SeqCap Epi Enrichment System Revolutionize your epigenomic research DNA METHYLATION RESEARCH TOOLS Methylated DNA The SeqCap Epi System is a set of target enrichment tools for DNA methylation assessment
More informationUHT Sequencing Course Large-scale genotyping. Christian Iseli January 2009
UHT Sequencing Course Large-scale genotyping Christian Iseli January 2009 Overview Introduction Examples Base calling method and parameters Reads filtering Reads classification Detailed alignment Alignments
More informationFrom Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow
From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow Technical Overview Import VCF Introduction Next-generation sequencing (NGS) studies have created unanticipated challenges with
More informationSUPPLEMENTARY INFORMATION
doi:10.1038/nature10163 Supplementary Table 1 Efficiency of vector construction. Process wells recovered efficiency (%) Recombineering* 480 461 96 Intermediate plasmids 461 381 83 Recombineering efficiency
More informationIntroduction to Short Read Alignment. UCD Genome Center Bioinformatics Core Tuesday 14 June 2016
Introduction to Short Read Alignment UCD Genome Center Bioinformatics Core Tuesday 14 June 2016 From reads to molecules Why align? Individual A Individual B ATGATAGCATCGTCGGGTGTCTGCTCAATAATAGTGCCGTATCATGCTGGTGTTATAATCGCCGCATGACATGATCAATGG
More informationHuman Genetic Variation. Ricardo Lebrón Dpto. Genética UGR
Human Genetic Variation Ricardo Lebrón rlebron@ugr.es Dpto. Genética UGR What is Genetic Variation? Origins of Genetic Variation Genetic Variation is the difference in DNA sequences between individuals.
More informationSupplementary Methods 2. Supplementary Table 1: Bottleneck modeling estimates 5
Supplementary Information Accelerated genetic drift on chromosome X during the human dispersal out of Africa Keinan A, Mullikin JC, Patterson N, and Reich D Supplementary Methods 2 Supplementary Table
More informationSUPPLEMENTARY INFORMATION
SUPPLEMENTARY INFORMATION doi:10.1038/nature24473 1. Supplementary Information Computational identification of neoantigens Neoantigens from the three datasets were inferred using a consistent pipeline
More informationUnit 2: Biological basis of life, heredity, and genetics
Unit 2: Biological basis of life, heredity, and genetics 1 Issues with Darwin's Evolutionary Theory??? 2 Cells - General Composition Organelles - substructures in the cell which do different things involved
More informationENGR 213 Bioengineering Fundamentals April 25, A very coarse introduction to bioinformatics
A very coarse introduction to bioinformatics In this exercise, you will get a quick primer on how DNA is used to manufacture proteins. You will learn a little bit about how the building blocks of these
More information