Why can GBS be complicated? Tools for filtering, error correction and imputation.
|
|
- Candice Emily Harrington
- 6 years ago
- Views:
Transcription
1 Why can GBS be complicated? Tools for filtering, error correction and imputation. Edward Buckler USDA-ARS Cornell University
2 Many Organisms Are Diverse Humans are at the lower end of diversity, which results in most genomic and bioinformatics being optimized to this situation What is diverse: Grapes, Flies, Arabidopsis, and most dominant species on the planet
3 Maize has more molecular diversity than humans and apes combined 1.34% 0.09% 1.42% Silent Diversity (Zhao PNAS 2000; Tenallion et al, PNAS 2001)
4 Maize genetic variation has been evolving for 5 million years Warm Pliocene 5mya 4mya Modern Variation Begins Evolving Sister Genus Diverges Divergence from Chimps Ardipithecus 3mya Australopithecus Cold Pleistocene 2mya 1mya Zea species begin diverging Maize domesticated Homo erectus Modern Variation Begins Modern Humans
5 Features of TASSEL GBS Bioinformatic Pipeline Designed for large numbers of samples Modest computing requirements Species or Genus level SNP calling & filtering Filtering to deal with extensive paralogy Glaubitz et al 2014 PLOSone. Available in TASSEL at maizegenetics.net
6 Reference based GBS bioinformatics pipeline Discovery Production FASTQ FASTQ Tags by Taxa Tag Counts Tags on Physical Map Tags on Physical Map SNP Caller Genotypes Filtered Genotypes Genotypes
7 What are our expectations with GBS?
8 High Diversity Ensures High Return on Sequencing Proportion of informative markers Highly repetitive 15% not easily informative Half the genome is not shared between two maize line Potentially all of these are informative with a large enough database Low copy shared proportion (1% diversity) Bi-parental information = (1-0.01)^64bp = 48% informative Association information = (1-0.05)^64bp= 97% informative
9 Expectation of marker distribution Biallelic, 17% Presense /Absense, 50% Too Repetitiv e, 15% Presense /Absense, 50% Multialleli c, 34% Nonpolymor phic; 18% Too Repetitiv e, 15% Biparental population Nonpolymorp hic; 1% Across the species
10 Sequencing Error
11 Illumina Basic Error Rate is ~1% Error rates are associated with distance from start of sequence Bad GBS puts these all at the same position Good Reverse reads can correct Good Error are consistent and modelable
12 Reads with errors Perfect sequences: =52.5% of the 64bp sequences are perfect 47.5 are NOT perfect The errors are autocorrelated so the proportion of perfect sequence is a little higher, and those with 2 or more is also higher.
13 Do we see these errors? Assume 10,000 lines genotyped at 0.5X coverage Base Type Read # (no SNP) Read # (w/ SNP) A Major C Minor (50 real) G Error T Error 17 17
14 Do Errors Matter? Yes Imputation, Haplotype reconstruction Maybe GWAS for low frequency SNPs No GS, genetic distance, mapping on biparental populations
15 Expectations of Real SNPs Vast majority are biallelic Homozygosity is predicted by inbreeding coefficient Allele frequency is constrained in structured populations In linkage disequilibrium with neighboring SNPs
16 Studying Errors in Biparental populations Limited range of alleles, expected allele frequencies, high LD
17 Maize RIL population expectations Allele frequency 0% or 50% Nearby sites should be in very high LD (r 2 >50%) Most sites can be tested if multiple populations are available
18 Bi-parental populations allow identification of error, and non-mendelian segregation Non-segregating Error Segregating
19 Bi-parental populations allow identification of error, and non-mendelian segregation Error
20 Median error rate is 0, but there is a long tail of some high error sites Median
21 GBS error rates vs. Maize 50K SNP Chip 7,254 SNPs in common 279 maize inbreds in common ( Maize282 panel) Comparison to 50K SNPs Filtered GBS genotypes Mean Error Rate (per SNP) Median Error Rate (per SNP) All genotypic comparisons: 1.18% 0.93% Homozygotes only: 0.58% 0.42% Internal GBS recall error rate with 1X coverage is ~0.2%, half of 50K chip errors are paralogy issues between the systems
22 MergeDuplicateSNPsPlugin When restriction sites are less than 128bp apart, we may read SNP from both directions (strands) ~13% of all sites Fusing increases coverage Fixes errors -mismat = set maximum mismatch rate -callhets = mismatch set to hets or not
23 Use the biology of your system to filter SNP Hardy-Weinberg Disequilibrium For inbreeding crops use the expected inbreeding coefficient to filter SNPs How to deal with the low coverage Although many individuals only have 1X coverage (27% of the samples will have 2X or more). Use these individuals to evaluate the quality of segregation.
24 Product of Filtering After filters, in maize we find error rate AA<>aa = < AA<>Aa = 0.8 at low coverage SNPs in wrong location <~1%. Lower in other species.
25 Only a limited proportion of the best alignments are shared between aligners Bowtie2 31.2% 17.4% 51.5% BWA BWA 9.5% 12.8% Blast 12.4% 1.4% 1.2% 0.8% 10.9% 2.5% 0.6% 0.8% Bowtie2 16.9% 13.2% 45.4% 4.7% 3.0% 14.4% Bowtie2 BWA 1.3% 32.9% 1.6% 3.3% 3.9% 6.5% BWA-MEM Which alignment is real BWA-MEM
26 Genetic Mapping of All Unique Tags Test for association between between taxa with tags and an anchor map Apply in both an association and linkage context Fei Lu Trillions of tests, so computational speed is key Fei Lu in review
27 What do YOU want to filter on? MAF Mapping accuracy Support from Paired-End Support from multiple aligners Heterozygosity Coverage Agreement with WGS Allele frequency within certain populations LD with other SNPs Stats within a particular population
28 Finding the Good SNPs Discovery TOPM SNP & Features Train Filter TOPM Production TOPM
29 Unstable Genomes
30 Using the Presence/Absence Variants In species like maize, this is the majority of the data Less subject to sequencing error Need imputation methods to differentiate between missing from sampling and biologically missing
31 Only 50% of the maize genome is shared between two varieties Plant 1 Person 1 50% 99% Plant 2 Plant 3 Person 2 Person 3 Maize Humans Fu & Dooner 2002, Morgante et al. 2005, Brunner et al 2005 Numerous PAVs and CNVs - Springer, Lai, Schnable in 2010
32 Developed by Fei Lu Identify structural variations GWAS and Joint linkage mapping GWAS Tags Joint Linkage If Map Y If Align N PAVs Insertion Read depth (B73 vs Non-B73) PAV (Deletion) B73 CNV (Duplication) Non B73 Chromosome Bin 1 Bin 2 Bin 3
33 Model training for mapping accuracy Dependent variable, distance (abs(genetic position physical position)) Attributes: P-value, likelihood ratio, tag count, etc. Machine learning models: decision tree, SVM, M5Rules, etc. 26M tags GWAS 8.6 M 6.4 M G GJ Y Y 4.5 M mapped tags Joint linkage 0.5 M J Y Model training and prediction using M5Rules
34 High-resolution mapped tags 95% 50% 4.5 million mapped tags in all maize lines 100 bp 10 Kb 1 Mb 101 Kb Gb 10 Gb Median resolution Framework of maize pan genome
35 Paired-End Sequencing Most of the tags are less than 500bp in size. Full contigs can be developed by sequencing them on a MiSeq with paired-end 250bp reads Strategies physically bridge from the mapped end, genetically anchor one end.
36 Phylogenetic (Cross Species) Analysis When divergence <10% between taxa, GBS makes sense as 40% of the cut site pairs are preserved. If focused a genic regions, can go higher. Currently pipeline tools not great for very high levels of divergence. Standard nextgen or Paired-End approaches might be a good strategy.
37 Future Clean output and input into machine learning algorithms Developing a community to explore and develop filters
38 Become a power user: Play with the code to add your features Documentation at maizegenetics.net Source code at SourceForge Guide to programming TASSEL: Google Group ssel
Why can GBS be complicated? Tools for filtering & error correction. Edward Buckler USDA-ARS Cornell University
Why can GBS be complicated? Tools for filtering & error correction Edward Buckler USDA-ARS Cornell University http://www.maizegenetics.net Maize has more molecular diversity than humans and apes combined
More informationApplying Genotyping by Sequencing (GBS) to Corn Genetics and Breeding. Peter Bradbury USDA/Cornell University
Applying Genotyping by Sequencing (GBS) to Corn Genetics and Breeding Peter Bradbury USDA/Cornell University Genotyping by sequencing (GBS) makes use of high through-put, short-read sequencing to provide
More informationGBS Usage Cases: Non-model Organisms. Katie E. Hyma, PhD Bioinformatics Core Institute for Genomic Diversity Cornell University
GBS Usage Cases: Non-model Organisms Katie E. Hyma, PhD Bioinformatics Core Institute for Genomic Diversity Cornell University Q: How many SNPs will I get? A: 42. What question do you really want to ask?
More informationGBS Usage Cases: Examples from Maize
GBS Usage Cases: Examples from Maize Jeff Glaubitz (jcg233@cornell.edu) Senior Research Associate, Buckler Lab, Cornell University Panzea Project Manager Cornell IGD/CBSU Workshop June 17 18, 2013 Some
More informationSupplementary Figure 1 Genotyping by Sequencing (GBS) pipeline used in this study to genotype maize inbred lines. The 14,129 maize inbred lines were
Supplementary Figure 1 Genotyping by Sequencing (GBS) pipeline used in this study to genotype maize inbred lines. The 14,129 maize inbred lines were processed following GBS experimental design 1 and bioinformatics
More informationUsage Cases of GBS. Jeff Glaubitz Senior Research Associate, Buckler Lab, Cornell University Panzea Project Manager
Usage Cases of GBS Jeff Glaubitz (jcg233@cornell.edu) Senior Research Associate, Buckler Lab, Cornell University Panzea Project Manager Cornell CBSU Workshop Sept 15 16, 2011 Some potential applications
More informationCrop design with genomics and natural diversity. Edward Buckler USDA-ARS Cornell University
Crop design with genomics and natural diversity Edward Buckler USDA-ARS Cornell University http://www.maizegenetics.net 4 years Goal: Create the global model to decrease cycle time Make Crosses Make Crosses
More informationSNP calling and VCF format
SNP calling and VCF format Laurent Falquet, Oct 12 SNP? What is this? A type of genetic variation, among others: Family of Single Nucleotide Aberrations Single Nucleotide Polymorphisms (SNPs) Single Nucleotide
More informationWhole Genome Sequencing. Biostatistics 666
Whole Genome Sequencing Biostatistics 666 Genomewide Association Studies Survey 500,000 SNPs in a large sample An effective way to skim the genome and find common variants associated with a trait of interest
More informationSUPPLEMENTARY INFORMATION
Contents De novo assembly... 2 Assembly statistics for all 150 individuals... 2 HHV6b integration... 2 Comparison of assemblers... 4 Variant calling and genotyping... 4 Protein truncating variants (PTV)...
More informationSNP calling and Genome Wide Association Study (GWAS) Trushar Shah
SNP calling and Genome Wide Association Study (GWAS) Trushar Shah Types of Genetic Variation Single Nucleotide Aberrations Single Nucleotide Polymorphisms (SNPs) Single Nucleotide Variations (SNVs) Short
More informationThe Diploid Genome Sequence of an Individual Human
The Diploid Genome Sequence of an Individual Human Maido Remm Journal Club 12.02.2008 Outline Background (history, assembling strategies) Who was sequenced in previous projects Genome variations in J.
More informationMapping and Mapping Populations
Mapping and Mapping Populations Types of mapping populations F 2 o Two F 1 individuals are intermated Backcross o Cross of a recurrent parent to a F 1 Recombinant Inbred Lines (RILs; F 2 -derived lines)
More informationC3BI. VARIANTS CALLING November Pierre Lechat Stéphane Descorps-Declère
C3BI VARIANTS CALLING November 2016 Pierre Lechat Stéphane Descorps-Declère General Workflow (GATK) software websites software bwa picard samtools GATK IGV tablet vcftools website http://bio-bwa.sourceforge.net/
More informationGenome-Wide Association Studies (GWAS): Computational Them
Genome-Wide Association Studies (GWAS): Computational Themes and Caveats October 14, 2014 Many issues in Genomewide Association Studies We show that even for the simplest analysis, there is little consensus
More informationCS 262 Lecture 14 Notes Human Genome Diversity, Coalescence and Haplotypes
CS 262 Lecture 14 Notes Human Genome Diversity, Coalescence and Haplotypes Coalescence Scribe: Alex Wells 2/18/16 Whenever you observe two sequences that are similar, there is actually a single individual
More informationH3A - Genome-Wide Association testing SOP
H3A - Genome-Wide Association testing SOP Introduction File format Strand errors Sample quality control Marker quality control Batch effects Population stratification Association testing Replication Meta
More informationGenomic resources. for non-model systems
Genomic resources for non-model systems 1 Genomic resources Whole genome sequencing reference genome sequence comparisons across species identify signatures of natural selection population-level resequencing
More informationS G. Design and Analysis of Genetic Association Studies. ection. tatistical. enetics
S G ection ON tatistical enetics Design and Analysis of Genetic Association Studies Hemant K Tiwari, Ph.D. Professor & Head Section on Statistical Genetics Department of Biostatistics School of Public
More informationConstruction of the third-generation Zea mays haplotype map
GigaScience, 7, 2018, 1 12 doi: 10.1093/gigascience/gix134 Advance Access Publication Date: 30 December 2018 Research RESEARCH Construction of the third-generation Zea mays haplotype map Robert Bukowski
More informationTHE HEALTH AND RETIREMENT STUDY: GENETIC DATA UPDATE
: GENETIC DATA UPDATE April 30, 2014 Biomarker Network Meeting PAA Jessica Faul, Ph.D., M.P.H. Health and Retirement Study Survey Research Center Institute for Social Research University of Michigan HRS
More informationLecture 23: Causes and Consequences of Linkage Disequilibrium. November 16, 2012
Lecture 23: Causes and Consequences of Linkage Disequilibrium November 16, 2012 Last Time Signatures of selection based on synonymous and nonsynonymous substitutions Multiple loci and independent segregation
More informationGenome Assembly Using de Bruijn Graphs. Biostatistics 666
Genome Assembly Using de Bruijn Graphs Biostatistics 666 Previously: Reference Based Analyses Individual short reads are aligned to reference Genotypes generated by examining reads overlapping each position
More informationDNBseq TM SERVICE OVERVIEW Plant and Animal Whole Genome Re-Sequencing
TM SERVICE OVERVIEW Plant and Animal Whole Genome Re-Sequencing Plant and animal whole genome re-sequencing (WGRS) involves sequencing the entire genome of a plant or animal and comparing the sequence
More informationSNP calling. Jose Blanca COMAV institute bioinf.comav.upv.es
SNP calling Jose Blanca COMAV institute bioinf.comav.upv.es SNP calling Genotype matrix Genotype matrix: Samples x SNPs SNPs and errors A change in a read may due to: Sample contamination Cloning or PCR
More informationGenome-wide association studies (GWAS) Part 1
Genome-wide association studies (GWAS) Part 1 Matti Pirinen FIMM, University of Helsinki 03.12.2013, Kumpula Campus FIMM - Institiute for Molecular Medicine Finland www.fimm.fi Published Genome-Wide Associations
More informationVariation detection based on second generation sequencing data. Xin LIU Department of Science and Technology, BGI
Variation detection based on second generation sequencing data Xin LIU Department of Science and Technology, BGI liuxin@genomics.org.cn 2013.11.21 Outline Summary of sequencing techniques Data quality
More informationProstate Cancer Genetics: Today and tomorrow
Prostate Cancer Genetics: Today and tomorrow Henrik Grönberg Professor Cancer Epidemiology, Deputy Chair Department of Medical Epidemiology and Biostatistics ( MEB) Karolinska Institutet, Stockholm IMPACT-Atanta
More informationAssemblytics: a web analytics tool for the detection of assembly-based variants Maria Nattestad and Michael C. Schatz
Assemblytics: a web analytics tool for the detection of assembly-based variants Maria Nattestad and Michael C. Schatz Table of Contents Supplementary Note 1: Unique Anchor Filtering Supplementary Figure
More informationNext Generation Genetics: Using deep sequencing to connect phenotype to genotype
Next Generation Genetics: Using deep sequencing to connect phenotype to genotype http://1001genomes.org Korbinian Schneeberger Connecting Genotype and Phenotype Genotyping SNPs small Resequencing SVs*
More informationGENETICS - CLUTCH CH.15 GENOMES AND GENOMICS.
!! www.clutchprep.com CONCEPT: OVERVIEW OF GENOMICS Genomics is the study of genomes in their entirety Bioinformatics is the analysis of the information content of genomes - Genes, regulatory sequences,
More informationOral Cleft Targeted Sequencing Project
Oral Cleft Targeted Sequencing Project Oral Cleft Group January, 2013 Contents I Quality Control 3 1 Summary of Multi-Family vcf File, Jan. 11, 2013 3 2 Analysis Group Quality Control (Proposed Protocol)
More informationImproving genome assemblies and capturing genome variation data for applied crop improvement.
Improving genome assemblies and capturing genome variation data for applied crop improvement. David Edwards University of Western Australia, Australia Dave.Edwards@uwa.edu.au 1 Outline Sequencing wheat
More informationBulked Segregant Analysis For Fine Mapping Of Genes. Cheng Zou, Qi Sun Bioinformatics Facility Cornell University
Bulked Segregant Analysis For Fine Mapping Of enes heng Zou, Qi Sun Bioinformatics Facility ornell University Outline What is BSA? Keys for a successful BSA study Pipeline of BSA extended reading ompare
More informationIntroduc)on to GBS. Hueber Yann, Alexis Dereeper, Gau)er Sarah, François Sabot, Vincent Ranwez, Jean- François Dufayard 02/11/2015
Introduc)on to GBS Hueber Yann, Alexis Dereeper, Gau)er Sarah, François Sabot, Vincent Ranwez, Jean- François Dufayard 02/11/2015 Index Defini)on Methodologies The RADseq (single, paired- end) example
More informationComputational Workflows for Genome-Wide Association Study: I
Computational Workflows for Genome-Wide Association Study: I Department of Computer Science Brown University, Providence sorin@cs.brown.edu October 16, 2014 Outline 1 Outline 2 3 Monogenic Mendelian Diseases
More informationGenomic Technologies. Michael Schatz. Feb 1, 2018 Lecture 2: Applied Comparative Genomics
Genomic Technologies Michael Schatz Feb 1, 2018 Lecture 2: Applied Comparative Genomics Welcome! The primary goal of the course is for students to be grounded in theory and leave the course empowered to
More informationAnalysis of genome-wide genotype data
Analysis of genome-wide genotype data Acknowledgement: Several slides based on a lecture course given by Jonathan Marchini & Chris Spencer, Cape Town 2007 Introduction & definitions - Allele: A version
More informationIdentifying the functional bases of trait variation in Brassica napus using Associative Transcriptomics
Brassica genome structure and evolution Genome framework for association genetics Establishing marker-trait associations 31 st March 2014 GENOME RELATIONSHIPS BETWEEN SPECIES U s TRIANGLE 31 st March 2014
More informationCS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016
CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016 Topics Genetic variation Population structure Linkage disequilibrium Natural disease variants Genome Wide Association Studies Gene
More informationDomestic animal genomes meet animal breeding translational biology in the ag-biotech sector. Jerry Taylor University of Missouri-Columbia
Domestic animal genomes meet animal breeding translational biology in the ag-biotech sector Jerry Taylor University of Missouri-Columbia Worldwide distribution and use of cattle A brief history of cattle
More informationIllumina s GWAS Roadmap: next-generation genotyping studies in the post-1kgp era
Illumina s GWAS Roadmap: next-generation genotyping studies in the post-1kgp era Anthony Green Sr. Genotyping Sales Specialist North America 2010 Illumina, Inc. All rights reserved. Illumina, illuminadx,
More informationHuman Genetic Variation. Ricardo Lebrón Dpto. Genética UGR
Human Genetic Variation Ricardo Lebrón rlebron@ugr.es Dpto. Genética UGR What is Genetic Variation? Origins of Genetic Variation Genetic Variation is the difference in DNA sequences between individuals.
More informationTranscriptome Assembly, Functional Annotation (and a few other related thoughts)
Transcriptome Assembly, Functional Annotation (and a few other related thoughts) Monica Britton, Ph.D. Sr. Bioinformatics Analyst June 23, 2017 Differential Gene Expression Generalized Workflow File Types
More informationHuman Genetics and Gene Mapping of Complex Traits
Human Genetics and Gene Mapping of Complex Traits Advanced Genetics, Spring 2015 Human Genetics Series Thursday 4/02/15 Nancy L. Saccone, nlims@genetics.wustl.edu ancestral chromosome present day chromosomes:
More informationUnderstanding genetic association studies. Peter Kamerman
Understanding genetic association studies Peter Kamerman Outline CONCEPTS UNDERLYING GENETIC ASSOCIATION STUDIES Genetic concepts: - Underlying principals - Genetic variants - Linkage disequilibrium -
More informationTargeted Sequencing Reveals Large-Scale Sequence Polymorphism in Maize Candidate Genes for Biomass Production and Composition
RESEARCH ARTICLE Targeted Sequencing Reveals Large-Scale Sequence Polymorphism in Maize Candidate Genes for Biomass Production and Composition Moses M. Muraya 1,2, Thomas Schmutzer 1 *, Chris Ulpinnis
More informationEstimating the rates and modes of creation of new genetic variation in plants using NGS technologies
Estimating the rates and modes of creation of new genetic variation in plants using NGS technologies 14/06/2016 Supervisor: Prof. Michele Morgante Co-supervisor: Fabio Marroni PhD Student: Ettore Zapparoli
More informationGenotypic data densification for genomic selection in the black poplar. Thesis director : Léopoldo SANCHEZ, Supervisor : Véronique JORGE
Genotypic data densification for genomic selection in the black poplar Thesis director : Léopoldo SANCHEZ, Supervisor : Véronique JORGE Three main questions Thesis project focus on best way to implement
More informationPersonal Genomics Platform White Paper Last Updated November 15, Executive Summary
Executive Summary Helix is a personal genomics platform company with a simple but powerful mission: to empower every person to improve their life through DNA. Our platform includes saliva sample collection,
More informationComparison and Evaluation of Cotton SNPs Developed by Transcriptome, Genome Reduction on Restriction Site Conservation and RAD-based Sequencing
Comparison and Evaluation of Cotton SNPs Developed by Transcriptome, Genome Reduction on Restriction Site Conservation and RAD-based Sequencing Hamid Ashrafi Amanda M. Hulse, Kevin Hoegenauer, Fei Wang,
More informationHarnessing the power of RADseq for ecological and evolutionary genomics
STUDY DESIGNS Harnessing the power of RADseq for ecological and evolutionary genomics Kimberly R. Andrews 1, Jeffrey M. Good 2, Michael R. Miller 3, Gordon Luikart 4 and Paul A. Hohenlohe 5 Abstract High-throughput
More informationThe Basics of Understanding Whole Genome Next Generation Sequence Data
The Basics of Understanding Whole Genome Next Generation Sequence Data Heather Carleton-Romer, MPH, Ph.D. ASM-CDC Infectious Disease and Public Health Microbiology Postdoctoral Fellow PulseNet USA Next
More informationPOLYMORPHISM AND VARIANT ANALYSIS. Matt Hudson Crop Sciences NCSA HPCBio IGB University of Illinois
POLYMORPHISM AND VARIANT ANALYSIS Matt Hudson Crop Sciences NCSA HPCBio IGB University of Illinois Outline How do we predict molecular or genetic functions using variants?! Predicting when a coding SNP
More informationDE NOVO WHOLE GENOME ASSEMBLY AND SEQUENCING OF THE SUPERB FAIRYWREN. (Malurus cyaneus) JOSHUA PEÑALBA LEO JOSEPH CRAIG MORITZ ANDREW COCKBURN
DE NOVO WHOLE GENOME ASSEMBLY AND SEQUENCING OF THE SUPERB FAIRYWREN (Malurus cyaneus) JOSHUA PEÑALBA LEO JOSEPH CRAIG MORITZ ANDREW COCKBURN ... 2014 2015 2016 2017 ... 2014 2015 2016 2017 Synthetic
More informationRoadmap: genotyping studies in the post-1kgp era. Alex Helm Product Manager Genotyping Applications
Illumina s GWAS Roadmap: next-generation genotyping studies in the post-1kgp era Alex Helm Product Manager Genotyping Applications 2010 Illumina, Inc. All rights reserved. Illumina, illuminadx, Solexa,
More informationUHT Sequencing Course Large-scale genotyping. Christian Iseli January 2009
UHT Sequencing Course Large-scale genotyping Christian Iseli January 2009 Overview Introduction Examples Base calling method and parameters Reads filtering Reads classification Detailed alignment Alignments
More informationLecture 10 : Whole genome sequencing and analysis. Introduction to Computational Biology Teresa Przytycka, PhD
Lecture 10 : Whole genome sequencing and analysis Introduction to Computational Biology Teresa Przytycka, PhD Sequencing DNA Goal obtain the string of bases that make a given DNA strand. Problem Typically
More informationTranscriptomics analysis with RNA seq: an overview Frederik Coppens
Transcriptomics analysis with RNA seq: an overview Frederik Coppens Platforms Applications Analysis Quantification RNA content Platforms Platforms Short (few hundred bases) Long reads (multiple kilobases)
More informationStructure, Measurement & Analysis of Genetic Variation
Structure, Measurement & Analysis of Genetic Variation Sven Cichon, PhD Professor of Medical Genetics, Director, Division of Medcial Genetics, University of Basel Institute of Neuroscience and Medicine
More informationVariant Callers. J Fass 24 August 2017
Variant Callers J Fass 24 August 2017 Variant Types Caller Consistency Pabinger (2014) Briefings Bioinformatics 15:256 Freebayes Bayesian haplotype caller that can call SNPs, short CNVs / duplications,
More informationSUPPLEMENTARY INFORMATION
doi:10.1038/nature26136 We reexamined the available whole data from different cave and surface populations (McGaugh et al, unpublished) to investigate whether insra exhibited any indication that it has
More informationlatestdevelopments relevant for the Ag sector André Eggen Agriculture Segment Manager, Europe
Overviewof Illumina s latestdevelopments relevant for the Ag sector André Eggen Agriculture Segment Manager, Europe Seminar der Studienrichtung Tierwissenschaften, TÜM, July 1, 2009 Overviewof Illumina
More informationAdd 2016 GBS Poster As Slide One
Add 2016 GBS Poster As Slide One GBS Adapters and Enzymes Barcode Adapter P1 Sticky Ends Common Adapter P2 Illumina Sequencing Primer 2 Barcode (4 8 bp) Restriction Enzymes Illumina Sequencing Primer 1
More informationDe novo whole genome assembly
De novo whole genome assembly Qi Sun Bioinformatics Facility Cornell University Sequencing platforms Short reads: o Illumina (150 bp, up to 300 bp) Long reads (>10kb): o PacBio SMRT; o Oxford Nanopore
More informationAccelerate High Throughput Analysis for Genome Sequencing with GPU
Accelerate High Throughput Analysis for Genome Sequencing with GPU ATIP - A*CRC Workshop on Accelerator Technologies in High Performance Computing May 7-10, 2012 Singapore BingQiang WANG, Head of Scalable
More informationAxiom mydesign Custom Array design guide for human genotyping applications
TECHNICAL NOTE Axiom mydesign Custom Genotyping Arrays Axiom mydesign Custom Array design guide for human genotyping applications Overview In the past, custom genotyping arrays were expensive, required
More informationApplication of Genotyping-By-Sequencing and Genome-Wide Association Analysis in Tetraploid Potato
Application of Genotyping-By-Sequencing and Genome-Wide Association Analysis in Tetraploid Potato Sanjeev K Sharma Cell and Molecular Sciences The 3 rd Plant Genomics Congress, London 12 th May 2015 Potato
More informationNext-Generation Sequencing. Technologies
Next-Generation Next-Generation Sequencing Technologies Sequencing Technologies Nicholas E. Navin, Ph.D. MD Anderson Cancer Center Dept. Genetics Dept. Bioinformatics Introduction to Bioinformatics GS011062
More informationGenome 373: Mapping Short Sequence Reads II. Doug Fowler
Genome 373: Mapping Short Sequence Reads II Doug Fowler The final Will be in this room on June 6 th at 8:30a Will be focused on the second half of the course, but will include material from the first half
More informationMaize Inbreds Exhibit High Levels of Copy Number Variation (CNV) and Presence/Absence Variation (PAV) in Genome Content
Maize Inbreds Exhibit High Levels of Copy Number Variation (CNV) and Presence/Absence Variation (PAV) in Genome Content Nathan M. Springer 1., Kai Ying 2,3., Yan Fu 4,5, Tieming Ji 6, Cheng-Ting Yeh 4,7,
More informationProgress in genomics applications in investigating abiotic stresses influencing perennial forage and biomass grasses
Progress in genomics applications in investigating abiotic stresses influencing perennial forage and biomass grasses Dr. Susanne Barth Teagasc Crops Research Centre Oak Park Carlow Irland Global challenges
More informationHuman SNP haplotypes. Statistics 246, Spring 2002 Week 15, Lecture 1
Human SNP haplotypes Statistics 246, Spring 2002 Week 15, Lecture 1 Human single nucleotide polymorphisms The majority of human sequence variation is due to substitutions that have occurred once in the
More informationUsing the Potato Genome Sequence! Robin Buell! Michigan State University! Department of Plant Biology! August 15, 2010!
Using the Potato Genome Sequence! Robin Buell! Michigan State University! Department of Plant Biology! August 15, 2010! buell@msu.edu! 1 Whole Genome Shotgun Sequencing 2 New Technologies Revolutionize
More informationPopulation Genetics Sequence Diversity Molecular Evolution. Physiology Quantitative Traits Human Diseases
Population Genetics Sequence Diversity Molecular Evolution Physiology Quantitative Traits Human Diseases Bioinformatics problems in medicine related to physiology and quantitative traits Note: Genetics
More informationAnchoring and ordering NGS contig assemblies by population sequencing (POPSEQ)
Anchoring and ordering NGS contig assemblies by population sequencing (POPSEQ) Martin Mascher IPK Gatersleben PAG XXII January 14, 2012 slide 1 Proof-of-principle in barley Diploid model for wheat 5 Gb
More informationDNA Collection. Data Quality Control. Whole Genome Amplification. Whole Genome Amplification. Measure DNA concentrations. Pros
DNA Collection Data Quality Control Suzanne M. Leal Baylor College of Medicine sleal@bcm.edu Copyrighted S.M. Leal 2016 Blood samples For unlimited supply of DNA Transformed cell lines Buccal Swabs Small
More informationBioinformatic Analysis of SNP Data for Genetic Association Studies EPI573
Bioinformatic Analysis of SNP Data for Genetic Association Studies EPI573 Mark J. Rieder Department of Genome Sciences mrieder@u.washington washington.edu Epidemiology Studies Cohort Outcome Model to fit/explain
More informationLD Mapping and the Coalescent
Zhaojun Zhang zzj@cs.unc.edu April 2, 2009 Outline 1 Linkage Mapping 2 Linkage Disequilibrium Mapping 3 A role for coalescent 4 Prove existance of LD on simulated data Qualitiative measure Quantitiave
More informationLinkage Disequilibrium
Linkage Disequilibrium Why do we care about linkage disequilibrium? Determines the extent to which association mapping can be used in a species o Long distance LD Mapping at the tens of kilobase level
More informationSequence Assembly and Alignment. Jim Noonan Department of Genetics
Sequence Assembly and Alignment Jim Noonan Department of Genetics james.noonan@yale.edu www.yale.edu/noonanlab The assembly problem >>10 9 sequencing reads 36 bp - 1 kb 3 Gb Outline Basic concepts in genome
More informationTruSPAdes: analysis of variations using TruSeq Synthetic Long Reads (TSLR)
tru TruSPAdes: analysis of variations using TruSeq Synthetic Long Reads (TSLR) Anton Bankevich Center for Algorithmic Biotechnology, SPbSU Sequencing costs 1. Sequencing costs do not follow Moore s law
More informationProceedings of the World Congress on Genetics Applied to Livestock Production,
Genomics using the Assembly of the Mink Genome B. Guldbrandtsen, Z. Cai, G. Sahana, T.M. Villumsen, T. Asp, B. Thomsen, M.S. Lund Dept. of Molecular Biology and Genetics, Research Center Foulum, Aarhus
More informationThe Genome Analysis Centre. Building Excellence in Genomics and Computa5onal Bioscience
Building Excellence in Genomics and Computa5onal Bioscience Resequencing approaches Sarah Ayling Crop Genomics and Diversity sarah.ayling@tgac.ac.uk Why re- sequence plants? To iden
More informationSupplementary Methods Illumina Genome-Wide Genotyping Single SNP and Microsatellite Genotyping. Supplementary Table 4a Supplementary Table 4b
Supplementary Methods Illumina Genome-Wide Genotyping All Icelandic case- and control-samples were assayed with the Infinium HumanHap300 SNP chips (Illumina, SanDiego, CA, USA), containing 317,503 haplotype
More informationAnalysis of structural variation. Alistair Ward USTAR Center for Genetic Discovery University of Utah
Analysis of structural variation Alistair Ward USTAR Center for Genetic Discovery University of Utah What is structural variation? What differentiates SV from short variants? What are the major SV types?
More informationRADSeq Data Analysis. Through STACKS on Galaxy. Yvan Le Bras Anthony Bretaudeau Cyril Monjeaud Gildas Le Corguillé
RADSeq Data Analysis Through STACKS on Galaxy Yvan Le Bras Anthony Bretaudeau Cyril Monjeaud Gildas Le Corguillé RAD sequencing: next-generation tools for an old problem INTRODUCTION source: Karim Gharbi
More informationUsing the Association Workflow in Partek Genomics Suite
Using the Association Workflow in Partek Genomics Suite This user guide will illustrate the use of the Association workflow in Partek Genomics Suite (PGS) and discuss the basic functions available within
More informationDe novo whole genome assembly
De novo whole genome assembly Lecture 1 Qi Sun Bioinformatics Facility Cornell University Data generation Sequencing Platforms Short reads: Illumina Long reads: PacBio; Oxford Nanopore Contiging/Scaffolding
More informationBIOINFORMATICS ORIGINAL PAPER
BIOINFORMATICS ORIGINAL PAPER Vol. 27 no. 21 2011, pages 2957 2963 doi:10.1093/bioinformatics/btr507 Genome analysis Advance Access publication September 7, 2011 : fast length adjustment of short reads
More informationAssociation mapping of Sclerotinia stalk rot resistance in domesticated sunflower plant introductions
Association mapping of Sclerotinia stalk rot resistance in domesticated sunflower plant introductions Zahirul Talukder 1, Brent Hulke 2, Lili Qi 2, and Thomas Gulya 2 1 Department of Plant Sciences, NDSU
More informationStatistical Methods, Analyses and Applications for Next-Generation Sequencing Studies
Statistical Methods, Analyses and Applications for Next-Generation Sequencing Studies by Yan Yancy Lo A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy
More informationIntroducing combined CGH and SNP arrays for cancer characterisation and a unique next-generation sequencing service. Dr. Ruth Burton Product Manager
Introducing combined CGH and SNP arrays for cancer characterisation and a unique next-generation sequencing service Dr. Ruth Burton Product Manager Today s agenda Introduction CytoSure arrays and analysis
More informationWhat is genetic variation?
enetic Variation Applied Computational enomics, Lecture 05 https://github.com/quinlan-lab/applied-computational-genomics Aaron Quinlan Departments of Human enetics and Biomedical Informatics USTAR Center
More informationWhy do we need statistics to study genetics and evolution?
Why do we need statistics to study genetics and evolution? 1. Mapping traits to the genome [Linkage maps (incl. QTLs), LOD] 2. Quantifying genetic basis of complex traits [Concordance, heritability] 3.
More informationPopulation Genetics II. Bio
Population Genetics II. Bio5488-2016 Don Conrad dconrad@genetics.wustl.edu Agenda Population Genetic Inference Mutation Selection Recombination The Coalescent Process ACTT T G C G ACGT ACGT ACTT ACTT AGTT
More informationGENOTYPING-BY-SEQUENCING USING CUSTOM ION AMPLISEQ TECHNOLOGY AS A TOOL FOR GENOMIC SELECTION IN ATLANTIC SALMON
GENOTYPING-BY-SEQUENCING USING CUSTOM ION AMPLISEQ TECHNOLOGY AS A TOOL FOR GENOMIC SELECTION IN ATLANTIC SALMON Matthew Baranski, Casey Jowdy, Hooman Moghadam, Ashie Norris, Håvard Bakke, Anna Sonesson,
More informationHISTORICAL LINGUISTICS AND MOLECULAR ANTHROPOLOGY
Third Pavia International Summer School for Indo-European Linguistics, 7-12 September 2015 HISTORICAL LINGUISTICS AND MOLECULAR ANTHROPOLOGY Brigitte Pakendorf, Dynamique du Langage, CNRS & Université
More informationLecture 3: Introduction to the PLINK Software. Summer Institute in Statistical Genetics 2015
Lecture 3: Introduction to the PLINK Software Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2015 1 / 1 PLINK Overview PLINK is a free, open-source whole genome association analysis
More information