The parrot genome: using 454 Flx+ sequencing to identify regulatory traits of vocal learning
- Lester Hodge
Transcription
1 The parrot genome: using 454 Flx+ sequencing to identify regulatory traits of vocal learning Erich D. Jarvis Howard Hughes Medical Institute Investigator Duke University Medical Center Department of Neurobiology China Roche 454 Meetings September 2011
2 Motivation: Deciphering the genetic basis of convergent complex traits. Challenges: De-novo genome sequencing and assembly of species with and without the traits of interest. Proper genome assembly and tools for interrogating the genomes.
4 VOCAL LEARNING (production learning). 5 groups of mammals: humans, cetaceans, bats, elephants, sea lions. 3 groups of birds: parrots, hummingbirds, songbirds. Different from auditory learning (comprehension and usage learning). Auditory learning: dogs can understand the sounds sit (English), siéntese (Spanish), osuwari (Japanese). Vocal learning: dogs cannot learn to say these sounds, but vocal learners can.
5 Convergent behavior: vocal learning, substrate for speech. [Figure: avian family tree (Hackett et al 2008 tree), with vocal learners marked *; label: only humans.] Depends on auditory feedback, vocal critical periods, cultural transmission, syntax. Deaf-induced vocal disorders, aphasias, speech sound disorder, possibly autism.
6 African Grey Parrot - training to count (concept of one) Pepperberg/Alex
7 Song & speech systems in birds and humans Jarvis 2004 Ann NY Acad Sci; Jarvis et al 2005 Nature Rev. Neurosci.
8 Behaviorally regulated egr1 expression in parrot brain Feenders et al 2008 PLoS ONE
9 Convergent evolution of vocal learning pathways. Three alternative hypotheses: (1) multiple independent gains; (2) multiple independent losses from a common ancestor; (3) everyone, to varying degrees. [Figure labels: vocal learning pathways; vocal production pathway; auditory learning.] Modified from: Jarvis et al Nature 2000
10 Vocal learning brain pathways in birds & humans Jarvis et al Nature 2000; Jarvis 2004 Ann NY Acad Sci
11 FoxP2, a language-associated gene: turned on at high levels before vocal imitation starts and turned down to low levels after vocal learning is complete. [Figure: FoxP2 in finch brain by age in days old: hatch, juvenile song, adult; tutor song learning complete.] Haesler, Wada, Nshdejan, Morrisey, Lints, Jarvis, Scharff, J. Neurosci.
12 RNAi knockdown of FoxP2 in songbirds Haesler et al 2007 PLoS Biology.
14 Dusp1 gene shows specialized regulation in song nuclei (Immediate early gene involved in neuroprotection) Egr1 Dusp1 Haruhito Horita (graduate student) Graduating 2011 Horita et al (submitted)
15 Dusp1 shows convergent specialized regulation in song nuclei Silent Singing Songbird Hummingbird Parrot Horita et al (submitted)
16 Motivation: Deciphering the genetic basis of convergent complex traits. Challenges: De-novo genome sequencing and assembly of species with and without the traits of interest. Proper genome assembly and tools for interrogating the genomes.
18 Simulated projection: sequence & assembly of avian genomes. [Figure: number of contigs and genome representation (%) versus sequencing data (0.4 Gbp per 454 Titanium run); contig assembly, then add PES map.]
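The projection on slide 18 relates sequencing effort to genome representation. The following is a minimal sketch of that kind of estimate, assuming the idealized Lander-Waterman model (uniformly random reads, no GC bias) rather than whatever simulation the slide actually used. The 0.4 Gbp per run and the 1.2 Gb genome size come from the slides; the function names are hypothetical.

```python
import math

def fold_coverage(runs, gbp_per_run=0.4, genome_gb=1.2):
    """Average depth c = total sequenced bases / genome size."""
    return runs * gbp_per_run / genome_gb

def expected_fraction_covered(c):
    """Lander-Waterman estimate: chance a base is hit by at least one read."""
    return 1.0 - math.exp(-c)

for runs in (3, 15, 30):
    c = fold_coverage(runs)
    frac = expected_fraction_covered(c)
    print(f"{runs:2d} runs -> {c:.1f}X, ~{100 * frac:.2f}% of bases covered")
```

Under this idealized model the covered fraction saturates after a few fold of coverage, so regions that stay uncovered at high depth suggest sequence-specific bias rather than insufficient sequencing.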
19 No matter how much sequencing, could not get full coverage on some genes. Why?
Map budgie sequences from GS 454 runs to three homologous zebra finch genes.

Identity cutoff: 90% for 40 bp; 10 GS 454 runs
Gene    Gene +/-5 kb   Coding region length   Exon coverage   5 kb upstream coverage   5 kb downstream coverage
FoxP2   409,706        2,136 bp               97.05%          10.16%                   72.09%
ROBO1   384,230        4,243 bp               91.52%          15.20%                   32.83%
egr1    12,949         1,533 bp               81.28%          5.98%                    1.25%

Identity cutoff: 90% for 40 bp; 25 GS 454 runs (all libraries except 8 kb)
Gene    Gene +/-5 kb   Coding region length   Exon coverage
FoxP2   409,706        2,136 bp               99.00%
ROBO1   384,230        4,243 bp               91.00%
egr1    12,949         1,533 bp               89.60%
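The coverage percentages on slide 19 amount to asking what fraction of a region is overlapped by at least one mapped read. Here is a minimal sketch of that calculation, assuming the alignments have already been filtered upstream (for example by the 90% identity over 40 bp cutoff) and are given as half-open coordinate intervals. The coordinates and function names are hypothetical, not taken from the talk.

```python
def merged_intervals(intervals):
    """Merge overlapping or adjacent half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def percent_covered(region, alignments):
    """Percent of bases in region [start, end) hit by at least one alignment."""
    r_start, r_end = region
    # Clip each alignment to the region and drop those that miss it entirely.
    clipped = [(max(s, r_start), min(e, r_end))
               for s, e in alignments if s < r_end and e > r_start]
    covered = sum(e - s for s, e in merged_intervals(clipped))
    return 100.0 * covered / (r_end - r_start)

exon = (1000, 1200)                                # hypothetical exon
hits = [(950, 1050), (1040, 1100), (1150, 1300)]   # hypothetical alignments
print(f"{percent_covered(exon, hits):.1f}% of exon covered")
```

Merging intervals before summing avoids double-counting bases covered by several reads, which matters because depth and breadth of coverage are different quantities.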
20 Sequencing runs used for assemblies
454 reactions (14X coverage):
- Titanium shotgun library; 15 runs total (mode ~469 bp)
- 4 x 3 kb FLX paired-end libraries; 5 runs total (~200 bp/end)
- 8 x 8 kb FLX paired-end libraries; 3 runs total (~200 bp/end)
- 4 x 20 kb FLX paired-end libraries; 5 runs total (~200 bp/end)
- FLX+ shotgun library; 4 runs total (mode ~760 bp)
Illumina reactions (8X coverage):
- 200 bp Illumina paired-end; 2 runs (~75 bp/end)
- 200 bp Tufts-Illumina paired-end; 2 runs (~75 bp/end)
21 Read Length of Titanium runs Average read length ~350 bp and mode ~469 bp
22 Read Length of Flx+ runs Average read length 674 bp and mode ~768 bp Inferred error rate under 1.7%
23 Compared assemblies from 3 different types of sequences with 2 assemblers
Reads:
- short read only (200 bp paired end; 400 bp shotgun)
- short + long read (200 bp paired end; bp shotgun)
- short + long read + Illumina reads (75 bp paired end)
Assemblers:
1. Celera Assembler (CABOG; Adam Phillippy at Univ MD)
2. Newbler Assembler (Roger Winer, James Knight et al at Roche 454; Wes Warren at Wash U)
24 Comparative assembly statistics
In a hybrid assembly, Illumina paired-end reads cause scaffold breakdown because of contaminating mate pairs.

Assembler                 Parrot-Celera    Parrot-Celera
Sequence method           454 short        454 + Illum paired
Coverage                  8X               14X
Genome size               1.2Gb            1.2Gb
[Scaffolds]
TotalBasesInScaffolds     1,022,398,844    1,032,788,935
# of Scaffolds            9,586            10,813
AvgScaffoldSize           106,655          98,174
N50ScaffoldSize           9,471,817        1,689,431
LargestScaffoldSize       55,691,819       7,090,199
Total gaps in scaffolds   131,248          99,828
[Contigs]
# of Contigs              170,[...]        [...],641
AvgContigSize             6,012            9,335
N50ContigSize             10,005           18,667
LargestContigSize         150,[...]        [...],978
25 Comparative assembly statistics
Repair of breakdown; 454 long reads enhance assembly statistics; as good as the Sanger method.

Assembler                 Parrot-Celera  Parrot-Celera  Parrot-Celera     Parrot-Newbler  Parrot-Newbler  Parrot-Newbler Het  Z. Finch-PCAP  Chicken-PCAP
Sequence method           454 short      454 long       454 long + illum  454 short       454 long        454 long + illum    Sanger         Sanger v2.1
Coverage                  8X             14X            14X               8X              11X             13X                 6X             7.1X
Genome size               1.2Gb          1.2Gb          1.2Gb             1.2Gb           1.2Gb           1.2Gb               1.2Gb          1.05Gb
[Scaffolds]
TotalBasesInScaffolds     1,022,398,844  1,079,493,948  1,086,605,544     1,232,754,888   1,179,562,588   1,128,262,411       1,224,525,252  1,047,124,295
# of Scaffolds            9,586          20,685         25,212            37,024          21,081          10,926              37,698         23,776
AvgScaffoldSize           106,655        52,187         43,099            33,296          55,[...]        [...],263           32,482         44,041
N50ScaffoldSize           9,471,817      12,449,215     11,201,952        4,019,469       7,285,721       6,386,522           10,409,499     11,125,310
LargestScaffoldSize       55,691,819     49,398,065     39,879,305        18,557,224      39,887,084      35,673,135          56,620,707     51,053,708
Total gaps in scaffolds   160,463 54,864 45,651 60, ,736
[Contigs]
# of Contigs              170,049 75,549 70, , ,786 71, ,053 85,191
AvgContigSize             6,012          14,289         15,334            4,627           4,821           14,368              9,714          12,291
N50ContigSize             10,005         41,251         55,633            8,622           14,413          27,014              38,549         45,280
LargestContigSize         150, , , , , , , ,663
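The tables above lean heavily on N50 as a contiguity measure. As a reminder of what that statistic computes, here is a minimal sketch using the common definition (the length of the shortest sequence in the smallest set of longest sequences that together hold at least half of all bases). The toy lengths are made up, and individual assemblers may report slightly different variants.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input

contigs = [80, 70, 50, 40, 30, 20, 10]   # toy contig lengths in bp
print("N50 =", n50(contigs))
```

Because N50 is weighted by length, a handful of very long scaffolds can raise it sharply even when the number of scaffolds also goes up, which is why the tables report both counts and N50.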
26 MUMmer plot of synteny between zebra finch and budgie draft assemblies: a snapshot of Chr 4. FLX PE, 454 short reads: 100s of scaffolds. FLX PE, 454 short + long reads: one ~39.9 Mb scaffold. Zebra finch Chr 4 [25 Mb-65 Mb] = 40 Mb.
27 MUMmer plot of synteny between zebra finch and budgie draft assemblies: a snapshot of Chr 1. FLX PE, 454 short reads: 6 scaffolds. FLX PE, 454 short + long reads: one ~18 Mb scaffold. Zebra finch Chr 1: 18 Mb region.
28 Assembly of equivalent 400 bp (Titanium) and 760 bp (Flx+) sequence. [Table: assembly metrics for "Titanium reads, FLX PE" versus "FLX+, Titanium, FLX PE", with % change with FLX+ runs. Surviving values: numAlignedReads 94.48% vs 94.53%; numAlignedBases 95.20% vs 94.82%. Other rows: sequence depth, estimatedGenomeSize (MB), numberAssembled, numberPartial, numberSingleton, numberRepeat, numberOutlier, numberWithBothMapped. Scaffold metrics: numberOfScaffolds, numberOfBases, avgScaffoldSize, N50ScaffoldSize, largestScaffoldSize. Large contig metrics: numberOfContigs, numberOfBases, avgContigSize, N50ContigSize, largestContigSize.]
29 Assembly completeness of 3392 highly homologous exons. [Figure: completeness in contigs and scaffolds for three data sets: 454 Flx+ & Illumina, 454 Flx+, 454 Titanium.] Used the CABOG Celera assembler with different read lengths and technologies. Cont = contigs; Scaff = scaffolds.
30 Assembly of genes of interest. Single vs multi-exon genes. Egr1: 2-exon gene, with a highly GC-rich exon 1. FoxP2: 16-exon gene, with one GC-rich exon. Dusp1: gene with a repetitive regulatory region. Other genes? Use zebra finch exons that are >87% identical between finch and chicken to find parrot exons in the assemblies and reads.
31 Single-exon genes: dusp14. [Panels: Nb-454 short, Nb-454 long, Nb-hybrid, CA-454 short, CA-454 long, CA-hybrid.] Nearly all high-complexity single-exon genes (40-60% GC) examined thus far have full coverage (97-100%) in all assemblies. Nb = Newbler; CA = Celera; 454 short = Titanium; 454 long = Flx+; hybrid = 454 short + long + Illumina.
32 Multi-exon genes: GluR2 assembly. [Panels: Nb-454 short, Nb-454 long, Nb-hybrid, CA-454 short, CA-454 long, CA-hybrid.] BUT: many high-complexity multi-exon genes (40-60% GC) fall on multiple scaffolds with 454 short reads using Newbler, but are assembled on one scaffold using longer reads or Celera.
33 GC-rich exons: FoxP2 (language evolution). [Panels: Nb-454 short, Nb-454 long, Nb-hybrid, CA-454 short, CA-454 long, CA-hybrid.] GC-rich exons (>70%) have poorer assembly. Some algorithms can still handle them. Nb = Newbler; CA = Celera; 454 short = Titanium; 454 long = Flx+; hybrid = 454 short + long + Illumina.
34 GC-rich exons: Dusp6 (behaviorally regulated gene). [Panels: Nb-454, Nb-454 long, Nb-hybrid, CA-454, CA-454 long, CA-hybrid.] Exon 1 is missing from some assemblies of the dusp6 gene. What happened? Nb = Newbler; CA = Celera; 454 short = Titanium; 454 long = Flx+; hybrid = 454 short + long + Illumina.
35 Dusp6 reads Sufficient exon 1 reads & overlaps for assembly
36 GC-rich exons: Dusp6 assembly. [Panels: Nb-454, Nb-454 long, Nb-hybrid, CA-454, CA-454 long, CA-hybrid.] Conclusions: Newbler: GC-rich exons (60-70%) were not brought into the scaffold with 454 reads (they are in contigs), because they were part of alternative paths; the 454 + Illumina hybrid resolved the assembly. Celera: GC-rich exons (60-70%) in 454 short (400 bp) reads were placed in the degenerate file and not assembled; with long reads (760 bp), the sequence was no longer labeled degenerate and was thus assembled.
37 GC rich exons Egr1 behaviorally regulated gene Nb-454 short Nb-454 long Nb-hybrid CA-454 short CA-454 long CA-hybrid EXON 1 missing from all assemblies of egr1 gene. What happened?
38 GC-rich exons: Egr1 reads, shotgun. No reads of exon 1 in shotgun. GC-rich exon (80%).
39 GC-rich exons: Egr1 reads, paired-end. Very few reads of exon 1 in paired-end. GC-rich exon (88%).
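The GC percentages quoted on these slides can be checked directly from sequence. Below is a minimal sketch of a GC-content calculation with an optional sliding window, which is one simple way to flag regions likely to be under-sampled. The window and step sizes are arbitrary assumptions, the function names are hypothetical, and the test fragment is a short GC-rich stretch copied from the dusp1 regulatory sequence on slide 42.

```python
def gc_percent(seq):
    """Percent of G and C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt

def gc_windows(seq, window=100, step=50):
    """Yield (start, GC%) for sliding windows along seq."""
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, gc_percent(seq[start:start + window])

fragment = "CCGCGCTCCGCGGCCGCCCCGCTCCCGATGACGTCGCACCGGCGGGGC"
print(f"{gc_percent(fragment):.1f}% GC over {len(fragment)} bp")
for start, gc in gc_windows(fragment, window=20, step=10):
    print(start, f"{gc:.0f}%")
```

Scanning in windows rather than per exon makes it easier to see whether low read depth tracks local GC content or some other feature of the region.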
40 GC rich promoter and exon Egr1 gene assembly Part of promoter and exon 1 missing in all assemblies
41 Even the Sanger method misses GC-rich regions: Egr1 assembly in the zebra finch genome, chicken genome, and parrot genome. All species are missing the GC-rich promoter region (75-90%).
42 ~1,200 bp regulatory region of various microsatellite repeats In dusp1 regulatory region GGGATAACAGCACAGCCCTTAAACCCCCCTGGGGTAACAGGACAGCCCTTAAACCCCCCTGGGGTAACTGAGA ACAACCCTTAAACCCCCCTGGGGTAACAGCACAGCTCTTAAACCCCGAATTCTGAATCCACCCTGGCCCCATG GAGCATACACAGAGTGTGTGTGTGAATATGTGATTTTCTGTGTGAATATGTGATTTTGTGTGAATATGTGATT TTGTGTGCGAATATGTGATTCTGTGTGTGAATATGTGATTCTGTGTGTGAATATGTCATTTTCTGTGTGAATA TGTGATTTTGTGTGAATGTGTGATTTTCTGTGTGAATATGTGATAATATGTGATTTTGTGTGTGAATATGTGA TTCTATGTGAATATGTGATTGATTTTCTGTGTGAATATGTGATTTTGTGTGAATGTGTGATTTTTGTGTGAAT ATGTGATTTTCTGTGTGAATATGTGATTTTCTGTGTGAATATGTGATTTTTCAGAAAGTCGCAGGGTGGTTTG GCTCACACTCGCACTCACACTCTCACACACTCACACTCTCTCACTCTCACTCACACTCACACTCACACTCTCA CACTCTCTCACACTCTCTCACACTCTCACACTCTCTCACACACACACTCATACACTCCCACTCACACATACTC TCACACTCACACACTCTCACACTCTCACACTCTAACACACTCACACACTCACACACTCACACTCACACTCATA CTCACACACTCACACACTCACACTCACACTCTAACACACTCACACACTCACACTCACACTCACTTTTTCTCTT TTCTCACTTTTTCTCTCTCCCTCTCCCGCGCTCCGCGGCCGCCCCGCTCCCGATGACGTCGCACCGGCGGGGC GGGCCGCGCCCTCGCTGGCGCGCGGCCAGGCTGACGTCATCGGCCGCCCCGCCCCCCCACGTGACGCGGCCC ATTGAGAAAACGCCGTCCCGCCGCGCGGCCCCATATAAGGGCGGGAGCGGCGGGGCACCGGGACAGCCGGGCC ACCGCACCTCTGAGCTCTGCCCTGCCCTCCTTCCCTCCCCACAGCCATCCCCGCGCTGCCCGGCCATGGTGAA CCTGCGGGTGTGCGCGCTGGACTGCGAGGCGCTGCGGGCGCTGCTGCAGGAGCGCGGCGCGCAGTGCCTCGTC CTCGACTGCCGCTCCTTCTTCTCCTTCAA Horita et al (submitted)
43 Dusp1 convergent promoter changes in vocal learners Vocal learners Vocal non-learners Horita et al (submitted)
45 Repetitive microsatellite assembly in dusp1 promoter. [Panels, aligned to the ATG: Nb-454, Nb-454 long, Nb-hybrid, CA-454, CA-454 long, CA-hybrid.] Conclusions: Only the long reads (~760 bp) allowed full and correct assembly of the repetitive microsatellite sequence in the parrot dusp1 promoter.
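To make concrete what "microsatellite repeats" means for an assembler, here is a minimal sketch that locates perfect tandem repeats of short units with a regular expression. It is a simplification: the repeats in the dusp1 regulatory region are imperfect, and dedicated repeat finders handle mismatches, which this sketch does not. The function name, thresholds, and toy sequence are all hypothetical.

```python
import re

def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, copies) for perfect tandem repeats of short units."""
    # Lazy unit match so the shortest repeating unit is preferred (GT, not GTGT).
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in pattern.finditer(seq.upper())
    ]

toy = "AAGTGTGTGTGTCC"   # toy sequence: (GT)x5 flanked by unique bases
for start, unit, copies in find_microsatellites(toy):
    print(f"pos {start}: ({unit})x{copies}")
```

A read can only anchor such a repeat unambiguously if it spans the whole repeat plus unique flanking sequence on both sides, which is consistent with the slide's conclusion that the ~760 bp reads were needed for a ~1,200 bp region made of several repeat blocks.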
46 Genome (G10K) consortium: Assemblathon 2 competition - parrot
Three technologies:
- 454: short (200 bp) & long (750 bp) read lengths, shotgun and paired end with 3, 8, 20 kb insert sizes, 16X coverage (Roche and Duke)
- Illumina HiSeq: 100 bp paired-end/mate-pair reads, 0.2, 0.5, 0.8, 5, 10, 20 and 40 kb insert sizes, with TruSeq v3 GC chemistry, 120X coverage (BGI & Illumina)
- PacBio: reads (~3000 bp read length avg, but 15% error), 7, 10 kb insert sizes, 5X coverage (PacBio)
47 Genome (G10K) consortium: Assemblathon 2 competition - parrot. Three technologies: 454 long Flx+; Illumina HiSeq; PacBio long. 25 assembly groups: Overlap-Layout-Consensus (e.g. Celera CABOG, PCAP, Newbler, etc.); Eulerian de Bruijn graphs (e.g. ALLPATHS, SOAPdenovo, Velvet, etc.); hybrid inventions.
48 Genome (G10K) consortium: Assemblathon 2 competition - parrot. Three technologies: 454 long Flx+; Illumina HiSeq; PacBio long. 25 assembly groups: Overlap-Layout-Consensus (e.g. Celera CABOG, PCAP, Newbler, etc.); Eulerian de Bruijn graphs (e.g. ALLPATHS, SOAPdenovo, Velvet, etc.); hybrid inventions. Two validation methods: optical maps (contig and scaffold accuracy); 40K pooled (10) fosmid and single molecule clones sequenced (bp accuracy).
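The slide contrasts overlap-layout-consensus assemblers with de Bruijn graph assemblers. As a rough illustration of the second family, here is a minimal sketch that builds a de Bruijn graph from reads: every k-mer becomes an edge between its (k-1)-mer prefix and suffix. This is a toy under stated assumptions (error-free reads, one strand, no graph simplification) and does not reflect the internals of ALLPATHS, SOAPdenovo, or Velvet; the function name and reads are hypothetical.

```python
from collections import defaultdict

def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])   # prefix -> suffix edge
    return dict(graph)

reads = ["ACGTC", "CGTCA"]   # toy overlapping reads
for node, successors in sorted(de_bruijn_edges(reads, 3).items()):
    print(node, "->", successors)
```

Because reads are chopped into k-mers, information from reads longer than k is only partly used, which is one reason long-read data sets such as 454 Flx+ have often been paired with overlap-based assemblers.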
49 Challenges for the future for FLX+. Limitations: cost vs assembly bp accuracy vs assembly completeness; algorithms for hybrid assemblies; overcoming the bias against GC-rich regions. [Figure: theoretical predictions to generate high quality assembly; bp coverage (5X to 100X) versus read length (1 to 1500), with regions marked $ low and $ high.]
50 Challenges for complete genome assembly Theoretical predictions to generate high quality assembly Close to theory on Dog genome long reads; Less than theory on Panda short reads Schatz et al 2010 Genome Research
51 Acknowledgements
- Jarvis Lab: Jason Howard, James Ward (now at NIEHS), Ganesh Ganapathy, Haruhito Horita
- Roche 454 sequencing, Duke Genome Center: Lisa Bukovnik, Ty Wang, Olivier Fedrigo
- Roche support team: Xuemin Liu, Chinnappa Kodira
- Illumina sequencing: Tin Le (Illumina UK), Guojie Zhang (BGI), Yingrui Li (BGI)
- PacBio sequencing: Eric Schadt, Edwin Hawe, Lawrence Lee
- Assembly: Adam Phillippy (CABOG; Univ Maryland), Sergey Koren (CABOG; Univ Maryland), Wes Warren (Newbler; Wash Univ), James Knight (Newbler; Roche 454), Roger Winer (Newbler; Roche 454), Bo Li (SOAPdenovo; BGI)
- Optical maps: David Schwartz, Shiguo Zhou
- Fosmids: Jay Shendure
- Funding: NIH Director's Pioneer Award, Howard Hughes Medical Institute
52 Previous students and Post Docs now with own labs Dr. Lubica Kubikova Dr. Raphael Pinaud Dr. V. Ann Smith Dr. Liisa Tremere Dr. Kazuhiro Wada Dr. Jing Yu Rui Wang Dr. Osceola Whitney Jason Howard Haru Horita Jarvis lab Maurice Anderson Eric Zhou Michael Silva Gustavo Arriaga Dr. Petra Roulhac Gurkan Yardimchi Andreas Pfenning Dr. Erich Tony Jarvis Zimmermann Theresa Renuart Dr. Miriam Rivas Dr. Chun-Chun Chen Alisa Ray Erina Hara Not present: Nicole Nelson Alyssa Zhu
More informationResearch school methods seminar Genomics and Transcriptomics
Research school methods seminar Genomics and Transcriptomics Stephan Klee 19.11.2014 2 3 4 5 Genetics, Genomics what are we talking about? Genetics and Genomics Study of genes Role of genes in inheritence
More informationTitle: Genome sequence of lineage III Listeria monocytogenes strain HCC23
JB Accepts, published online ahead of print on 20 May 2011 J. Bacteriol. doi:10.1128/jb.05236-11 Copyright 2011, American Society for Microbiology and/or the Listed Authors/Institutions. All Rights Reserved.
More informationDe novo genome assembly. Dr Torsten Seemann
De novo genome assembly Dr Torsten Seemann IMB Winter School - Brisbane Mon 1 July 2013 Introduction Ideal world I would not need to give this talk! Human DNA Non-existent USB3 device AGTCTAGGATTCGCTA
More informationDNA-Sequencing. Technologies & Devices. Matthias Platzer. Genome Analysis Leibniz Institute on Aging - Fritz Lipmann Institute (FLI)
DNA-Sequencing Technologies & Devices Matthias Platzer Genome Analysis Leibniz Institute on Aging - Fritz Lipmann Institute (FLI) Genome analysis DNA sequencing platforms ABI 3730xl 4/2004 & 6/2006 1 Mb/day,
More informationCM581A2: NEXT GENERATION SEQUENCING PLATFORMS AND LIBRARY GENERATION
CM581A2: NEXT GENERATION SEQUENCING PLATFORMS AND LIBRARY GENERATION Fall 2015 Instructors: Coordinator: Carol Wilusz, Associate Professor MIP, CMB Instructor: Dan Sloan, Assistant Professor, Biology,
More informationApplying Genotyping by Sequencing (GBS) to Corn Genetics and Breeding. Peter Bradbury USDA/Cornell University
Applying Genotyping by Sequencing (GBS) to Corn Genetics and Breeding Peter Bradbury USDA/Cornell University Genotyping by sequencing (GBS) makes use of high through-put, short-read sequencing to provide
More informationIntroduction to Bioinformatics
Introduction to Bioinformatics Alla L Lapidus, Ph.D. SPbSU St. Petersburg Term Bioinformatics Term Bioinformatics was invented by Paulien Hogeweg (Полина Хогевег) and Ben Hesper in 1970 as "the study of
More informationNext Generation Sequencing for Metagenomics
Next Generation Sequencing for Metagenomics Genève, 13.10.2016 Patrick Wincker, Genoscope-CEA Human and model organisms sequencing were initially based on the Sanger method Sanger shotgun sequencing was
More informationDe novo assembly of human genomes with massively parallel short read sequencing
Resource De novo assembly of human genomes with massively parallel short read sequencing Ruiqiang Li, 1,2,3 Hongmei Zhu, 1,3 Jue Ruan, 1,3 Wubin Qian, 1 Xiaodong Fang, 1 Zhongbin Shi, 1 Yingrui Li, 1 Shengting
More informationHigh Throughput Sequencing Technologies. J Fass UCD Genome Center Bioinformatics Core Tuesday December 16, 2014
High Throughput Sequencing Technologies J Fass UCD Genome Center Bioinformatics Core Tuesday December 16, 2014 Sequencing Explosion www.genome.gov/sequencingcosts http://t.co/ka5cvghdqo Sequencing Explosion
More informationShuji Shigenobu. April 3, 2013 Illumina Webinar Series
Shuji Shigenobu April 3, 2013 Illumina Webinar Series RNA-seq RNA-seq is a revolutionary tool for transcriptomics using deepsequencing technologies. genome HiSeq2000@NIBB (Wang 2009 with modifications)
More informationHunting Down the Papaya Transgenes
Hunting Down the Papaya Transgenes Michael Schatz Center for Bioinformatics and Computational Biology University of Maryland January 16, 2008 PAG XVI Papaya Overview Carica papaya from the order Brassicales
More informationNext Generation Sequencing Technologies. Rob Mitra 1/30/17
Next Generation Sequencing Technologies Rob Mitra 1/30/17 Outline Overview of next-generation sequencing How does it work? What technologies are being used? How would one use it in practice? Math basic
More informationModern Epigenomics. Histone Code
Modern Epigenomics Histone Code Ting Wang Department of Genetics Center for Genome Sciences and Systems Biology Washington University Dragon Star 2012 Changchun, China July 2, 2012 DNA methylation + Histone
More informationNext Gen Sequencing. Expansion of sequencing technology. Contents
Next Gen Sequencing Contents 1 Expansion of sequencing technology 2 The Next Generation of Sequencing: High-Throughput Technologies 3 High Throughput Sequencing Applied to Genome Sequencing (TEDed CC BY-NC-ND
More informationGenome sequence of Acinetobacter baumannii MDR-TJ
JB Accepts, published online ahead of print on 11 March 2011 J. Bacteriol. doi:10.1128/jb.00226-11 Copyright 2011, American Society for Microbiology and/or the Listed Authors/Institutions. All Rights Reserved.
More informationBioinformatics and computational tools
Bioinformatics and computational tools Etienne P. de Villiers (PhD) International Livestock Research Institute Nairobi, Kenya International Livestock Research Institute Nairobi, Kenya ILRI works at the
More informationTypically, to be biologically related means to share a common ancestor. In biology, we call this homologous
Typically, to be biologically related means to share a common ancestor. In biology, we call this homologous. Two proteins sharing a common ancestor are said to be homologs. Homologyoften implies structural
More informationLivestock Genomics: The Odyssey
Livestock Genomics: The Odyssey Jim Womack, Texas A&M University NRSP-8 Animal Genome Workshop Plant and Animal Genome XX, Jan 15, 2012 Thanks, Geoff and Workshop Committee BRD? Rift Valley Fever? HISTORY!!!
More informationNature Biotechnology: doi: /nbt Supplementary Figure 1. Number and length distributions of the inferred fosmids.
Supplementary Figure 1 Number and length distributions of the inferred fosmids. Fosmid were inferred by mapping each pool s sequence reads to hg19. We retained only those reads that mapped to within a
More informationLectures 18, 19: Sequence Assembly. Spring 2017 April 13, 18, 2017
Lectures 18, 19: Sequence Assembly Spring 2017 April 13, 18, 2017 1 Outline Introduction Sequence Assembly Problem Different Solutions: Overlap-Layout-Consensus Assembly Algorithms De Bruijn Graph Based
More information1000 Insect Transcriptomes Evolution - 1KITE
1KITE 1K Insect Transcriptome Evolution 1000 Insect Transcriptomes Evolution - 1KITE An Example of Handling "Big Data" Karen Meusemann, on behalf of the 1KITE Consortium CSIRO Ecosystem Sciences, Australian
More informationLocal assembly and pre-mrna splicing analyses by high-throughput sequencing data
Graduate Theses and Dissertations Graduate College 2012 Local assembly and pre-mrna splicing analyses by high-throughput sequencing data Hsien-chao Chou Iowa State University Follow this and additional
More informationRIPTIDE HIGH THROUGHPUT RAPID LIBRARY PREP (HT-RLP)
Application Note: RIPTIDE HIGH THROUGHPUT RAPID LIBRARY PREP (HT-RLP) Introduction: Innovations in DNA sequencing during the 21st century have revolutionized our ability to obtain nucleotide information
More informationSequencing Theory. Brett E. Pickett, Ph.D. J. Craig Venter Institute
Sequencing Theory Brett E. Pickett, Ph.D. J. Craig Venter Institute Applications of Genomics and Bioinformatics to Infectious Diseases GABRIEL Network Agenda Sequencing Instruments Sanger Illumina Ion
More informationRNA-Seq analysis workshop
RNA-Seq analysis workshop Zhangjun Fei Boyce Thompson Institute for Plant Research USDA Robert W. Holley Center for Agriculture and Health Cornell University Outline Background of RNA-Seq Application of
More informationHigh Throughput Sequencing Technologies. UCD Genome Center Bioinformatics Core Monday 15 June 2015
High Throughput Sequencing Technologies UCD Genome Center Bioinformatics Core Monday 15 June 2015 Sequencing Explosion www.genome.gov/sequencingcosts http://t.co/ka5cvghdqo Sequencing Explosion 2011 PacBio
More informationWhat the Genome of Raffaelea lauricola Can Tell Us About Laurel Wilt
What the Genome of Raffaelea lauricola Can Tell Us About Laurel Wilt Laurel Wilt Summit November 3-4, 2016 Dr. Jeffrey Rollins Associate Professor Plant Pathology Department University of Florida Gainesville,
More information1. A brief overview of sequencing biochemistry
Supplementary reading materials on Genome sequencing (optional) The materials are from Mark Blaxter s lecture notes on Sequencing strategies and Primary Analysis 1. A brief overview of sequencing biochemistry
More informationEach cell of a living organism contains chromosomes
COVER FEATURE Genome Sequence Assembly: Algorithms and Issues Algorithms that can assemble millions of small DNA fragments into gene sequences underlie the current revolution in biotechnology, helping
More informationMapping strategies for sequence reads
Mapping strategies for sequence reads Ernest Turro University of Cambridge 21 Oct 2013 Quantification A basic aim in genomics is working out the contents of a biological sample. 1. What distinct elements
More informationEfficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads
Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads Authors Rei Kajitani 1, Kouta Toshimoto 1,2, Hideki Noguchi 3, Atsushi Toyoda 3,4, Yoshitoshi Ogura 5, Miki
More informationLawrence Berkeley National Laboratory Lawrence Berkeley National Laboratory
Lawrence Berkeley National Laboratory Lawrence Berkeley National Laboratory Title: Genome Sequence Databases (Overview): Sequencing and Assembly Author: Lapidus, Alla L. Publication Date: 08-25-2009 Publication
More informationNGS technologies approaches, applications and challenges!
www.supagro.fr NGS technologies approaches, applications and challenges! Jean-François Martin Centre de Biologie pour la Gestion des Populations Centre international d études supérieures en sciences agronomiques
More informationHiSeqTM 2000 Sequencing System
IET International Equipment Trading Ltd. www.ietltd.com Proudly serving laboratories worldwide since 1979 CALL +847.913.0777 for Refurbished & Certified Lab Equipment HiSeqTM 2000 Sequencing System Performance
More information