Figure S1. Data flow of de novo genome assembly using next generation sequencing data from multiple platforms.


Supplemental Figures

Figure S1. Data flow of de novo genome assembly using next-generation sequencing data from multiple platforms.
Figure S2. Factors affecting the quality of de novo genome assembly. (A) The influence of data coverage on assembled genome size and contig length. (B) The effect of k-mer size on contig length at 15x coverage. (C) The effect of added multiplex sequencing data on the de novo assembly.
Figure S3. Length and quality distribution of PacBio reads. Ten SMRT Cell runs generated 513,084 reads with an average length of 3.1 kb and ~1.4 Gb of data in total. (A) Length distribution of PacBio long reads. (B) Quality distribution of PacBio long reads.
Figure S4. PacBio read rescue. Mapping accurate, high-quality, high-coverage Illumina short reads against the low-quality PacBio reads generated consensus sequences.
Figure S5. The benefit of including PacBio long reads in the de novo assembly. Contig length (A) and assembled genome size (B) increased after adding PacBio reads to 454 short reads. Contig length increased (C) and contig number decreased (D) after adding PacBio reads to Illumina short paired-end reads.
Figure S6. Distance distribution of true mate-paired reads (A) and false mate-paired reads (B).
Figure S7. Repeat distribution of all SSR loci.
Figure S8. Distribution of unique transcripts in the horseweed genome by gene ontology (GO) annotation. (A) Molecular function (MF). (B) Biological process (BP). (C) Cellular component (CC). (D) Enzyme code distribution.
Figure S9. Enriched GO terms in the transporter subgroup of the molecular function category in the horseweed genome compared with Arabidopsis, as determined by Fisher's exact test. The overrepresented terms are shown in green.
Figure S10. Map of the horseweed chloroplast genome. The genome comprises two inverted repeats (IRs; 24,936 bp each), a large single-copy region (LSC; 84,634 bp), and a small single-copy region (SSC; 18,063 bp), and contains 95 protein-coding genes (88 unique), 39 tRNAs (28 unique), and 8 rRNAs (4 unique).

Figure S11. Syntenic dot plot analysis of the horseweed chloroplast genome compared with sunflower (Helianthus annuus) and lettuce (Lactuca sativa). (A) Lettuce vs. horseweed. (B) Horseweed vs. sunflower. Each black dot represents locus variation between genomes.
Figure S12. De novo assembly of the horseweed mitochondrial genome with N50 = 7,943 bp. The largest contig was 43,498 bp and the smallest contig was 315 bp. The mean G/C content was 40.1%.
Figure S13. Morphological variation among horseweed populations. The plants were grown together under the same conditions and the photos were taken at the same stage (the same number of days after germination).
Figure S14. Distribution of specific variants in genomes of glyphosate-susceptible horseweed biotypes (A) and in genomes of glyphosate-resistant biotypes (B) by Fisher's exact test.
Figure S15. Alignment of EPSPS protein sequences from different horseweed biotypes. The variant residues are highlighted with red lines.

Supplemental Tables

Table S1. Results of mapping mate-paired reads to de novo assembled contigs.
Table S2. Average GC content of plant genomes.
Table S3. Summary of SSR markers in the assembled horseweed genome.
Table S4. Summary of plant mitochondrial genomes.
Table S5. Matrix of nucleotide diversity (kb per variant) within genome among horseweed biotypes.

Supplemental file 1. HWCP annotation.txt
Supplemental file 2. HWMT genome.fas

Figure S1. Data flow of de novo genome assembly using next-generation sequencing data from multiple platforms.

Figure S2. Factors affecting the quality of de novo genome assembly. (A) The influence of data coverage on assembled genome size and contig length. (B) The effect of k-mer size on contig length at 15x coverage. (C) The effect of added multiplex sequencing data on the de novo assembly.

Figure S3. Length and quality distribution of PacBio reads. Ten SMRT Cell runs generated 513,084 reads with an average length of 3.1 kb and ~1.4 Gb of data in total. (A) Length distribution of PacBio long reads. (B) Quality distribution of PacBio long reads.

Figure S4. PacBio read rescue. Mapping accurate, high-quality, high-coverage Illumina short reads against the low-quality PacBio reads generated consensus sequences.

Figure S5. The benefit of including PacBio long reads in the de novo assembly. Contig length (A) and assembled genome size (B) increased after adding PacBio reads to 454 short reads. Contig length increased (C) and contig number decreased (D) after adding PacBio reads to Illumina short paired-end reads.

Figure S6. Distance distribution of true mate-paired reads (A) and false mate-paired reads (B).

Figure S7. Repeat distribution of all SSR loci.

Figure S8. Distribution of unique transcripts in the horseweed genome by gene ontology (GO) annotation. (A) Molecular function (MF). (B) Biological process (BP). (C) Cellular component (CC). (D) Enzyme code distribution.

Figure S9. Enriched GO terms in the transporter subgroup of the molecular function category in the horseweed genome compared with Arabidopsis, as determined by Fisher's exact test. The overrepresented terms are shown in green.

Figure S10. Map of the horseweed chloroplast genome. The genome comprises two inverted repeats (IRs; 24,936 bp each), a large single-copy region (LSC; 84,634 bp), and a small single-copy region (SSC; 18,063 bp), and contains 95 protein-coding genes (88 unique), 39 tRNAs (28 unique), and 8 rRNAs (4 unique).
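
The region lengths in the Figure S10 legend can be combined into an overall chloroplast genome length. The short sketch below does that arithmetic; it assumes the four regions (two IRs, LSC, SSC) account for the whole circular genome with no overlap, which the legend implies but does not state, and the variable names are illustrative only.

```python
# Region lengths (bp) as given in the Figure S10 legend.
IR_BP = 24_936    # each of the two inverted repeats
LSC_BP = 84_634   # large single-copy region
SSC_BP = 18_063   # small single-copy region

# Assumption: the genome is exactly IR + LSC + IR + SSC, with no other sequence.
chloroplast_total_bp = 2 * IR_BP + LSC_BP + SSC_BP

# Fraction of the genome occupied by the two inverted repeats together.
ir_fraction = 2 * IR_BP / chloroplast_total_bp

print(f"Total length: {chloroplast_total_bp:,} bp")
print(f"Inverted repeats: {ir_fraction:.1%} of the genome")
```

Under that assumption the total comes to 152,569 bp.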

Figure S11. Syntenic dot plot analysis of the horseweed chloroplast genome compared with sunflower (Helianthus annuus) and lettuce (Lactuca sativa). (A) Lettuce vs. horseweed. (B) Horseweed vs. sunflower. Each black dot represents locus variation between genomes.

Figure S12. De novo assembly of the horseweed mitochondrial genome with N50 = 7,943 bp. The largest contig was 43,498 bp and the smallest contig was 315 bp. The mean G/C content was 40.1%.
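
For readers unfamiliar with the N50 statistic quoted in Figure S12, the sketch below shows one common way to compute it: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size. The function name and the example contig lengths are hypothetical; the actual contig lengths of the mitochondrial assembly are not listed in this supplement.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Hypothetical contig lengths (bp), for illustration only.
example_contigs = [10, 8, 6, 4, 2]
print(n50(example_contigs))  # total is 30; 10 + 8 = 18 >= 15, so N50 is 8
```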

Figure S13. Morphological variation among horseweed populations. The plants were grown together under the same conditions and photographs were taken at the same stage (the same number of days after germination).

Figure S14. Distribution of specific variants in genomes of glyphosate-susceptible horseweed biotypes (A) and in genomes of glyphosate-resistant biotypes (B) by Fisher's exact test.

Figure S15. Alignment of EPSPS1 (A) and EPSPS2 (B) protein sequences from different horseweed biotypes. The variant residues are highlighted with red lines.

Table S1. Results of mapping mate-paired reads to de novo assembled contigs.

Table S2. Average GC content of selected plant genomes.

Species                    GC content (%)
Arabidopsis thaliana       36.0
Brachypodium distachyon    46.3
Conyza canadensis          34.9
Fragaria vesca             35.8
Glycine max                34.2
Lotus japonicus            35.6
Malus domestica            27.6
Medicago truncatula        27.2
Oryza sativa               42.4
Populus trichocarpa        32.9
Solanum lycopersicum       32.1
Sorghum bicolor            41.5
Vitis vinifera             33.7
Zea mays

Table S3. Summary of SSR markers in the assembled horseweed genome.

SSR type    Count (%)
Dimer       44,589 (85.9)
Trimer      6,864 (13.2)
Tetramer    299 (0.58)
Pentamer    60 (0.12)
Hexamer     80 (0.15)
Total       51,892 (100)
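
The percentages in Table S3 follow directly from the counts. As a quick consistency check, the sketch below recomputes the total and each share from the counts given in the table; the dictionary and function names are illustrative and not part of the original analysis.

```python
# SSR counts per repeat-unit size, copied from Table S3.
ssr_counts = {
    "Dimer": 44_589,
    "Trimer": 6_864,
    "Tetramer": 299,
    "Pentamer": 60,
    "Hexamer": 80,
}


def ssr_percentages(counts):
    """Return each SSR type's share of the total, in percent."""
    total = sum(counts.values())
    return {ssr_type: 100 * n / total for ssr_type, n in counts.items()}


ssr_total = sum(ssr_counts.values())
ssr_pct = ssr_percentages(ssr_counts)

print(f"Total SSRs: {ssr_total:,}")
for ssr_type, pct in ssr_pct.items():
    print(f"{ssr_type:<9}{ssr_counts[ssr_type]:>7,} ({pct:.2f}%)")
```

The recomputed total matches the 51,892 reported in the table, and the shares agree with the tabulated percentages after rounding.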

Table S4. Summary of selected plant mitochondrial genomes.

Species                  Genome size (bp)    G/C content (%)
Brassica carinata        232,
Silene latifolia         253,
Helianthus annuus        300,
Arabidopsis thaliana     366,
Citrullus lanatus        379,
Vigna radiata            401,
Nicotiana tabacum        430,
Triticum aestivum        452,
Conyza canadensis        453,
Sorghum bicolor          468,
Carica papaya            476,
Oryza sativa             490,
Ricinus communis         502,
Zea mays                 569,
Tripsacum dactyloides    704,
Vitis vinifera           773,
Cucurbita pepo           982,

Table S5. Matrix of nucleotide diversity (kb per variant) within genomes among horseweed biotypes.

Biotypes    CA-R    CA-S    DE-R    DE-S    IN-R    IN-S    TN-S
CA-S        1.14
DE-R
DE-S
IN-R
IN-S
TN-S
TN-R

Genomics and Transcriptomics of Spirodela polyrhiza

Genomics and Transcriptomics of Spirodela polyrhiza Genomics and Transcriptomics of Spirodela polyrhiza Doug Bryant Bioinformatics Core Facility & Todd Mockler Group, Donald Danforth Plant Science Center Desired Outcomes High-quality genomic reference sequence

More information

Genome Assembly With Next Generation Sequencers

Genome Assembly With Next Generation Sequencers Genome Assembly With Next Generation Sequencers Personal Genomics Institute 3 May, 2011 Jongsun Park Table of Contents 1 Central Dogma and Omics Studies 2 History of Sequencing Technologies 3 Genome Assembly

More information

BIOINFORMATICS AN OVERVIEW

BIOINFORMATICS AN OVERVIEW BIOINFORMATICS AN OVERVIEW T.R. Sharma Genoinformatics Lab, National Research Centre on Plant Biotechnology I.A.R.I, New Delhi 110012 trsharma@nrcpb.org Introduction Bioinformatics is the computational

More information

Supplemental Data. Wang et al. (2014). Plant Cell /tpc

Supplemental Data. Wang et al. (2014). Plant Cell /tpc Supplemental Figure 1. Analysis of mature mirna sequences of the mir172 family members. Mature sequences of mir172 family members were aligned using software MEGA5. The stars indicate nucleotides conserved

More information

Overview of the next two hours...

Overview of the next two hours... Overview of the next two hours... Before tea Session 1, Browser: Introduction Ensembl Plants and plant variation data Hands-on Variation in the Ensembl browser Displaying your data in Ensembl After tea

More information

Supplemental Figure 1. Phylogenetic relationship of 128 LCAT-like sequences from 38 plant species. The maximum likelihood tree was generated using

Supplemental Figure 1. Phylogenetic relationship of 128 LCAT-like sequences from 38 plant species. The maximum likelihood tree was generated using Supplemental Figure 1. Phylogenetic relationship of 128 LCAT-like sequences from 38 plant species. The maximum likelihood tree was generated using the MrBayes program. The land plant LCAT-like sequences

More information

pmyrsaur78 psosein4+ psosetr1+ psosetr2+

pmyrsaur78 psosein4+ psosetr1+ psosetr2+ SAUR77 SAUR78 SAUR76 Figure S1 Phylogenetic analysis of Arabidopsis SAUR proteins. The analyses were conducted in MEGA5, using the Neighbor-Joining method. The percentage of replicate trees in which the

More information

Supplementary Table 1. Summary of whole genome shotgun sequence used for genome assembly

Supplementary Table 1. Summary of whole genome shotgun sequence used for genome assembly Supplementary Tables Supplementary Table 1. Summary of whole genome shotgun sequence used for genome assembly Library Read length Raw data Filtered data insert size (bp) * Total Sequence depth Total Sequence

More information

1 st transplant user training workshop Versailles, 12th-13th November 2012

1 st transplant user training workshop Versailles, 12th-13th November 2012 trans-national Infrastructure for Plant Genomic Science 1 st transplant user training workshop Versailles, 12th-13th November 2012 Paul Kersey, EMBL-EBI More people, less land Plant genome sequences, 2005

More information

RNA Sequencing Analyses & Mapping Uncertainty

RNA Sequencing Analyses & Mapping Uncertainty RNA Sequencing Analyses & Mapping Uncertainty Adam McDermaid 1/26 RNA-seq Pipelines Collection of tools for analyzing raw RNA-seq data Tier 1 Quality Check Data Trimming Tier 2 Read Alignment Assembly

More information

SUPPLEMENTAL FILES. Supplemental Figure 1. Expression domains of Arabidopsis HAM orthologs in both shoot meristem and root tissues.

SUPPLEMENTAL FILES. Supplemental Figure 1. Expression domains of Arabidopsis HAM orthologs in both shoot meristem and root tissues. SUPPLEMENTAL FILES SUPPLEMENTAL FIGURE LEGENDS Supplemental Figure 1. Expression domains of Arabidopsis HAM orthologs in both shoot meristem and root tissues. RT-PCR amplification of AtHAM1, AtHAM2, AtHAM3,

More information

Transcriptome analysis

Transcriptome analysis Statistical Bioinformatics: Transcriptome analysis Stefan Seemann seemann@rth.dk University of Copenhagen April 11th 2018 Outline: a) How to assess the quality of sequencing reads? b) How to normalize

More information

Mate-pair library data improves genome assembly

Mate-pair library data improves genome assembly De Novo Sequencing on the Ion Torrent PGM APPLICATION NOTE Mate-pair library data improves genome assembly Highly accurate PGM data allows for de Novo Sequencing and Assembly For a draft assembly, generate

More information

From the genome to the field : how to improve the isolation of genomic regions of interest for plant breeding.

From the genome to the field : how to improve the isolation of genomic regions of interest for plant breeding. From the genome to the field : how to improve the isolation of genomic regions of interest for plant breeding. Hélène BERGES Director of the French Plant Genomic Center INRA Toulouse The French Plant Genomic

More information

De novo assembly of the pepper transcriptome (Capsicum annuum): a benchmark for in silico discovery of SNPs, SSRs and candidate genes

De novo assembly of the pepper transcriptome (Capsicum annuum): a benchmark for in silico discovery of SNPs, SSRs and candidate genes Ashrafi et al. BMC Genomics 2012, 13:571 RESEARCH ARTICLE Open Access De novo assembly of the pepper transcriptome (Capsicum annuum): a benchmark for in silico discovery of SNPs, SSRs and candidate genes

More information

Nature Biotechnology: doi: /nbt.3943

Nature Biotechnology: doi: /nbt.3943 Supplementary Figure 1. Distribution of sequence depth across the bacterial artificial chromosomes (BACs). The x-axis denotes the sequencing depth (X) of each BAC and y-axis denotes the number of BACs

More information

vector AvrRpt2 vector AvrRpt2 AvrRpt2 3.5 hours 6 hours

vector AvrRpt2 vector AvrRpt2 AvrRpt2 3.5 hours 6 hours A 30 kd RIN4 ACP2 17 kd vector AvrRpt2 vector AvrRpt2 AvrRpt2 3.5 hours 6 hours B ACP3 7 kd vector AvrRpt2 vector AvrRpt2 total membrane Figure S1. Accumulation of ACP2 and ACP3 upon cleavage of native

More information

Deep Sequencing technologies

Deep Sequencing technologies Deep Sequencing technologies Gabriela Salinas 30 October 2017 Transcriptome and Genome Analysis Laboratory http://www.uni-bc.gwdg.de/index.php?id=709 Microarray and Deep-Sequencing Core Facility University

More information

Hunting Down the Papaya Transgenes

Hunting Down the Papaya Transgenes Hunting Down the Papaya Transgenes Michael Schatz Center for Bioinformatics and Computational Biology University of Maryland January 16, 2008 PAG XVI Papaya Overview Carica papaya from the order Brassicales

More information

Sequence assembly. Jose Blanca COMAV institute bioinf.comav.upv.es

Sequence assembly. Jose Blanca COMAV institute bioinf.comav.upv.es Sequence assembly Jose Blanca COMAV institute bioinf.comav.upv.es Sequencing project Unknown sequence { experimental evidence result read 1 read 4 read 2 read 5 read 3 read 6 read 7 Computational requirements

More information

De novo whole genome assembly

De novo whole genome assembly De novo whole genome assembly Lecture 1 Qi Sun Minghui Wang Bioinformatics Facility Cornell University DNA Sequencing Platforms Illumina sequencing (100 to 300 bp reads) Overlapping reads ~180bp fragment

More information

Assemblytics: a web analytics tool for the detection of assembly-based variants Maria Nattestad and Michael C. Schatz

Assemblytics: a web analytics tool for the detection of assembly-based variants Maria Nattestad and Michael C. Schatz Assemblytics: a web analytics tool for the detection of assembly-based variants Maria Nattestad and Michael C. Schatz Table of Contents Supplementary Note 1: Unique Anchor Filtering Supplementary Figure

More information

Genome annotation. Erwin Datema (2011) Sandra Smit (2012, 2013)

Genome annotation. Erwin Datema (2011) Sandra Smit (2012, 2013) Genome annotation Erwin Datema (2011) Sandra Smit (2012, 2013) Genome annotation AGACAAAGATCCGCTAAATTAAATCTGGACTTCACATATTGAAGTGATATCACACGTTTCTCTAAT AATCTCCTCACAATATTATGTTTGGGATGAACTTGTCGTGATTTGCCATTGTAGCAATCACTTGAA

More information

Mining functional microsatellites in legume unigenes

Mining functional microsatellites in legume unigenes www.bioinformation.net Hypothesis Volume 7(5) Mining functional microsatellites in legume unigenes Manish Roorkiwal 1,2 & Prakash Chand Sharma 1 * 1University School of Biotechnology, Guru Gobind Singh

More information

Supplemental Data. Shyu et al. (2012). Plant Cell /tpc

Supplemental Data. Shyu et al. (2012). Plant Cell /tpc JAZ11 JAZ12 JAZ6 JAZ5 JAZ10 JAZ9 JAZ2 JAZ1 JAZ3 JAZ4 JAZ7 Supplemental Figure 1. Phylogenetic Tree Constructed from the Jas Motif of Arabidopsis JAZ Proteins. The 27-amino-acid Jas motif in all 12 Arabidopsis

More information

Genomic resources. for non-model systems

Genomic resources. for non-model systems Genomic resources for non-model systems 1 Genomic resources Whole genome sequencing reference genome sequence comparisons across species identify signatures of natural selection population-level resequencing

More information

De novo whole genome assembly

De novo whole genome assembly De novo whole genome assembly Lecture 1 Qi Sun Bioinformatics Facility Cornell University Data generation Sequencing Platforms Short reads: Illumina Long reads: PacBio; Oxford Nanopore Contiging/Scaffolding

More information

Supplemental Data. Borg et al. Plant Cell (2014) /tpc

Supplemental Data. Borg et al. Plant Cell (2014) /tpc Supplementary Figure 1 - Alignment of selected angiosperm DAZ1 and DAZ2 homologs Multiple sequence alignment of selected DAZ1 and DAZ2 homologs. A consensus sequence built using default parameters is shown

More information

Supplementary Figure 1. Design of the control microarray. a, Genomic DNA from the

Supplementary Figure 1. Design of the control microarray. a, Genomic DNA from the Supplementary Information Supplementary Figures Supplementary Figure 1. Design of the control microarray. a, Genomic DNA from the strain M8 of S. ruber and a fosmid containing the S. ruber M8 virus M8CR4

More information

Supplemental Figure 1

Supplemental Figure 1 Supplemental Figure 1 A LK sls1 lks1-2 F 1 sls1 B LK sls1 lks1-2 F 1 lks1-2 sls1 F 1 lks1-2 sls1 F 2 Col lks1-2 Col+LKS1 Col Col+LKS1 Supplemental Figure 1. Genetic analysis of sls1 mutant. (A) and (B)

More information

De novo whole genome assembly

De novo whole genome assembly De novo whole genome assembly Qi Sun Bioinformatics Facility Cornell University Sequencing platforms Short reads: o Illumina (150 bp, up to 300 bp) Long reads (>10kb): o PacBio SMRT; o Oxford Nanopore

More information

The Genome Analysis Centre. Building Excellence in Genomics and Computa5onal Bioscience

The Genome Analysis Centre. Building Excellence in Genomics and Computa5onal Bioscience Building Excellence in Genomics and Computa5onal Bioscience Resequencing approaches Sarah Ayling Crop Genomics and Diversity sarah.ayling@tgac.ac.uk Why re- sequence plants? To iden

More information

Looking Ahead: Improving Workflows for SMRT Sequencing

Looking Ahead: Improving Workflows for SMRT Sequencing Looking Ahead: Improving Workflows for SMRT Sequencing Jonas Korlach FIND MEANING IN COMPLEXITY Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, and SMRTbell are trademarks of Pacific Biosciences

More information

Genome Annotation Genome annotation What is the function of each part of the genome? Where are the genes? What is the mrna sequence (transcription, splicing) What is the protein sequence? What does

More information

Supplemental Data. Farmer et al. (2010) Plant Cell /tpc

Supplemental Data. Farmer et al. (2010) Plant Cell /tpc Supplemental Figure 1. Amino acid sequence comparison of RAD23 proteins. Identical and similar residues are shown in the black and gray boxes, respectively. Dots denote gaps. The sequence of plant Ub is

More information

Experimental Design. Sequencing. Data Quality Control. Read mapping. Differential Expression analysis

Experimental Design. Sequencing. Data Quality Control. Read mapping. Differential Expression analysis -Seq Analysis Quality Control checks Reproducibility Reliability -seq vs Microarray Higher sensitivity and dynamic range Lower technical variation Available for all species Novel transcript identification

More information

Sequence Assembly and Alignment. Jim Noonan Department of Genetics

Sequence Assembly and Alignment. Jim Noonan Department of Genetics Sequence Assembly and Alignment Jim Noonan Department of Genetics james.noonan@yale.edu www.yale.edu/noonanlab The assembly problem >>10 9 sequencing reads 36 bp - 1 kb 3 Gb Outline Basic concepts in genome

More information

De Novo and Hybrid Assembly

De Novo and Hybrid Assembly On the PacBio RS Introduction The PacBio RS utilizes SMRT technology to generate both Continuous Long Read ( CLR ) and Circular Consensus Read ( CCS ) data. In this document, we describe sequencing the

More information

Fig. S1. Clustering analysis of expression array and ChIP-PCR assay in the ARF3 locus. (A) Typical examples of the transgenic plants used for

Fig. S1. Clustering analysis of expression array and ChIP-PCR assay in the ARF3 locus. (A) Typical examples of the transgenic plants used for Fig. S1. Clustering analysis of expression array and ChIP-PCR assay in the ARF3 locus. (A) Typical examples of the transgenic plants used for ChIP-chip and ChIP-PCR assays. The presence of pas1:t7:as1

More information

Sequencing the genomes of Nicotiana sylvestris and Nicotiana tomentosiformis Nicolas Sierro

Sequencing the genomes of Nicotiana sylvestris and Nicotiana tomentosiformis Nicolas Sierro Sequencing the genomes of Nicotiana sylvestris and Nicotiana tomentosiformis Nicolas Sierro Philip Morris International R&D, Philip Morris Products S.A., Neuchatel, Switzerland Introduction Nicotiana sylvestris

More information

Steps in Genetic Analysis

Steps in Genetic Analysis Molecular Tools Steps in Genetic Analysis 1. Knowing how many genes determine a phenotype, and where the genes are located, is a first step in understanding the genetic basis of a phenotype 2. A second

More information

A near perfect de novo assembly of a eukaryotic genome using sequence reads of greater than 10 kilobases generated by the Pacific Biosciences RS II

A near perfect de novo assembly of a eukaryotic genome using sequence reads of greater than 10 kilobases generated by the Pacific Biosciences RS II A near perfect de novo assembly of a eukaryotic genome using sequence reads of greater than 10 kilobases generated by the Pacific Biosciences RS II W. Richard McCombie Disclosures Introduction to the challenge

More information

srnas of Plants Function o Transcriptional silencing o Posttranscriptional gene silencing

srnas of Plants Function o Transcriptional silencing o Posttranscriptional gene silencing srnas of Plants Small, Non-coding RNAs of Plants Regulatory RNAs that act through gene silencing Two classes of small RNAs (srnas) o microrna (mirnas) Encoded by mirna genes in the genome o small interfering

More information

Supplementary Figure 1. The tree of Chinese jujube and its growing environment. The jujube has a very long lifecycle, even more than 1000 productive

Supplementary Figure 1. The tree of Chinese jujube and its growing environment. The jujube has a very long lifecycle, even more than 1000 productive Supplementary Figure 1. The tree of Chinese jujube and its growing environment. The jujube has a very long lifecycle, even more than 1000 productive years. It is well adapted to drought and salinity. Supplementary

More information

Plant Science 179 (2010) Contents lists available at ScienceDirect. Plant Science. journal homepage:

Plant Science 179 (2010) Contents lists available at ScienceDirect. Plant Science. journal homepage: Plant Science 179 (2010) 407 422 Contents lists available at ScienceDirect Plant Science journal homepage: www.elsevier.com/locate/plantsci Review High throughput DNA sequencing: The new sequencing revolution

More information

Toward a better understanding of plant genomes structure: combining NGS and optical mapping technology to improve the sunflower assembly

Toward a better understanding of plant genomes structure: combining NGS and optical mapping technology to improve the sunflower assembly Toward a better understanding of plant genomes structure: combining NGS and optical mapping technology to improve the sunflower assembly Céline CHANTRY-DARMON 1 CNRGV The French Plant Genomic Center Created

More information

De novo genome assembly with next generation sequencing data!! "

De novo genome assembly with next generation sequencing data!! De novo genome assembly with next generation sequencing data!! " Jianbin Wang" HMGP 7620 (CPBS 7620, and BMGN 7620)" Genomics lectures" 2/7/12" Outline" The need for de novo genome assembly! The nature

More information

Next-generation sequencing technologies

Next-generation sequencing technologies Next-generation sequencing technologies NGS applications Illumina sequencing workflow Overview Sequencing by ligation Short-read NGS Sequencing by synthesis Illumina NGS Single-molecule approach Long-read

More information

Transcriptome Assembly, Functional Annotation (and a few other related thoughts)

Transcriptome Assembly, Functional Annotation (and a few other related thoughts) Transcriptome Assembly, Functional Annotation (and a few other related thoughts) Monica Britton, Ph.D. Sr. Bioinformatics Analyst June 23, 2017 Differential Gene Expression Generalized Workflow File Types

More information

De Novo Transcript Discovery using Long and Short Reads

De Novo Transcript Discovery using Long and Short Reads De Novo Transcript Discovery using Long and Short Reads December 4, 2018 Sample to Insight QIAGEN Aarhus Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.qiagenbioinformatics.com

More information

High Throughput Sequencing the Multi-Tool of Life Sciences. Lutz Froenicke DNA Technologies and Expression Analysis Cores UCD Genome Center

High Throughput Sequencing the Multi-Tool of Life Sciences. Lutz Froenicke DNA Technologies and Expression Analysis Cores UCD Genome Center High Throughput Sequencing the Multi-Tool of Life Sciences Lutz Froenicke DNA Technologies and Expression Analysis Cores UCD Genome Center Complementary Approaches Illumina Still-imaging of clusters (~1000

More information

The goal of this project was to prepare the DEUG contig which covers the

The goal of this project was to prepare the DEUG contig which covers the Prakash 1 Jaya Prakash Dr. Elgin, Dr. Shaffer Biology 434W 10 February 2017 Finishing of DEUG4927010 Abstract The goal of this project was to prepare the DEUG4927010 contig which covers the terminal 99,279

More information

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es Sequencing technologies Jose Blanca COMAV institute bioinf.comav.upv.es Outline Sequencing technologies: Sanger 2nd generation sequencing: 3er generation sequencing: 454 Illumina SOLiD Ion Torrent PacBio

More information

Biol 478/595 Intro to Bioinformatics

Biol 478/595 Intro to Bioinformatics Biol 478/595 Intro to Bioinformatics September M 1 Labor Day 4 W 3 MG Database Searching Ch. 6 5 F 5 MG Database Searching Hw1 6 M 8 MG Scoring Matrices Ch 3 and Ch 4 7 W 10 MG Pairwise Alignment 8 F 12

More information

Sampling methodology for detection of living modified organisms

Sampling methodology for detection of living modified organisms Sampling methodology for detection of living modified organisms Dr. Moolchand Singh Senior Scientist Division of Plant Quarantine National Bureau of Plant Genetic Resources, New Delhi Sampling Procedure

More information

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es Sequencing technologies Jose Blanca COMAV institute bioinf.comav.upv.es Outline Sequencing technologies: Sanger 2nd generation sequencing: 3er generation sequencing: 454 Illumina SOLiD Ion Torrent PacBio

More information

RNA-SEQUENCING ANALYSIS

RNA-SEQUENCING ANALYSIS RNA-SEQUENCING ANALYSIS Joseph Powell SISG- 2018 CONTENTS Introduction to RNA sequencing Data structure Analyses Transcript counting Alternative splicing Allele specific expression Discovery APPLICATIONS

More information

Toward a better understanding of plant genomes structure : Combining NGS, optical mapping technology and CRISPR-CATCH approach

Toward a better understanding of plant genomes structure : Combining NGS, optical mapping technology and CRISPR-CATCH approach Toward a better understanding of plant genomes structure : Combining NGS, optical mapping technology and CRISPR-CATCH approach Hélène BERGES Director of the Plant Genomic Center Global warming effects,

More information

COMPUTATIONAL PREDICTION AND CHARACTERIZATION OF A TRANSCRIPTOME USING CASSAVA (MANIHOT ESCULENTA) RNA-SEQ DATA

COMPUTATIONAL PREDICTION AND CHARACTERIZATION OF A TRANSCRIPTOME USING CASSAVA (MANIHOT ESCULENTA) RNA-SEQ DATA COMPUTATIONAL PREDICTION AND CHARACTERIZATION OF A TRANSCRIPTOME USING CASSAVA (MANIHOT ESCULENTA) RNA-SEQ DATA AOBAKWE MATSHIDISO, SCOTT HAZELHURST, CHRISSIE REY Wits Bioinformatics, University of the

More information

Data Basics. Josef K Vogt Slides by: Simon Rasmussen Next Generation Sequencing Analysis

Data Basics. Josef K Vogt Slides by: Simon Rasmussen Next Generation Sequencing Analysis Data Basics Josef K Vogt Slides by: Simon Rasmussen 2017 Generalized NGS analysis Sample prep & Sequencing Data size Main data reductive steps SNPs, genes, regions Application Assembly: Compare Raw Pre-

More information

Add 2016 GBS Poster As Slide One

Add 2016 GBS Poster As Slide One Add 2016 GBS Poster As Slide One GBS Adapters and Enzymes Barcode Adapter P1 Sticky Ends Common Adapter P2 Illumina Sequencing Primer 2 Barcode (4 8 bp) Restriction Enzymes Illumina Sequencing Primer 1

More information
