Genomics and Transcriptomics of Spirodela polyrhiza

1 Genomics and Transcriptomics of Spirodela polyrhiza
Doug Bryant
Bioinformatics Core Facility & Todd Mockler Group, Donald Danforth Plant Science Center

2 Desired Outcomes
- High-quality genomic reference sequence
- Transcriptome definition, functional annotation
- Comparison of several additional accessions

3 Genomic, RNA-Seq Data
- Spirodela accession 9509 deeply sequenced
- Additional 8 accessions, low coverage
- RNA-seq obtained from 9509 and two other accessions under Control and ABA conditions
(Kuehdorf, Jetschke, Ballani, and Appenroth, 2013)

4 Analysis Strategy
- Genome data acquisition
- Transcriptome data acquisition
- Data quality control
- Genome, transcriptome assembly
- Genome structural annotation
- Transcriptome functional annotation
- Differential expression analysis*

5 Genome

6 Data Acquisition
- Genomic
- Illumina HiSeq
- Diverse library set
- Overlap bp
- Several mate-pair
- Illumina HiSeq 2000

7 Quality Control
Steps: Raw Data, Visualize, Adaptors, Verify Insert Sizes, Retain Pairs Only, Trim 3' Low Quality
[Table: Insert Size, Stdev, Read Length, Trimmed Avg Read Length, Read Pairs Passed QC. Cell values are garbled in the transcript; only "329 Mbp" is legible.]
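The "Trim 3' Low Quality" and "Retain Pairs Only" steps can be illustrated with a minimal Python sketch. The slides do not name the trimming tool or its thresholds, so the function names, the quality cutoff (`min_q=20`), and the minimum length (`min_len=30`) below are assumptions chosen only for illustration.

```python
def trim_3prime(seq, quals, min_q=20):
    """Remove bases from the 3' end until a base with quality >= min_q is reached.

    min_q is an assumed threshold, not a value taken from the slides.
    """
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def qc_pair(read1, read2, min_q=20, min_len=30):
    """Trim both mates; keep the pair only if both mates stay long enough.

    Each read is a (sequence, quality_list) tuple. Returning None for the
    whole pair is one way to "retain pairs only".
    """
    t1 = trim_3prime(read1[0], read1[1], min_q)
    t2 = trim_3prime(read2[0], read2[1], min_q)
    if len(t1[0]) < min_len or len(t2[0]) < min_len:
        return None
    return t1, t2
```

Dropping the whole pair when one mate fails keeps the two read files synchronized, which downstream assemblers generally expect.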

8 Genomic Data Insert Sizes
Distribution: insert size, estimated coverage, fraction of total data.
Insert size    Est. coverage    Fraction of total
180 bp         31               28%
500 bp         25               22%
2,000 bp       20               17%
5,000 bp       11               10%
20,000 bp      26               23%

9 Genome Assembly
- Several iterations
- Preliminary assemblies with Velvet, SOAPdenovo
- Final assembly with ALLPATHS-LG
- Polished with SSPACE

10 Genome Assembly Statistics
Statistic                     9509
Assembly (Mbp) (152 exp.)     146 (96% of exp.)
Scaffolds (#)                 774
Scaffolds >= 1 Mbp (#)        32 (4.13%)
N50 scaffold length (bp)      4,305,909
L50 scaffold (#)              11
N90 scaffold length (bp)      1,428,181
L90 scaffold (#)              31
Ns (%)                        7.7
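For readers unfamiliar with the N50/L50 and N90/L90 figures, the following Python sketch shows the usual definition: sort scaffolds from longest to shortest and walk down the list until the running total reaches the chosen percentage of the assembly. The function name `nx_lx` and the toy lengths are illustrative only; the slides do not say which tool produced the reported statistics.

```python
def nx_lx(lengths, percent=50):
    """Return (Nx, Lx) for a list of scaffold lengths.

    Nx is the length of the scaffold at which the cumulative length of the
    longest scaffolds first reaches `percent` of the total assembly size;
    Lx is how many scaffolds were needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length, count
    raise ValueError("lengths must contain at least one scaffold")


# Toy example (hypothetical scaffold lengths, not the 9509 assembly).
toy = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = nx_lx(toy, 50)
n90, l90 = nx_lx(toy, 90)
```

Read this way, the reported L50 of 11 means the 11 longest scaffolds together hold at least half of the assembled sequence.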

11 Genomic Physical Coverage
[Figure: Physical Coverage by Library (Total: 370x). y-axis: Coverage (x); x-axis: library insert size (180bp, 500bp, 2,000bp, 5,000bp, 20,000bp).]
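Physical coverage counts the whole span between paired reads, whereas sequence coverage counts only the sequenced bases, which is why mate-pair libraries contribute so much to a physical-coverage total. A minimal sketch of the two standard formulas follows; the function names are illustrative, and the slides do not state how the plotted values were actually computed.

```python
def sequence_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average number of times each base is sequenced."""
    return n_reads * read_length_bp / genome_size_bp


def physical_coverage(n_pairs, insert_size_bp, genome_size_bp):
    """Average number of read pairs whose insert spans each base."""
    return n_pairs * insert_size_bp / genome_size_bp


# Hypothetical library: the pair count is made up for illustration;
# 152 Mbp is the expected genome size quoted in the assembly statistics.
example = physical_coverage(n_pairs=1_000_000,
                            insert_size_bp=5_000,
                            genome_size_bp=152_000_000)
```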

12 Genome Assembly Quality Assessment
- Reads used in assembly?
- Reads align to assembly?
- Core eukaryotic genes present?

13 Genomic Reads Used, Aligned
[Figure: Reads Used (%) and Reads Align (%) for each library (180bp, 500bp, 2,000bp, 5,000bp, 20,000bp); y-axis from 0% to 100%.]

14 Core Eukaryotic Genes
- Core Eukaryotic Genes Mapping Approach (CEGMA), Korf lab (korflab.ucdavis.edu/)
- Search genome for 248 low copy, highly conserved genes
- Assess completeness of genome

15 Core Eukaryotic Genes
[Figure: Number of Core Genes Identified (Complete, Partial) by species.]
Species (core genes at least partial %):
- A. thaliana (99.60%)
- B. distachyon (99.19%)
- Z. mays (97.18%)
- S. polyrhiza (97.18%)
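The percentages above are consistent with simple fractions of the 248 core genes. The sketch below makes that arithmetic explicit; note that the gene counts used in the example (241, 246, 247) are back-calculated from the reported percentages and are an assumption, since the counts themselves are not legible in the transcript.

```python
CORE_GENE_COUNT = 248  # size of the CEGMA core gene set quoted in the slides


def completeness_percent(n_found, n_core=CORE_GENE_COUNT):
    """Percentage of core genes found at least partially."""
    if not 0 <= n_found <= n_core:
        raise ValueError("n_found must be between 0 and n_core")
    return 100.0 * n_found / n_core


# Counts inferred from the reported percentages (an assumption).
inferred_counts = {241: 97.18, 246: 99.19, 247: 99.60}
for count, reported in inferred_counts.items():
    assert round(completeness_percent(count), 2) == reported
```

Under this reading, 97.18% corresponds to 7 of the 248 core genes not being found even partially.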

16 Resequencing

17 Resequencing Kuehdorf, Jetschke, Ballani, and Appenroth 2013

18 Resequencing Kuehdorf, Jetschke, Ballani, and Appenroth, 2013

19 Resequencing Data
[Figure: Sequencing Depth of Coverage. y-axis: Coverage Depth; x-axis: Strain.]

20 Resequencing Variation
[Figure: SNP/INDEL Rate Per Accession. y-axis: Num. Positions (SNP Positions, INDEL Positions); x-axis: Accession (Total Variant Positions %).]
Accession (Total Variant Positions %); the first accession label is illegible in the transcript:
- (0.12%)
- 9504 (0.39%)
- 9506 (0.35%)
- 9316 (0.37%)
- 9242 (0.37%)
- 9502 (0.20%)
- 9511 (0.21%)
- 9512 (0.29%)
- 9501 (0.20%)

21 Resequencing Assemblies
- Per accession: ~30x coverage, single library
- Assembled each using Velvet
- Mean assembled size: 128 Mbp (~84%) (stdev: 6 Mbp)
- Mean N50: 15 kb (stdev: 1.5 kb)
- Nearly all contigs (>98%) align to 9509 genome assembly
- Defining structural differences in progress

22 Transcriptome

23 RNA-Seq Data Kuehdorf, Jetschke, Ballani, and Appenroth, 2013

24 RNA-Seq Data Kuehdorf, Jetschke, Ballani, and Appenroth, 2013

25 RNA-Seq Data
[Figure: RNA-Seq Reads per Accession and Treatment. y-axis: No. 101 bp Reads (M); bars: 9509 Control, 9509 ABA, 9316 Control, 9316 ABA, 9501 Control, 9501 ABA.]

26 Transcriptome Discovery
1. Reference-guided assembly
   - TopHat2
   - Cufflinks2
2. De novo predictions
   - Maker, informed by assembly
   - SNAP, Augustus, GeneMarkHMM
   - Iteratively trained SNAP

27 (1) Reference-Guided Transcriptome Assembly
- Align each RNA-seq library (6) to genome
- For each, define transcripts based on alignments
- Merge resulting assemblies to discover gene models, alternative splicing
- Output:
  - Gene, transcripts annotation (GFF3)
  - Transcripts (FASTA)
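The align, assemble, and merge steps map naturally onto TopHat2, Cufflinks, and Cuffmerge. The Python sketch below only builds the command lines (it does not run them) so the order of operations is visible. The directory layout, the helper name `tuxedo_commands`, and all file names are hypothetical, and the exact options used by the authors are not given in the slides; conversion of the merged annotation to GFF3 is not shown.

```python
def tuxedo_commands(bowtie_index, genome_fasta, libraries, out_root="rg_assembly"):
    """Build TopHat2 / Cufflinks / Cuffmerge command lines for paired-end libraries.

    libraries maps a library name to a (reads_1, reads_2) tuple of FASTQ paths.
    Returns (commands, gtf_paths, manifest_path). The caller is expected to
    write gtf_paths, one per line, into manifest_path before running cuffmerge.
    """
    commands = []
    gtf_paths = []
    for name, (reads_1, reads_2) in libraries.items():
        align_dir = f"{out_root}/{name}/tophat"
        assembly_dir = f"{out_root}/{name}/cufflinks"
        # 1. Align one library to the genome (TopHat2 writes accepted_hits.bam).
        commands.append(["tophat2", "-o", align_dir, bowtie_index, reads_1, reads_2])
        # 2. Define transcripts from that library's alignments.
        commands.append(["cufflinks", "-o", assembly_dir, f"{align_dir}/accepted_hits.bam"])
        gtf_paths.append(f"{assembly_dir}/transcripts.gtf")
    # 3. Merge the per-library assemblies into one set of gene models.
    manifest_path = f"{out_root}/assemblies.txt"
    commands.append(["cuffmerge", "-o", f"{out_root}/merged", "-s", genome_fasta, manifest_path])
    return commands, gtf_paths, manifest_path


# Six hypothetical libraries: three accessions, two treatments each.
libs = {f"{acc}_{trt}": (f"{acc}_{trt}_1.fq", f"{acc}_{trt}_2.fq")
        for acc in ("9509", "9316", "9501") for trt in ("control", "aba")}
cmds, gtfs, manifest = tuxedo_commands("spirodela_index", "spirodela.fa", libs)
```

Building the commands as lists makes them easy to pass to `subprocess.run` later without shell quoting issues.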

28 (2) De novo Transcriptome Discovery
- Discover genes not expressed in RNA-seq experiments
- Train algorithms on reference-guided assembly
  1. Call high-confidence open reading frames in transcript sequences
  2. Use transcripts and translated proteins to inform and train de novo gene callers
  3. Iteratively train SNAP on resulting output

29 (2) De novo Transcriptome Discovery
- Assembled: 25,090 loci, 41,884 transcripts
- Of 41,884 transcripts, complete ORF and at least 33 amino acids: 39,076
- Initial training using these transcripts and proteins
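The "complete ORF and at least 33 amino acids" criterion can be sketched as a scan for the longest ATG-to-stop open reading frame. The slides do not name the ORF caller, so the code below is a simplified stand-in under stated assumptions: it checks only the three forward frames, requires an ATG start, and uses the standard stop codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_complete_orf_aa(transcript):
    """Length in amino acids of the longest ATG...stop ORF in the forward frames.

    Returns 0 when no complete ORF (start codon followed by an in-frame
    stop codon) is present.
    """
    seq = transcript.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Codons from the ATG up to, but not including, the stop.
                best = max(best, (i - start) // 3)
                start = None
    return best


def passes_orf_filter(transcript, min_aa=33):
    """True when the transcript carries a complete ORF of at least min_aa residues."""
    return longest_complete_orf_aa(transcript) >= min_aa
```

A transcript whose reading frame runs off the end without a stop codon is not counted, which matches the "complete ORF" wording.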

30 Transcriptome Discovery: Results
- Preliminary Maker output: 28,600 genes
- Prune: must have RNA-seq evidence across >= 50%, or >= 100 amino acids with complete ORF
- Prune bacterial scaffolds
- Final gene set: 23,495 genes
- Transcriptome size (nucleic acids): 33 Mbp
- Mean protein length: 358 amino acids
- 19,380 (82%) have functional prediction from BLASTP and/or InterProScan
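The pruning rule can be written as a small predicate. The sketch below is one reading of it, not the authors' code: "RNA-seq evidence across >= 50%" is interpreted as at least half of the gene model being covered by RNA-seq alignments, and the `GeneModel` fields and the set of bacterial scaffold names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    gene_id: str
    scaffold: str
    rnaseq_fraction: float  # assumed meaning: fraction of the model covered by RNA-seq
    protein_length: int     # amino acids
    complete_orf: bool


def keep_gene(gene, bacterial_scaffolds):
    """Apply the two pruning steps to one predicted gene."""
    if gene.scaffold in bacterial_scaffolds:
        return False  # prune bacterial scaffolds
    has_expression_support = gene.rnaseq_fraction >= 0.5
    is_long_complete_orf = gene.complete_orf and gene.protein_length >= 100
    return has_expression_support or is_long_complete_orf


def prune(genes, bacterial_scaffolds):
    """Return only the gene models that pass the filter."""
    return [g for g in genes if keep_gene(g, bacterial_scaffolds)]
```

Combining the two criteria with "or" retains unexpressed genes that still look like plausible proteins, which is the stated purpose of the de novo predictions.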

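31 Transcriptome Functional Annotation
[Figure: overlap of annotation support among BlastP (77%), InterProScan (66%), and RNA-Seq Evidence. Legible values: a count ending in 238 (89%) and 877 (3.7%); the remaining counts are garbled in the transcript.]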

32 Transcriptome Annotation
[Figure: Annotations by Species.]
Species shown: Brachypodium distachyon, Sorghum bicolor, Cicer arietinum, Setaria italica, Solanum lycopersicum, Fragaria vesca, Zea mays, Cucumis sativus, Ricinus communis, Glycine max, Prunus persica, Populus trichocarpa, Oryza sativa, Theobroma cacao, Vitis vinifera

33 Alternative Splicing
[Figure: Genes With Num. Isoforms. y-axis: Num. Genes with Num. Isoforms (log 10); x-axis: Num. Isoforms.]

34 Identify Differentially Expressed Genes
[Figure: RNA-Seq Reads per Accession and Treatment. y-axis: No. 101 bp Reads (M); bars: 9509 Control, 9509 ABA, 9316 Control, 9316 ABA, 9501 Control, 9501 ABA.]

35 Differentially Expressed Genes
- 1,727 genes identified as significantly differentially expressed
- 1,105 isoforms identified as significant
- Molecular verification in progress

36 Ongoing
- Assemble repetitive elements
- Assemble, annotate mitochondria, chloroplast
- Accessions, structural differences
- Molecular investigation of differentially expressed genes of interest

37 Thank you

How much sequencing do I need? Emily Crisovan Genomics Core

How much sequencing do I need? Emily Crisovan Genomics Core How much sequencing do I need? Emily Crisovan Genomics Core How much sequencing? Three questions: 1. How much sequence is required for good experimental design? 2. What type of sequencing run is best?

More information

Data Analysis with CASAVA v1.8 and the MiSeq Reporter

Data Analysis with CASAVA v1.8 and the MiSeq Reporter Data Analysis with CASAVA v1.8 and the MiSeq Reporter Eric Smith, PhD Bioinformatics Scientist September 15 th, 2011 2010 Illumina, Inc. All rights reserved. Illumina, illuminadx, Solexa, Making Sense

More information

RNA-Sequencing analysis

RNA-Sequencing analysis RNA-Sequencing analysis Markus Kreuz 25. 04. 2012 Institut für Medizinische Informatik, Statistik und Epidemiologie Content: Biological background Overview transcriptomics RNA-Seq RNA-Seq technology Challenges

More information

Mapping strategies for sequence reads

Mapping strategies for sequence reads Mapping strategies for sequence reads Ernest Turro University of Cambridge 21 Oct 2013 Quantification A basic aim in genomics is working out the contents of a biological sample. 1. What distinct elements

More information

Sequencing the genomes of Nicotiana sylvestris and Nicotiana tomentosiformis Nicolas Sierro

Sequencing the genomes of Nicotiana sylvestris and Nicotiana tomentosiformis Nicolas Sierro Sequencing the genomes of Nicotiana sylvestris and Nicotiana tomentosiformis Nicolas Sierro Philip Morris International R&D, Philip Morris Products S.A., Neuchatel, Switzerland Introduction Nicotiana sylvestris

More information

RNASEQ WITHOUT A REFERENCE

RNASEQ WITHOUT A REFERENCE RNASEQ WITHOUT A REFERENCE Experimental Design Assembly in Non-Model Organisms And other (hopefully useful) Stuff Meg Staton mstaton1@utk.edu University of Tennessee Knoxville, TN I. Project Design Things

More information

RNA-Seq with the Tuxedo Suite

RNA-Seq with the Tuxedo Suite RNA-Seq with the Tuxedo Suite Monica Britton, Ph.D. Sr. Bioinformatics Analyst September 2015 Workshop The Basic Tuxedo Suite References Trapnell C, et al. 2009 TopHat: discovering splice junctions with

More information

Mate-pair library data improves genome assembly

Mate-pair library data improves genome assembly De Novo Sequencing on the Ion Torrent PGM APPLICATION NOTE Mate-pair library data improves genome assembly Highly accurate PGM data allows for de Novo Sequencing and Assembly For a draft assembly, generate

More information

Genome Annotation Genome annotation What is the function of each part of the genome? Where are the genes? What is the mrna sequence (transcription, splicing) What is the protein sequence? What does

More information

Sequence Assembly and Alignment. Jim Noonan Department of Genetics

Sequence Assembly and Alignment. Jim Noonan Department of Genetics Sequence Assembly and Alignment Jim Noonan Department of Genetics james.noonan@yale.edu www.yale.edu/noonanlab The assembly problem >>10 9 sequencing reads 36 bp - 1 kb 3 Gb Outline Basic concepts in genome

More information

RNA-Seq Software, Tools, and Workflows

RNA-Seq Software, Tools, and Workflows RNA-Seq Software, Tools, and Workflows Monica Britton, Ph.D. Sr. Bioinformatics Analyst September 1, 2016 Some mrna-seq Applications Differential gene expression analysis Transcriptional profiling Assumption:

More information

Introduction to RNA sequencing

Introduction to RNA sequencing Introduction to RNA sequencing Bioinformatics perspective Olga Dethlefsen NBIS, National Bioinformatics Infrastructure Sweden November 2017 Olga (NBIS) RNA-seq November 2017 1 / 49 Outline Why sequence

More information

Sequence assembly. Jose Blanca COMAV institute bioinf.comav.upv.es

Sequence assembly. Jose Blanca COMAV institute bioinf.comav.upv.es Sequence assembly Jose Blanca COMAV institute bioinf.comav.upv.es Sequencing project Unknown sequence { experimental evidence result read 1 read 4 read 2 read 5 read 3 read 6 read 7 Computational requirements

More information

De Novo Assembly of High-throughput Short Read Sequences

De Novo Assembly of High-throughput Short Read Sequences De Novo Assembly of High-throughput Short Read Sequences Chuming Chen Center for Bioinformatics and Computational Biology (CBCB) University of Delaware NECC Third Skate Genome Annotation Workshop May 23,

More information

Assessing De-Novo Transcriptome Assemblies

Assessing De-Novo Transcriptome Assemblies Assessing De-Novo Transcriptome Assemblies Shawn T. O Neil Center for Genome Research and Biocomputing Oregon State University Scott J. Emrich University of Notre Dame 100K Contigs, Perfect 1M Contigs,

More information

Whole Transcriptome Analysis of Illumina RNA- Seq Data. Ryan Peters Field Application Specialist

Whole Transcriptome Analysis of Illumina RNA- Seq Data. Ryan Peters Field Application Specialist Whole Transcriptome Analysis of Illumina RNA- Seq Data Ryan Peters Field Application Specialist Partek GS in your NGS Pipeline Your Start-to-Finish Solution for Analysis of Next Generation Sequencing Data

More information

De novo genome assembly with next generation sequencing data!! "

De novo genome assembly with next generation sequencing data!! De novo genome assembly with next generation sequencing data!! " Jianbin Wang" HMGP 7620 (CPBS 7620, and BMGN 7620)" Genomics lectures" 2/7/12" Outline" The need for de novo genome assembly! The nature

More information

Long and short/small RNA-seq data analysis

Long and short/small RNA-seq data analysis Long and short/small RNA-seq data analysis GEF5, 4.9.2015 Sami Heikkinen, PhD, Dos. Topics 1. RNA-seq in a nutshell 2. Long vs short/small RNA-seq 3. Bioinformatic analysis work flows GEF5 / Heikkinen

More information

RNA Sequencing Analyses & Mapping Uncertainty

RNA Sequencing Analyses & Mapping Uncertainty RNA Sequencing Analyses & Mapping Uncertainty Adam McDermaid 1/26 RNA-seq Pipelines Collection of tools for analyzing raw RNA-seq data Tier 1 Quality Check Data Trimming Tier 2 Read Alignment Assembly

More information

Gap Filling for a Human MHC Haplotype Sequence

Gap Filling for a Human MHC Haplotype Sequence American Journal of Life Sciences 2016; 4(6): 146-151 http://www.sciencepublishinggroup.com/j/ajls doi: 10.11648/j.ajls.20160406.12 ISSN: 2328-5702 (Print); ISSN: 2328-5737 (Online) Gap Filling for a Human

More information

A shotgun introduction to sequence assembly (with Velvet) MCB Brem, Eisen and Pachter

A shotgun introduction to sequence assembly (with Velvet) MCB Brem, Eisen and Pachter A shotgun introduction to sequence assembly (with Velvet) MCB 247 - Brem, Eisen and Pachter Hot off the press January 27, 2009 06:00 AM Eastern Time llumina Launches Suite of Next-Generation Sequencing

More information

Assembly of Ariolimax dolichophallus using SOAPdenovo2

Assembly of Ariolimax dolichophallus using SOAPdenovo2 Assembly of Ariolimax dolichophallus using SOAPdenovo2 Charles Markello, Thomas Matthew, and Nedda Saremi Image taken from Banana Slug Genome Project, S. Weber SOAPdenovo Assembly Tool Short Oligonucleotide

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics Richard Corbett Canada s Michael Smith Genome Sciences Centre Vancouver, British Columbia June 28, 2017 Our mandate is to advance knowledge about cancer and other diseases

More information

Sanger vs Next-Gen Sequencing

Sanger vs Next-Gen Sequencing Tools and Algorithms in Bioinformatics GCBA815/MCGB815/BMI815, Fall 2017 Week-8: Next-Gen Sequencing RNA-seq Data Analysis Babu Guda, Ph.D. Professor, Genetics, Cell Biology & Anatomy Director, Bioinformatics

More information

Introduction to RNA-Seq

Introduction to RNA-Seq Introduction to RNA-Seq Monica Britton, Ph.D. Sr. Bioinformatics Analyst March 2015 Workshop Overview of RNA-Seq Activities RNA-Seq Concepts, Terminology, and Work Flows Using Single-End Reads and a Reference

More information

Workflow of de novo assembly

Workflow of de novo assembly Workflow of de novo assembly Experimental Design Clean sequencing data (trim adapter and low quality sequences) Run assembly software for contiging and scaffolding Evaluation of assembly Several iterations:

More information

Analysing genomes and transcriptomes using Illumina sequencing

Analysing genomes and transcriptomes using Illumina sequencing Analysing genomes and transcriptomes using Illumina uencing Dr. Heinz Himmelbauer Centre for Genomic Regulation (CRG) Ultrauencing Unit Barcelona The Sequencing Revolution High-Throughput Sequencing 2000

More information

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es Sequencing technologies Jose Blanca COMAV institute bioinf.comav.upv.es Outline Sequencing technologies: Sanger 2nd generation sequencing: 3er generation sequencing: 454 Illumina SOLiD Ion Torrent PacBio

More information

RNAseq Differential Gene Expression Analysis Report

RNAseq Differential Gene Expression Analysis Report RNAseq Differential Gene Expression Analysis Report Customer Name: Institute/Company: Project: NGS Data: Bioinformatics Service: IlluminaHiSeq2500 2x126bp PE Differential gene expression analysis Sample

More information

Outline. Annotation of Drosophila Primer. Gene structure nomenclature. Muller element nomenclature. GEP Drosophila annotation projects 01/04/2018

Outline. Annotation of Drosophila Primer. Gene structure nomenclature. Muller element nomenclature. GEP Drosophila annotation projects 01/04/2018 Outline Overview of the GEP annotation projects Annotation of Drosophila Primer January 2018 GEP annotation workflow Practice applying the GEP annotation strategy Wilson Leung and Chris Shaffer AAACAACAATCATAAATAGAGGAAGTTTTCGGAATATACGATAAGTGAAATATCGTTCT

More information

Title: High-quality genome assembly of channel catfish, Ictalurus punctatus

Title: High-quality genome assembly of channel catfish, Ictalurus punctatus Author s response to reviews Title: High-quality genome assembly of channel catfish, Ictalurus punctatus Authors: Qiong Shi (shiqiong@genomics.cn) Xiaohui Chen (xhchenffri@hotmail.com) Liqiang Zhong (lqzhongffri@hotmail.com)

More information

Shuji Shigenobu. April 3, 2013 Illumina Webinar Series

Shuji Shigenobu. April 3, 2013 Illumina Webinar Series Shuji Shigenobu April 3, 2013 Illumina Webinar Series RNA-seq RNA-seq is a revolutionary tool for transcriptomics using deepsequencing technologies. genome HiSeq2000@NIBB (Wang 2009 with modifications)

More information

Genome Sequence of Medicago sativa: Cultivated Alfalfa at the Diploid Level (CADL)

Genome Sequence of Medicago sativa: Cultivated Alfalfa at the Diploid Level (CADL) Genome Sequence of Medicago sativa: Cultivated Alfalfa at the Diploid Level (CADL) Maria J. Monteros, Joann Mudge, Andrew D. Farmer, Nicholas P. Devitt, Diego A. Fajardo, Thiru Ramaraj, Xinbin Dai, Zhaohong

More information

Bioinformatics Advice on Experimental Design

Bioinformatics Advice on Experimental Design Bioinformatics Advice on Experimental Design Where do I start? Please refer to the following guide to better plan your experiments for good statistical analysis, best suited for your research needs. Statistics

More information

Complementary Technologies for Precision Genetic Analysis

Complementary Technologies for Precision Genetic Analysis Complementary NGS, CGH and Workflow Featured Publication Zhu, J. et al. Duplication of C7orf58, WNT16 and FAM3C in an obese female with a t(7;22)(q32.1;q11.2) chromosomal translocation and clinical features

More information

RNA-seq Data Analysis

RNA-seq Data Analysis Lecture 3. Clustering; Function/Pathway Enrichment analysis RNA-seq Data Analysis Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University Lecture 1. Map RNA-seq read to genome Lecture

More information

What the Genome of Raffaelea lauricola Can Tell Us About Laurel Wilt

What the Genome of Raffaelea lauricola Can Tell Us About Laurel Wilt What the Genome of Raffaelea lauricola Can Tell Us About Laurel Wilt Laurel Wilt Summit November 3-4, 2016 Dr. Jeffrey Rollins Associate Professor Plant Pathology Department University of Florida Gainesville,

More information

Single Cell Genomics

Single Cell Genomics Single Cell Genomics Application Cost Platform/Protoc ol Note Single cell 3 mrna-seq cell lysis/rt/library prep $2460/Sample 10X Genomics Chromium 500-10,000 cells/sample Single cell 5 V(D)J mrna-seq cell

More information

NOW GENERATION SEQUENCING. Monday, December 5, 11

NOW GENERATION SEQUENCING. Monday, December 5, 11 NOW GENERATION SEQUENCING 1 SEQUENCING TIMELINE 1953: Structure of DNA 1975: Sanger method for sequencing 1985: Human Genome Sequencing Project begins 1990s: Clinical sequencing begins 1998: NHGRI $1000

More information

RNA-Seq Workshop AChemS Sunil K Sukumaran Monell Chemical Senses Center Philadelphia

RNA-Seq Workshop AChemS Sunil K Sukumaran Monell Chemical Senses Center Philadelphia RNA-Seq Workshop AChemS 2017 Sunil K Sukumaran Monell Chemical Senses Center Philadelphia Benefits & downsides of RNA-Seq Benefits: High resolution, sensitivity and large dynamic range Independent of prior

More information

Local assembly and pre-mrna splicing analyses by high-throughput sequencing data

Local assembly and pre-mrna splicing analyses by high-throughput sequencing data Graduate Theses and Dissertations Graduate College 2012 Local assembly and pre-mrna splicing analyses by high-throughput sequencing data Hsien-chao Chou Iowa State University Follow this and additional

More information

SCIENCE CHINA Life Sciences

SCIENCE CHINA Life Sciences SCIENCE CHINA Life Sciences SPECIAL TOPIC February 2013 Vol.56 No.2: 143 155 RESEARCH PAPER doi: 10.1007/s11427-013-4442-z Comparative study of de novo assembly and genome-guided assembly strategies for

More information

Next Gen Sequencing. Expansion of sequencing technology. Contents

Next Gen Sequencing. Expansion of sequencing technology. Contents Next Gen Sequencing Contents 1 Expansion of sequencing technology 2 The Next Generation of Sequencing: High-Throughput Technologies 3 High Throughput Sequencing Applied to Genome Sequencing (TEDed CC BY-NC-ND

More information

Genome Assembly. J Fass UCD Genome Center Bioinformatics Core Friday September, 2015

Genome Assembly. J Fass UCD Genome Center Bioinformatics Core Friday September, 2015 Genome Assembly J Fass UCD Genome Center Bioinformatics Core Friday September, 2015 From reads to molecules What s the Problem? How to get the best assemblies for the smallest expense (sequencing) and

More information

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es Sequencing technologies Jose Blanca COMAV institute bioinf.comav.upv.es Outline Sequencing technologies: Sanger 2nd generation sequencing: 3er generation sequencing: 454 Illumina SOLiD Ion Torrent PacBio

More information

Bioinformatics small variants Data Analysis. Guidelines. genomescan.nl

Bioinformatics small variants Data Analysis. Guidelines. genomescan.nl Next Generation Sequencing Bioinformatics small variants Data Analysis Guidelines genomescan.nl GenomeScan s Guidelines for Small Variant Analysis on NGS Data Using our own proprietary data analysis pipelines

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics Alla L Lapidus, Ph.D. SPbSU St. Petersburg Term Bioinformatics Term Bioinformatics was invented by Paulien Hogeweg (Полина Хогевег) and Ben Hesper in 1970 as "the study of

More information

Supplementary Figure 1. The tree of Chinese jujube and its growing environment. The jujube has a very long lifecycle, even more than 1000 productive

Supplementary Figure 1. The tree of Chinese jujube and its growing environment. The jujube has a very long lifecycle, even more than 1000 productive Supplementary Figure 1. The tree of Chinese jujube and its growing environment. The jujube has a very long lifecycle, even more than 1000 productive years. It is well adapted to drought and salinity. Supplementary

More information

COPE: An accurate k-mer based pair-end reads connection tool to facilitate genome assembly

COPE: An accurate k-mer based pair-end reads connection tool to facilitate genome assembly Bioinformatics Advance Access published October 8, 2012 COPE: An accurate k-mer based pair-end reads connection tool to facilitate genome assembly Binghang Liu 1,2,, Jianying Yuan 2,, Siu-Ming Yiu 1,3,

More information

UC Davis UC Davis Previously Published Works

UC Davis UC Davis Previously Published Works UC Davis UC Davis Previously Published Works Title Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome

More information

Comparative Bioinformatics. BSCI348S Fall 2003 Midterm 1

Comparative Bioinformatics. BSCI348S Fall 2003 Midterm 1 BSCI348S Fall 2003 Midterm 1 Multiple Choice: select the single best answer to the question or completion of the phrase. (5 points each) 1. The field of bioinformatics a. uses biomimetic algorithms to

More information

Course Presentation. Ignacio Medina Presentation

Course Presentation. Ignacio Medina Presentation Course Index Introduction Agenda Analysis pipeline Some considerations Introduction Who we are Teachers: Marta Bleda: Computational Biologist and Data Analyst at Department of Medicine, Addenbrooke's Hospital

More information

Chimp Sequence Annotation: Region 2_3

Chimp Sequence Annotation: Region 2_3 Chimp Sequence Annotation: Region 2_3 Jeff Howenstein March 30, 2007 BIO434W Genomics 1 Introduction We received region 2_3 of the ChimpChunk sequence, and the first step we performed was to run RepeatMasker

More information

Jenny Gu, PhD Strategic Business Development Manager, PacBio

Jenny Gu, PhD Strategic Business Development Manager, PacBio IDT and PacBio joint presentation Characterizing Alzheimer s Disease candidate genes and transcripts with targeted, long-read, single-molecule sequencing Jenny Gu, PhD Strategic Business Development Manager,

More information

QIAGEN s NGS Solutions for Biomarkers NGS & Bioinformatics team QIAGEN (Suzhou) Translational Medicine Co.,Ltd

QIAGEN s NGS Solutions for Biomarkers NGS & Bioinformatics team QIAGEN (Suzhou) Translational Medicine Co.,Ltd QIAGEN s NGS Solutions for Biomarkers NGS & Bioinformatics team QIAGEN (Suzhou) Translational Medicine Co.,Ltd 1 Our current NGS & Bioinformatics Platform 2 Our NGS workflow and applications 3 QIAGEN s

More information

Next-Generation Sequencing. Technologies

Next-Generation Sequencing. Technologies Next-Generation Next-Generation Sequencing Technologies Sequencing Technologies Nicholas E. Navin, Ph.D. MD Anderson Cancer Center Dept. Genetics Dept. Bioinformatics Introduction to Bioinformatics GS011062

More information

De novo Genome Assembly

De novo Genome Assembly De novo Genome Assembly A/Prof Torsten Seemann Winter School in Mathematical & Computational Biology - Brisbane, AU - 3 July 2017 Introduction The human genome has 47 pieces MT (or XY) The shortest piece

More information

Outline. Introduction to ab initio and evidence-based gene finding. Prokaryotic gene predictions

Outline. Introduction to ab initio and evidence-based gene finding. Prokaryotic gene predictions Outline Introduction to ab initio and evidence-based gene finding Overview of computational gene predictions Different types of eukaryotic gene predictors Common types of gene prediction errors Wilson

More information

3. human genomics clone genes associated with genetic disorders. 4. many projects generate ordered clones that cover genome

3. human genomics clone genes associated with genetic disorders. 4. many projects generate ordered clones that cover genome Lectures 30 and 31 Genome analysis I. Genome analysis A. two general areas 1. structural 2. functional B. genome projects a status report 1. 1 st sequenced: several viral genomes 2. mitochondria and chloroplasts

More information

measuring gene expression December 5, 2017

measuring gene expression December 5, 2017 measuring gene expression December 5, 2017 transcription a usually short-lived RNA copy of the DNA is created through transcription RNA is exported to the cytoplasm to encode proteins some types of RNA

More information

Welcome to the NGS webinar series

Welcome to the NGS webinar series Welcome to the NGS webinar series Webinar 1 NGS: Introduction to technology, and applications NGS Technology Webinar 2 Targeted NGS for Cancer Research NGS in cancer Webinar 3 NGS: Data analysis for genetic

More information

A Roadmap to the De-novo Assembly of the Banana Slug Genome

A Roadmap to the De-novo Assembly of the Banana Slug Genome A Roadmap to the De-novo Assembly of the Banana Slug Genome Stefan Prost 1 1 Department of Integrative Biology, University of California, Berkeley, United States of America April 6th-10th, 2015 Outline

More information

CHAPTER 21 LECTURE SLIDES

CHAPTER 21 LECTURE SLIDES CHAPTER 21 LECTURE SLIDES Prepared by Brenda Leady University of Toledo To run the animations you must be in Slideshow View. Use the buttons on the animation to play, pause, and turn audio/text on or off.

More information

ABSTRACT COMPUTATIONAL METHODS TO IMPROVE GENOME ASSEMBLY AND GENE PREDICTION. David Kelley, Doctor of Philosophy, 2011

ABSTRACT COMPUTATIONAL METHODS TO IMPROVE GENOME ASSEMBLY AND GENE PREDICTION. David Kelley, Doctor of Philosophy, 2011 ABSTRACT Title of dissertation: COMPUTATIONAL METHODS TO IMPROVE GENOME ASSEMBLY AND GENE PREDICTION David Kelley, Doctor of Philosophy, 2011 Dissertation directed by: Professor Steven Salzberg Department

More information

Targeted Sequencing Reveals Large-Scale Sequence Polymorphism in Maize Candidate Genes for Biomass Production and Composition

Targeted Sequencing Reveals Large-Scale Sequence Polymorphism in Maize Candidate Genes for Biomass Production and Composition RESEARCH ARTICLE Targeted Sequencing Reveals Large-Scale Sequence Polymorphism in Maize Candidate Genes for Biomass Production and Composition Moses M. Muraya 1,2, Thomas Schmutzer 1 *, Chris Ulpinnis

More information

De novo whole genome assembly

De novo whole genome assembly De novo whole genome assembly Lecture 1 Qi Sun Bioinformatics Facility Cornell University Data generation Sequencing Platforms Short reads: Illumina Long reads: PacBio; Oxford Nanopore Contiging/Scaffolding

More information

Ion S5 and Ion S5 XL Systems

Ion S5 and Ion S5 XL Systems Ion S5 and Ion S5 XL Systems Targeted sequencing has never been simpler Explore the Ion S5 and Ion S5 XL Systems Adopting next-generation sequencing (NGS) in your lab is now simpler than ever The Ion S5

More information

Introduction to NGS Technologies

Introduction to NGS Technologies Introduction to NGS Technologies Ignacio Medina imedina@ebi.ac.uk Project Manager & Senior Software Engineer at EBI Variation European Bioinformatics Institute (EMBL-EBI) European Molecular Biology Laboratory

More information

CNV and variant detection for human genome resequencing data - for biomedical researchers (II)

CNV and variant detection for human genome resequencing data - for biomedical researchers (II) CNV and variant detection for human genome resequencing data - for biomedical researchers (II) Chuan-Kun Liu 劉傳崑 Senior Maneger National Center for Genome Medican bioit@ncgm.sinica.edu.tw Abstract Common

More information

Index. E Electrophoretic Mobility Shift Assay (EMSA), 262 ENCODE project, 223, 224 European Nucleotide Archive (ENA), 34

Index. E Electrophoretic Mobility Shift Assay (EMSA), 262 ENCODE project, 223, 224 European Nucleotide Archive (ENA), 34 A Alternative splicing computational analysis, 114 data processing, 106 experimental design, 114 isoform quantification AltAnalyze, 109 CuffDiff, 110 DEXSeq, 108 DiffSplice, 109 exon/transcript isoform,

More information
