Genomics AGRY Michael Gribskov Hock 331
2 Computing Essentials
Resources
In this course we will assemble and annotate both genomic and transcriptomic sequence assemblies.
We will use a wiki to handle course logistics.
We will use computing facilities at the Rosen Center for Advanced Computing (RCAC); in particular, we will use the server scholar.rcac.purdue.edu.
You must have access to a computer, and preferably bring it to class on "computational days".
You must understand how to use your computer to connect to RCAC.
3
4 NGS Sequence Analysis
General Process
Simpler version of the first two bubbles of fig. 1 in Ekblom
Flowchart: Sample Preparation → Sequencing → Data Cleaning → Quality Control → Contig Assembly → Scaffold Assembly → Validation and QC → Draft Genome → Annotation
5 Genome Assembly
Original plan for the human genome
Isolate chromosomes; clone in bacterial artificial chromosome (BAC) vectors
Find the "golden path" to minimize sequencing
Subclone BACs into plasmids and sequence using dideoxy chain-terminating nucleotides (Sanger sequencing)
Optimistically estimated to cost $3 billion and take 15 years
Initiated in 1990; claimed to be the largest collaborative project
6 Genome Assembly
Whole Genome Shotgun (WGS) Assembly
1998, Craig Venter: Why mess around with all the subcloning and tiling? Why not just fragment the whole genome randomly and sequence all the pieces?
1998, NIH: It'll never work. You have to sequence too many clones. You won't be able to put it together.
2000: Celera completes draft sequence using the WGS approach. Funding: $300 million
7 Genome Assembly
How much sequence do you need (ca. 2004)
Lander ES, Waterman MS, Genomic mapping by fingerprinting random clones: a mathematical analysis, Genomics 2 (1988)
Depends on genome size (G), sequence length (L), and number of sequences (N)
coverage = L N / G
In 1st-generation sequencing, 16X coverage was a good target
For the human genome: 15X coverage, 500-base reads, 3.3 x 10^9 bp
99 million reads = $ 3 $30/base, = $9.9 $0.10/base
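The coverage relation on this slide can be checked with a few lines of Python. This is a minimal sketch; the function names are illustrative and not part of the course material, and the numbers are the ones quoted on the slide (15X, 500-base reads, 3.3 x 10^9 bp).

```python
import math


def coverage(read_length, n_reads, genome_size):
    """Average depth of coverage: L * N / G."""
    return read_length * n_reads / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads N needed to reach a target coverage, rounded up."""
    return math.ceil(target_coverage * genome_size / read_length)


# Human genome example from the slide: 15X coverage with 500-base reads.
HUMAN_GENOME_BP = 3_300_000_000
human_reads = reads_needed(15, 500, HUMAN_GENOME_BP)
print(f"{human_reads:,} reads")
print(f"{coverage(500, human_reads, HUMAN_GENOME_BP):.1f}X coverage")
```

Rearranging the same formula gives any one of L, N, or G from the other two, which is handy when estimating the depth of a new data set.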
8 Genome Assembly
9 Genome Assembly
Monascus purpureus
Used to make red yeast rice (beni-koji and ang-kak)
Also produces statins
More on the wiki
Genome size about 50 Mb?
Has introns and other typical eukaryotic features
Data
149,983,522 DNA reads (JGI), 150-base TruSeq paired-end
230 k RNA reads
10 Genome Assembly
Illumina TruSeq System
Diagram labels: universal adapter, primer, insert, primer, index adapter, bar code
Paired-end reads, each 150 bases
Vocabulary: paired-end, mate-pair, contig, scaffold
Ekblom, fig. 2 (partial)
11 Quality and Cleaning
TruSeq Adapters
Universal adapter, 58 bases, same for all sequences, primer location
>TruSeq_Universal_Adapter
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>TruSeq_Universal_Adapter_Reversed
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
Index adapter, 63 bases, contains barcode, primer location
>TruSeq_Index_Adapter-GTAGAG
TCGATCGGAAGAGCACACGTCTGAACTCCAGTCACGTAGAGATCTCGTATGCCGTCTTGCTTG
>TruSeq_Index_Adapter-GTAGAG_Reversed
CAAGCAGAAGACGGCATACGAGATCTCTACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC
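The "Reversed" universal adapter listed here is the reverse complement of the forward sequence, not a simple reversal. A short sketch makes that concrete; the helper name `reverse_complement` is illustrative, and the two sequences are copied from the slide.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


universal = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
universal_reversed = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT"

print(len(universal))                                   # adapter length in bases
print(reverse_complement(universal) == universal_reversed)
```

The same helper can generate the search strings needed when checking reads for adapters in either orientation.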
12 Quality and Cleaning What should we clean?
13 Quality and Cleaning
Is it important?
Depends on the assembler
Depends on level of contamination
Depends on depth
Zhou & Rokas, Molec. Ecol. 23, 2014.
14 Quality and Cleaning
What is quality?
Introduced in the Phred program (Phil Green, UWa)
Quality score Q = -10 log10(ε), where ε is the expected error rate (probability of calling an incorrect base)
20 (P = 0.01) is a commonly used cutoff

Phred quality score   Error probability   Base call accuracy
10                    1:10                90%
20                    1:100               99%
30                    1:1,000             99.9%
40                    1:10,000            99.99%
50                    1:100,000           99.999%
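The Phred definition converts directly into code. The following sketch (function names are illustrative) applies Q = -10 log10(ε) in both directions and prints the error rate and accuracy for the scores in the table.

```python
import math


def phred_to_error(q):
    """Expected error probability for a Phred quality score."""
    return 10 ** (-q / 10)


def error_to_phred(eps):
    """Phred quality score for an expected error probability."""
    return -10 * math.log10(eps)


for q in (10, 20, 30, 40, 50):
    eps = phred_to_error(q)
    print(f"Q{q}: error 1 in {round(1 / eps):,}, accuracy {100 * (1 - eps):.3f}%")
```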
15 Genome Assembly
Sequence files: fastq
Combines sequence and quality
Beginning of sequence is marked by @; rest of line has sequence ID and documentation
Quality section begins with +
Quality values are converted to letters in the ASCII alphabet by adding 33 to the Phred quality
Typical: ASCII value = quality + 33
Some companies use 64 instead of 33
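The record layout described here can be read with a small generator. This is a sketch under two assumptions that are not stated on the slide: each record occupies exactly four lines (no wrapped sequences) and the input has no blank lines. `FastqRecord`, `read_fastq`, and the example read are hypothetical names and data.

```python
from itertools import islice
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str       # header text after the leading "@"
    sequence: str
    quality: str    # ASCII-encoded qualities, same length as sequence


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from 4-line FASTQ input (header, sequence, '+', quality)."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not chunk:
            return
        if len(chunk) < 4:
            raise ValueError("truncated FASTQ record")
        header, sequence, plus, quality = chunk
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Hypothetical one-record example; a real file would be passed as open(path).
example = ["@read1 hypothetical description\n", "GATTACA\n", "+\n", "IIIIII#\n"]
records = list(read_fastq(example))
print(records[0].name, records[0].sequence, records[0].quality)
```

Because the function accepts any iterable of lines, it works the same on an open file handle or on a list of strings in a test.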
16 Quality and Cleaning
Typical 1st-gen quality
>jgi JGI_CAOP10014.rev JGI_CAOP10014.rev
17 Quality and Cleaning
Fastq
1:N:0:GTAGAG
GACCCATCCATTGTTGGACAGCTGAAGACGGGACGATCGTGCTCGTGTTTTGAATGCGAGAATCCCTGCAGAGGCTGCCTGCTTCGGNNNNNNNNNNTCCTCGACAGCC
+
CCCFFFFFHHHHHJIJJJJGIJJJJJJJJJJJIIJIJJJIIJIIHAFGIJJEHHHHFFFDCDDDDDDCDDDDDDBBDDDDDDCCDDB##########++28<<@BB>BD
I = ASCII 73; Quality = 73 - 33 = 40; Quality = -10 log10(ε), so ε = 10^-4
# = ASCII 35; Q = 35 - 33 = 2; ε = 10^-0.2 = 0.63 = totally bogus
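The arithmetic on this slide can be reproduced by decoding the quality string character by character. This sketch assumes the offset of 33 used on the slide; `decode_quality` is an illustrative name.

```python
def decode_quality(quality, offset=33):
    """Convert an ASCII-encoded quality string to a list of Phred scores."""
    return [ord(ch) - offset for ch in quality]


# Quality line of the read shown on the slide.
quality = (
    "CCCFFFFFHHHHHJIJJJJGIJJJJJJJJJJJIIJIJJJIIJIIHAFGIJJEHHHHFFFDCDDDDDDC"
    "DDDDDDBBDDDDDDCCDDB##########++28<<@BB>BD"
)
scores = decode_quality(quality)

for ch in "I#":
    q = ord(ch) - 33
    print(f"{ch!r}: ASCII {ord(ch)}, Q = {q}, error = {10 ** (-q / 10):.2g}")
print("lowest and highest score in this read:", min(scores), max(scores))
```

The run of `#` characters lines up with the run of N bases in the sequence, which is why those positions carry essentially no information.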
18 Quality and Cleaning
FastQC
Available on RCAC servers. You will use it.
A good data set
Zhou & Rokas, Molec. Ecol. 23, 2014.
19 Quality and Cleaning
FastQC
a, b: before and after quality trimming
c: sequence composition bias at 5' end
d: k-mer enrichment. Adapter dimer??
e: non-random priming in RNA-seq
Zhou & Rokas, Molec. Ecol. 23, 2014.
20 Quality and Cleaning FastQC - Monpu1.genome.rawReads.fastq
21 Quality and Cleaning
Quick and Dirty check for primers
Universal primer reverse, first 22 bases: expected for read 2 (from index adapter)
99.99% are in read 2 (62228/62232)
Diagram labels: universal adapter, Primer, Forward, Reverse, Primer, index adapter
22 Quality and Cleaning
Quick and Dirty check for primers
Sequence: Monpu1.genome.rawReads.fastq
TruSeq index adapter (forward), first 22 bases
TCGATCGGAAGAGCACACGTCTGAACTCCAGTCACGTAGAGATCTCGTATGCCGTCTTGCTTG
92% are read 1
Diagram labels: universal adapter, Primer, Forward, Reverse, Primer, index adapter
23 Quality and Cleaning
Quick and Dirty check for primers
Sequence: Monpu1.genome.rawReads.fastq
TruSeq index adapter (forward), first 22 bases
TCGATCGGAAGAGCACACGTCTGAACTCCAGTCACGTAGAGATCTCGTATGCCGTCTTGCTTG
Diagram labels: universal adapter, Primer, Forward, Reverse, Primer, index adapter
24 Quality and Cleaning
Quick and Dirty check for primers
Sequence: Monpu1.genome.rawReads.fastq
TruSeq index adapter (forward), first 22 bases
TCGATCGGAAGAGCACACGTCTGAACTCCAGTCACGTAGAGATCTCGTATGCCGTCTTGCTTG
TruSeq Index Adapter (Reverse)
CAAGCAGAAGACGGCATACGAGATCTCTACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC
25 Quality and Cleaning
Quick and Dirty check for primers
Sequence: Monpu1.genome.rawReads.fastq
TruSeq universal primer: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
grep AATGATACGGCGACCACCGAGA Monpu1.genome.rawReads.fastq | more
AATGATACGGCGACCACCGAGATCTCGGATGCCGTCTTCTGCTTGAAAAATTAAGGATGATGAACTGCCGCGCAAGATCTTGTTAGAAATCTTGCTGCTGCGGGTACTTTCGGGGGAAATATTTCCTTGCAATCGGGGCCGAGCTTTGGG AATGATACGGCGACCACCGAGATCTACACTCTTTCTTCTTCTACTTCTCCTCCTTAACCACTCTCCTCTTTTCTCTTTCTACTTCTCCTTCTACCACTCTTCTACCACTTCTCCCTTTTCTCCCTCTCTGTCTTCCTCCACTTCTCCTTC AATGATACGGCGACCACCGAGATCTACACTCCTTCCCCAACCACTCCTCCACTCTTCCCCTTCTACTTCTCCTCCCCCACCACTTCTTTACTTCACCTCTCTTACACCTCCCATCTTTCTTCTTCTTCTCATCCTTCTCCTTCTACCACC AATGATACGGCGACCACCGAGATCTACACTCTTTCCCGTCCGTTCCCTACGCTCCATATTTCTCAACCCCCCGGCCTTGGACGGGGGGGGCGGACCGGCCCGGCGGAGCCCACGCGGCGCAGCTTGCTGCTCCTCGTGGTCGCGGCAACA AATGATACGGCGACCACCGAGATCTCGCATGCCGTCTTCTGCTTGAAAAATAAAGCCGTAGAGGGAGAGCGGATGGTCGACGTTGTGCAGCAACCGGCACGGCATGCTGGCGTTGGTGGTGGTCACGGAGTGGGAGACGGTTTAGGGAAG AATGATACGGCGACCACCGAGATCTACACTCTTCTCCTTCTCCTCTTTCTTCTTCTCCTTCTACCACTCTTCTCTTCTCCTTCTACTTCTTCTTCTTCTCCTTCTCCCTAACCACTCTTTCACTTCTCCTTCCTCCTCTTCTCCCTTACC AATGATACGGCGACCACCGAGATCTCTACGTGTCTTGACATCCCCCGCCTCCTCTTCCGACCCAATCCCTTTTTCAAAAACACCCCAGCGGGTGGGGAGGAACACCCTACACTTCCTTCCACCCCACCCTTTCCCAAACACAAAACCTCC AATGATACGGCGACCACCGAGATCTCGTATGCCGTCTTCTGCTTGAAAAAAATCAAACAGATGTCCGCGACGTCGCAACGCCCCGTTTCGCAGCCGTCGGCTCGGGAACCTGCCCAAGCACACCAACAGACGGCAAGCCACCATCACGAA AATGATACGGCGACCACCGAGATCTTCACCCTTTCCCTTCACATCTAAATCCACTCAGGTGGATACCAAATCGTTCTTTTTCAATTCTCCCCCCTCCCCCCGTACATCTTCGTACTTTTACTCACGTGCTACTGTCACGCACTCGTCCAC AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCTGCTTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACGTAGAGATCTCGTATGCCGTCTTCTGCTTGAAAAAAACAACACAACAGCAGCGTCTGCC AATGATACGGCGACCACCGAGATCTACACTCTATATGGTATCCCTGGAGGATCAATCCTCGCATACACGCAAGGATTGATCCTCCAGGGATACCATATAGAGTGTTCCGATCAATGTGTTACGGCATAGAGGGATGTAAGGAATGCAGCG 
AATGATACGGCGACCACCGAGATCTACACTATTTTCTCCGTTCTGAGCTCTTACTGCTCTTACTGGTTCACCAGGTGTCCATCTGGGTCGGCGTTTTTGGGGAATCAATGCCTACGTATTAATATCATGTGCCCTCTAAAGACTGTTTTT AATGATACGGCGACCACCGAGATCTACACTCCTTTCCTCCTTCCTTCCCTCTTATTTGATCTTTTTGTATTTAAACATGGAGTAGAGTGCAGTAAAATTTTAAGACCTTCCTTTATATTAGTAATAAAGATTATTAAATACGCTGGAAGC AATGATACGGCGACCACCGAGATCTACACTCTTTCTTCTCCTGACTACCTTTCCCCTCATGTTCGTGACCCTGCTCTTCCCTGGCTTCACTTTCTGGGTCGCCCAAAAAAGCGCCGCCCGCAAAGACTGCGGGCTACCACGCTACATCCT AATGATACGGCGACCACCGAGATCTACACTCTTTCCCATACCAGCCCAGACACCCCTCCACCTACTGTCAGCGAGAAAAGGTTAAAATGATGGAGCTTCTGAAAACACACCTGAATAACATCAAGATCCTCTGCACGCGCGCGGACAAAC AATGATACGGCGACCACCGAGATCTTCACTCTTTTCTTGCTCTTCTGATAATCTGGTGTTGGTTGGTGGTCGTTCTCATAGATTTGATCTTGCGCTTCAACCGGCGGTGGCGGTCCCGGCCTGCGGCTGGAGCGCTCGTGGCAACCGTCC AATGATACGGCGACCACCGAGATCTACACTCTGTCCCCACAGCTACGCTCTTACTCAAATCCTGATGTCTTCGGATGGATTTGAGTAAGAGCGGAGCTGTGGGGACCCGGAAGATGGTGGAGCATTGCTCAATATGGCGCAACAAGAGGA AATGATACGGCGACCACCGAGATCTACACTCTCCCCCTATGCATAGCTCCGACGTTGACGAGAAGGGTACTCTTCGCTCCCATCCCCTCGTCCTCCCCCCCAAAACCGTTCTGCGAGTGACCCCGCTGGCCCCTTTGTGCGCCGCACACC AATGATACGGCGACCACCGAGATCTACACTCTTTCTCTCTATTTATTCTTTATTTTTCTCTTTTTTCATTTCTCTTCCTCCCCAACCACTCTCTCTTTTTTCCACTTCTCTTTCCCCTTTAATATTCTATTTTTCTTCTCTTTATTCATT AATGATACGGCGACCACCGAGATCTACACTCTTTCCCCACCTCATCCTTCGCAAGCCACACCCCGTACAAACTACCCAGCTCTTATTTTCTCCTCGCGGTTGTTGGGGGCTGGTCGCCCCCGGGGCTCGGGCGCCGCTTTCGCCTCCCGA AATGATACGGCGACCACCGAGATCTACACTCTATCTCCTCAACCAGTCGAGGAGATAGAGGGTCAGACACTTCCGGTTCAGGGTCCTTGGGGCATGTTCCGGGCTAGGGCTGTGGGGGGTGGGGGTGATGCTGCTATTCTTCTTGGCACG AATGATACGGCGACCACCGAGATCTACACTCTTTTCTTAGCCCCTCTAAAGTCTTTTGATCTTGGGGTTGGGGCTGTCGTGTAAACCGAAACATCAAAGTGAGCCACTGGCAAAAAAACTTTTTACCAACCCTGCCTCCAGACCCACAAA AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCTGCTTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACGTAGAGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAACAAACTAACATTCCTAGACCG AATGATACGGCGACCACCGAGATCTACACTCTTTCCCCCCACCCCCCCCCTATCTTTGTCACAACTGCACCTACAACCCCCGCCCCTCTCCTTTTCGGGGGCAGGATGCGCCCACACTTTCTCTTGTTTAATACAGTTCTTTTCCACCCC 
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCCGCGACTGTAATTCGTCAAAGCCTCGACCTTTTCTCTTTGGAATATTAGCTTCCTGTCTTCTTTTTTCTTCTTCTCATTCTCCCTCACCTCATAATTCTGTCTTCAACATATA AATGATACGGCGACCACCGAGATCTACATTCCTTTTCTCCCTTCGTCTTAGTCCTACGTCGACTTGGTGAAGTCGACGTAGGACTAAGACGAAGGGAGAAAAGGAATGTTAGCAAGTTCCGCGCGTTTAACGCTAGGAAAGGAGAGGAAA CAAGCAGAAGACGGCATACGAGATCTCTACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTAATGATACGGCGACCACCGAGATCTCGTATGCCCGCTTCTGCTTGAAAAAAAAAAAAAGAGGCGACGAGACGTACGAAGACCGCAC AATGATACGGCGACCACCGAGATCTACACTCTTTCTTCTCATCATCGTCCTCCCTTTCGAAGTAGGGATGCATCTTTTTTGGCCCTTTTAGCTTTGTGCTGAAAAATACTATGTTTCTCATAATTCTTTTGTGAACACCATCCACTCCAC
26 Quality and Cleaning
Matches to reverse of index adapter
Diagram labels: universal adapter, Primer, Forward, Reverse, Primer, index adapter
27 Quality and Cleaning
Reverse of index adapter
Adapter should end 41 bases to the right
Less than 1% error
CAAGCAGAAGACGGCATACGAGATCTCTACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC
28 Quality and Cleaning
How many have exact 22-base matches?
universal adapter, forward
grep -c AATGATACGGCGACCACCGAGA Monpu1.genome.rawReads.fastq
2694
universal adapter, reverse
grep -c AGATCGGAAGAGCGTCGTGTAG Monpu1.genome.rawReads.fastq
index adapter, forward
grep -c GATCGGAAGAGCACACGTCTGA Monpu1.genome.rawReads.fastq
index adapter, reverse
grep -c CAAGCAGAAGACGGCATACGAG Monpu1.genome.rawReads.fastq
= 166,922 / 149,983,353 total reads = 0.11%
Most are what is expected for small or no insert in expected orientation
What about mismatches?
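The same counts can be gathered in one pass with a short script. This is a sketch rather than the method used in class: unlike grep -c, which counts every matching line, it looks only at sequence lines and assumes strict 4-line FASTQ records. The function name and the toy reads are hypothetical; the four 22-base search strings are the ones on the slide.

```python
# First 22 bases of each adapter, as searched for with grep -c on the slide.
ADAPTER_PREFIXES = {
    "universal_forward": "AATGATACGGCGACCACCGAGA",
    "universal_reverse": "AGATCGGAAGAGCGTCGTGTAG",
    "index_forward": "GATCGGAAGAGCACACGTCTGA",
    "index_reverse": "CAAGCAGAAGACGGCATACGAG",
}


def count_matching_reads(fastq_lines, motif):
    """Count reads whose sequence line (2nd of every 4 lines) contains motif."""
    return sum(
        1 for i, line in enumerate(fastq_lines) if i % 4 == 1 and motif in line
    )


# Hypothetical toy data; for a real file, read the lines from open(path).
toy_fastq = [
    "@toy1", "AATGATACGGCGACCACCGAGATTTT", "+", "I" * 26,
    "@toy2", "ACGTACGTACGTACGTACGTACGTAC", "+", "I" * 26,
    "@toy3", "CCCAAGCAGAAGACGGCATACGAGCC", "+", "I" * 26,
]
counts = {
    name: count_matching_reads(toy_fastq, motif)
    for name, motif in ADAPTER_PREFIXES.items()
}
print(counts)

# Fraction reported on the slide: 166,922 matching reads out of 149,983,353.
print(f"{100 * 166_922 / 149_983_353:.2f}%")
```

Exact substring matching misses adapters that contain sequencing errors, which is the point of the slide's closing question about mismatches.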
29 5' adapter matches vs length
Plot series: Universal forward, Universal reverse, Index forward, Index reverse, Random Expected
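The slide does not say how the "Random Expected" series was computed. One plausible baseline, shown here purely as an assumption, is the number of reads that would start with a given k-base string by chance if all four bases were equally likely and independent: N x (1/4)^k.

```python
def expected_random_matches(n_reads, k):
    """Expected number of reads beginning with a specific k-mer by chance.

    Assumes uniform, independent base composition (an illustrative model,
    not necessarily the one behind the plot).
    """
    return n_reads * 0.25 ** k


TOTAL_READS = 149_983_353  # total reads quoted in the course data
for k in (4, 8, 12, 16, 22):
    print(f"k = {k:2d}: {expected_random_matches(TOTAL_READS, k):.3g}")
```

Under this model chance matches fall off by a factor of four per base, so any adapter match of 22 bases is far more than random background.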
30 RCAC Modules
On RCAC servers many bioinformatics tools have been installed. These are referred to as modules. This is a system specific to RCAC and Purdue.
Module commands
module avail: show available modules. To see available modules you must first run module use /apps/group/bioinformatics/modules. Put this in your .bash_profile
module load: load a module
module list: show currently loaded modules
module show: show details about the installation of a module
The module system makes it unnecessary to load specific paths, environment symbols, and program names on your own
Because bioinformatics modules change rapidly, multiple versions are often available
31 RCAC Modules
module avail
When there are different versions, one is the default
Default is sometimes but not always shown
Different versions
32 RCAC Other programs
You are not limited to modules; you can download and run programs on your own. This is the main use for your home directory
For instance sickle, which is available on GitHub
33 RCAC Other Programs
I want to download onto the scholar server, not my PC
Option one: download on PC/Mac and transfer
Option two: right click on the Download ZIP button to copy the URL; in my home directory, type wget followed by the copied URL; unzip the resulting file
result
34 RCAC Other programs
Read the file README.md
This will tell you, amongst a lot of other stuff: To build Sickle, enter: make
After running make I have a new file called sickle
Try sickle -h or sickle --help; this will usually give you some brief directions
Also look for a doc directory
Also look for files named README or MANUAL
35 RCAC Batch jobs not using a module
The RCAC servers are not designed to run jobs interactively. Instead, jobs are submitted to a queuing system called PBS (or Torque)
Since you cannot run jobs on the frontend systems, you will need to make job files to submit your jobs
36 RCAC Batch job using a module
37 Assignment
Adapter Cleaning
See the wiki page:
For 5 pts, choose one of the installed module programs and run it on the Monascus sequences. Be prepared to explain why you chose the settings you chose.
For 10 pts, install and run one of the non-module adapter cleaners. You may use one not on the wiki page. Explain why you chose your settings.
You may do both of the above
Check how well it worked
Using grep as shown in class
Using FastQC (installed module, run as batch job)
Upload relevant information onto your group page
Add comments about download sites and papers to the wiki page
BIOINFORMATICS ORIGINAL PAPER Vol. 27 no. 21 2011, pages 2957 2963 doi:10.1093/bioinformatics/btr507 Genome analysis Advance Access publication September 7, 2011 : fast length adjustment of short reads
More informationA shotgun introduction to sequence assembly (with Velvet) MCB Brem, Eisen and Pachter
A shotgun introduction to sequence assembly (with Velvet) MCB 247 - Brem, Eisen and Pachter Hot off the press January 27, 2009 06:00 AM Eastern Time llumina Launches Suite of Next-Generation Sequencing
More informationConditional Random Fields, DNA Sequencing. Armin Pourshafeie. February 10, 2015
Conditional Random Fields, DNA Sequencing Armin Pourshafeie February 10, 2015 CRF Continued HMMs represent a distribution for an observed sequence x and a parse P(x, ). However, usually we are interested
More informationDNA and genome sequencing. Matthew Hudson Dept of Crop Sciences University of Illinois
DNA and genome sequencing Matthew Hudson Dept of Crop Sciences University of Illinois Genome projects 2,424 ongoing genome projects 696 for eukaryotes 520 completed genomes 47 from eukaryotes Almost every
More informationPLNT2530 (2018) Unit 6b Sequence Libraries
PLNT2530 (2018) Unit 6b Sequence Libraries Molecular Biotechnology (Ch 4) Analysis of Genes and Genomes (Ch 5) Unless otherwise cited or referenced, all content of this presenataion is licensed under the
More informationEcole de Bioinforma(que AVIESAN Roscoff 2014 GALAXY INITIATION. A. Lermine U900 Ins(tut Curie, INSERM, Mines ParisTech
GALAXY INITIATION A. Lermine U900 Ins(tut Curie, INSERM, Mines ParisTech How does Next- Gen sequencing work? DNA fragmentation Size selection and clonal amplification Massive parallel sequencing ACCGTTTGCCG
More informationCSCI2950-C DNA Sequencing and Fragment Assembly
CSCI2950-C DNA Sequencing and Fragment Assembly Lecture 2: Sept. 7, 2010 http://cs.brown.edu/courses/csci2950-c/ DNA sequencing How we obtain the sequence of nucleotides of a species 5 3 ACGTGACTGAGGACCGTG
More informationChapter 8: Recombinant DNA. Ways this technology touches us. Overview. Genetic Engineering
Chapter 8 Recombinant DNA and Genetic Engineering Genetic manipulation Ways this technology touches us Criminal justice The Justice Project, started by law students to advocate for DNA testing of Death
More information2nd (Next) Generation Sequencing 2/2/2018
2nd (Next) Generation Sequencing 2/2/2018 Why do we want to sequence a genome? - To see the sequence (assembly) To validate an experiment (insert or knockout) To compare to another genome and find variations
More informationQuality Control of Sequencing Data
Quality Control of Sequencing Data Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, Ithaca, NY ss2489@cornell.edu // Twitter:@SahaSurya BTI Plant Bioinformatics Course 2015 Slides: Aureliano
More informationC3BI. VARIANTS CALLING November Pierre Lechat Stéphane Descorps-Declère
C3BI VARIANTS CALLING November 2016 Pierre Lechat Stéphane Descorps-Declère General Workflow (GATK) software websites software bwa picard samtools GATK IGV tablet vcftools website http://bio-bwa.sourceforge.net/
More informationDeep Sequencing technologies
Deep Sequencing technologies Gabriela Salinas 30 October 2017 Transcriptome and Genome Analysis Laboratory http://www.uni-bc.gwdg.de/index.php?id=709 Microarray and Deep-Sequencing Core Facility University
More informationTranscriptome Assembly, Functional Annotation (and a few other related thoughts)
Transcriptome Assembly, Functional Annotation (and a few other related thoughts) Monica Britton, Ph.D. Sr. Bioinformatics Analyst June 23, 2017 Differential Gene Expression Generalized Workflow File Types
More informationTranscriptome analysis
Statistical Bioinformatics: Transcriptome analysis Stefan Seemann seemann@rth.dk University of Copenhagen April 11th 2018 Outline: a) How to assess the quality of sequencing reads? b) How to normalize
More informationDE NOVO WHOLE GENOME ASSEMBLY AND SEQUENCING OF THE SUPERB FAIRYWREN. (Malurus cyaneus) JOSHUA PEÑALBA LEO JOSEPH CRAIG MORITZ ANDREW COCKBURN
DE NOVO WHOLE GENOME ASSEMBLY AND SEQUENCING OF THE SUPERB FAIRYWREN (Malurus cyaneus) JOSHUA PEÑALBA LEO JOSEPH CRAIG MORITZ ANDREW COCKBURN ... 2014 2015 2016 2017 ... 2014 2015 2016 2017 Synthetic
More informationMatthew Tinning Australian Genome Research Facility. July 2012
Next-Generation Sequencing: an overview of technologies and applications Matthew Tinning Australian Genome Research Facility July 2012 History of Sequencing Where have we been? 1869 Discovery of DNA 1909
More informationGenome Assembly. J Fass UCD Genome Center Bioinformatics Core Friday September, 2015
Genome Assembly J Fass UCD Genome Center Bioinformatics Core Friday September, 2015 From reads to molecules What s the Problem? How to get the best assemblies for the smallest expense (sequencing) and
More informationReading Lecture 8: Lecture 9: Lecture 8. DNA Libraries. Definition Types Construction
Lecture 8 Reading Lecture 8: 96-110 Lecture 9: 111-120 DNA Libraries Definition Types Construction 142 DNA Libraries A DNA library is a collection of clones of genomic fragments or cdnas from a certain
More informationAssemblytics: a web analytics tool for the detection of assembly-based variants Maria Nattestad and Michael C. Schatz
Assemblytics: a web analytics tool for the detection of assembly-based variants Maria Nattestad and Michael C. Schatz Table of Contents Supplementary Note 1: Unique Anchor Filtering Supplementary Figure
More informationGENETICS EXAM 3 FALL a) is a technique that allows you to separate nucleic acids (DNA or RNA) by size.
Student Name: All questions are worth 5 pts. each. GENETICS EXAM 3 FALL 2004 1. a) is a technique that allows you to separate nucleic acids (DNA or RNA) by size. b) Name one of the materials (of the two
More informationCISC 889 Bioinformatics (Spring 2004) Lecture 3
CISC 889 Bioinformatics (Spring 004) Lecture Genome Sequencing Li Liao Computer and Information Sciences University of Delaware Administrative Have you visited The NCBI website? Have you read Hunter s
More information1. Introduction Gene regulation Genomics and genome analyses
1. Introduction Gene regulation Genomics and genome analyses 2. Gene regulation tools and methods Regulatory sequences and motif discovery TF binding sites Databases 3. Technologies Microarrays Deep sequencing
More informationUsing the Potato Genome Sequence! Robin Buell! Michigan State University! Department of Plant Biology! August 15, 2010!
Using the Potato Genome Sequence! Robin Buell! Michigan State University! Department of Plant Biology! August 15, 2010! buell@msu.edu! 1 Whole Genome Shotgun Sequencing 2 New Technologies Revolutionize
More informationMultiple choice questions (numbers in brackets indicate the number of correct answers)
1 Multiple choice questions (numbers in brackets indicate the number of correct answers) February 1, 2013 1. Ribose is found in Nucleic acids Proteins Lipids RNA DNA (2) 2. Most RNA in cells is transfer
More informationIntroduction to the MiSeq
Introduction to the MiSeq 2011 Illumina, Inc. All rights reserved. Illumina, illuminadx, BeadArray, BeadXpress, cbot, CSPro, DASL, Eco, Genetic Energy, GAIIx, Genome Analyzer, GenomeStudio, GoldenGate,
More informationGenome 373: Mapping Short Sequence Reads II. Doug Fowler
Genome 373: Mapping Short Sequence Reads II Doug Fowler The final Will be in this room on June 6 th at 8:30a Will be focused on the second half of the course, but will include material from the first half
More informationIntroduction to bioinformatics (NGS data analysis)
Introduction to bioinformatics (NGS data analysis) Alexander Jueterbock 2015-06-02 1 / 45 Got your sequencing data - now, what to do with it? File size: several Gb Number of lines: >1,000,000 @M02443:17:000000000-ABPBW:1:1101:12675:1533
More information