Alignment methods. Martijn Vermaat, Department of Human Genetics, Center for Human and Clinical Genetics



Alignment methods. Outline: sequence alignment; assembly vs alignment; alignment methods; common issues; platform specifics; software. Metagenomics course, Thursday, 7 February 2013

Sequence alignment: identifying regions of similarity in sequences. In NGS: recovering the original nucleotide sequence from many short fragments, using a known reference.

Sequence alignment: pairwise alignment; multiple sequence alignment; global vs local alignment; structural alignment.


Assembly vs alignment. Assembly is memory hungry and needs high coverage. Alignment is easy to do in parallel, but it is restricted by the reference sequence, which causes problems in highly polymorphic regions and with large insertions.


Alignment methods: Smith-Waterman. Generalization of Needleman-Wunsch. Guaranteed optimal alignment.

         A   C   A   C   A   C   T   A
     0   0   0   0   0   0   0   0   0
 A   0   2   1   2   1   2   1   0   2
 G   0   1   1   1   1   1   1   0   1
 C   0   0   3   2   3   2   3   2   1
 A   0   2   2   5   4   5   4   3   4
 C   0   1   4   4   7   6   7   6   5
 A   0   2   3   6   6   9   8   7   8
 C   0   1   4   5   8   8  11  10   9
 A   0   2   3   6   7  10  10  10  12

Scoring: match = +2, mismatch = -1, gap penalty = -1.
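The matrix on this slide can be reproduced with a short sketch of the Smith-Waterman recurrence. It assumes a linear gap penalty and that both the mismatch score and the gap penalty are -1 (the signs appear to have been lost in the transcript; these values reproduce the slide's numbers). The function name `smith_waterman` and its return layout are illustrative choices, not taken from the slides.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Fill the local-alignment matrix H and trace back the best alignment.

    `a` runs along the columns and `b` along the rows, as on the slide.
    Scores default to the assumed slide values (linear gap penalty).
    """
    rows, cols = len(b) + 1, len(a) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[j - 1] == b[i - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[j - 1] == b[i - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            top.append(a[j - 1]); bottom.append(b[i - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append("-"); bottom.append(b[i - 1]); i -= 1
        else:
            top.append(a[j - 1]); bottom.append("-"); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom)), H


score, aligned_a, aligned_b, H = smith_waterman("ACACACTA", "AGCACACA")
print(score)
print(aligned_a)
print(aligned_b)
```

Filling the matrix costs time proportional to the product of the two sequence lengths, which is why the later slides restrict Smith-Waterman to a small set of candidate positions.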

Alignment methods: 2-step alignment.
Step 1: find candidate positions. Use read seeds with a hash table-based or Burrows-Wheeler transform-based heuristic. Balance between speed and accuracy.
Step 2: align and report. Complete the alignment with Smith-Waterman and evaluate the alignment(s).
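Step 1 can be sketched with a hash table of k-mers. This is a minimal illustration, assuming exact-match seeds of a fixed length `k` and a simple vote count per implied read start; the names `build_seed_index` and `candidate_positions` are hypothetical and do not correspond to any particular aligner.

```python
from collections import defaultdict


def build_seed_index(reference, k):
    """Map every k-mer in the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_positions(read, index, k):
    """Return candidate read start positions, most seed hits first."""
    votes = defaultdict(int)
    for offset in range(len(read) - k + 1):
        for hit in index.get(read[offset:offset + k], ()):
            start = hit - offset  # where the read would start for this seed hit
            if start >= 0:
                votes[start] += 1
    return sorted(votes, key=lambda s: (-votes[s], s))


reference = "TTGACCAGTACGGATCCA"
index = build_seed_index(reference, 4)
# The read carries one mismatch, yet one seed still matches exactly.
print(candidate_positions("CAGTTCGG", index, 4))
```

Each candidate would then be passed to step 2 for a full Smith-Waterman alignment. Shorter seeds tolerate more mismatches but produce more candidates, which is the speed/accuracy balance the slide mentions.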


Common issues: insertions and deletions (indels). Local realignment around indels. Per-Base Alignment Qualities (BAQ).

Common issues: non-unique alignment. How to report non-unique alignments? Discard entirely; choose one randomly; or report all, those with the best quality, or those above some quality threshold. Depends on the tool.
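The reporting options listed on this slide can be sketched as a single policy function. This is an illustration only: the policy names, the `(position, quality)` tuple layout and the function `report_alignments` are assumptions, and real aligners expose these choices through their own options.

```python
import random


def report_alignments(hits, policy, min_quality=0, rng=random):
    """Select which alignments of one read to report.

    `hits` is a list of (position, quality) tuples (hypothetical layout).
    """
    if policy == "discard":
        # Keep the read only if it aligned uniquely.
        return list(hits) if len(hits) == 1 else []
    if policy == "random":
        return [rng.choice(hits)] if hits else []
    if policy == "all":
        return list(hits)
    if policy == "best":
        top = max((q for _, q in hits), default=None)
        return [h for h in hits if h[1] == top]
    if policy == "threshold":
        return [h for h in hits if h[1] >= min_quality]
    raise ValueError(f"unknown policy: {policy}")


hits = [(100, 30), (5000, 30), (9000, 12)]
for policy in ("discard", "all", "best", "threshold"):
    print(policy, report_alignments(hits, policy, min_quality=20))
```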

Common issues: structural variation. Chromosomal translocation, inversion, large indels, copy-number variation. Use specialized tools.

Common issues: split-read mapping. Allow an aligned read to be split, for example RNA reads on a DNA reference.

Common issues: circular alignment. Circular genome (e.g. bacteria, mitochondria). Most aligners assume a linear reference. Trick: extend the reference by copying the first N bases to the end, then restore the alignment to the original reference.
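The reference-extension trick can be sketched as follows. Exact string search with `str.find` stands in for a real aligner here, and the helper names `extend_reference` and `restore_position` are hypothetical; positions are 0-based.

```python
def extend_reference(reference, n):
    """Copy the first n bases to the end so junction-spanning reads can align."""
    return reference + reference[:n]


def restore_position(pos, original_length):
    """Map a position on the extended reference back to the original one."""
    return pos % original_length


genome = "ACGTTGCAAT"   # treated as circular
read = "ATAC"           # spans the junction: "...AT" followed by "AC..."

print(genome.find(read))                      # not found on the linear reference
extended = extend_reference(genome, len(read) - 1)
pos = extended.find(read)
print(pos, restore_position(pos, len(genome)))
```

Choosing N as the read length minus one is enough for every read to fit across the junction without any read aligning entirely inside the copied part. With a larger N, the modulo step folds such duplicate placements back onto the original coordinates.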


Platform specifics: paired-end sequencing. Align the reads separately. Choose from non-unique alignments based on pairing.
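Choosing among non-unique alignments based on pairing can be sketched like this. The sketch assumes each mate has a list of candidate positions on the same chromosome and that the expected insert size and a tolerance are known; orientation checks are left out for brevity. The function name `choose_pair` is hypothetical.

```python
def choose_pair(hits1, hits2, expected_insert, tolerance):
    """Pick the (pos1, pos2) combination closest to the expected insert size.

    Returns None when no combination falls within the tolerance.
    """
    best = None
    for p1 in hits1:
        for p2 in hits2:
            deviation = abs(abs(p2 - p1) - expected_insert)
            if deviation <= tolerance and (best is None or deviation < best[0]):
                best = (deviation, p1, p2)
    return None if best is None else (best[1], best[2])


# Mate 1 aligns equally well in three places, mate 2 aligns uniquely.
print(choose_pair([1000, 52000, 90500], [52310], expected_insert=300, tolerance=50))
```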

Platform specifics: color-space (or SOLiD) reads. Of the 454, Solexa and SOLiD systems, only SOLiD uses color space. Di-nucleotide encoding. Needs support from the alignment software.

Platform specifics: color-space (or SOLiD) reads. Decoding.
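Di-nucleotide encoding and decoding can be sketched with a two-bit code per base. The sketch assumes the usual color table (color 0 for identical neighbouring bases, and so on), which is equivalent to XOR-ing the codes A=0, C=1, G=2, T=3, and a read that starts with a known primer base; the function names and the primer default `T` are illustrative assumptions.

```python
BASES = "ACGT"  # two-bit codes: A=0, C=1, G=2, T=3


def encode(sequence, primer="T"):
    """Encode bases as colors; each color describes a pair of adjacent bases."""
    prev = BASES.index(primer)
    colors = []
    for base in sequence:
        cur = BASES.index(base)
        colors.append(str(prev ^ cur))
        prev = cur
    return primer + "".join(colors)


def decode(color_read):
    """Decode a color read: known first base, then one color per transition."""
    prev = BASES.index(color_read[0])
    bases = []
    for color in color_read[1:]:
        prev ^= int(color)
        bases.append(BASES[prev])
    return "".join(bases)


print(encode("ACGT"))      # colors for primer T followed by ACGT
print(decode("T3131"))     # decodes back to the original bases
print(decode("T3031"))     # one wrong color changes every base after it
```

Because every base is decoded from the previous one, a single color error corrupts the rest of the decoded read, which is one reason such reads are usually aligned in color space by software that supports it.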

Platform specifics: error profile. Homopolymers, GC content, positional (example shown).


Software: some popular aligners for NGS. Hash table-based: Eland, MAQ. Burrows-Wheeler transform-based: Bowtie, BWA. Split-read alignment: TopHat, GSNAP, Mosaik.
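The idea behind the Burrows-Wheeler transform-based aligners can be sketched with a tiny exact-match search. This is a teaching sketch under strong simplifications: the suffix array is built by naive sorting, occurrence counts are recomputed by scanning, and mismatches are not handled. It is not how Bowtie or BWA are implemented, and the function names are hypothetical.

```python
def bwt_index(text):
    """Build a suffix array, the BWT string and the cumulative count table C."""
    text += "$"  # sentinel that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total  # number of characters strictly smaller than ch
        total += counts[ch]
    return sa, bwt, C


def backward_search(pattern, sa, bwt, C):
    """Return all start positions of exact matches of `pattern`."""
    lo, hi = 0, len(bwt)  # half-open range of matching suffixes
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo = C[ch] + bwt[:lo].count(ch)
        hi = C[ch] + bwt[:hi].count(ch)
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, bwt, C = bwt_index("ACACACTA")
print(bwt)
print(backward_search("ACA", sa, bwt, C))
```

The pattern is consumed from its last character to its first, and each step only narrows a range in the sorted suffixes, so the number of steps depends on the read length rather than on the reference length.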

Software: viewers. IGV, Savant, Geneious, Tablet; samtools tview (console-based); UCSC Genome Browser, GBrowse (web-based).

Questions? Acknowledgements: Jeroen Laros, Bas E. Dutilh.
