Alignment methods
Martijn Vermaat
Department of Human Genetics
Center for Human and Clinical Genetics

Alignment methods
- Sequence alignment
- Assembly vs alignment
- Alignment methods
- Common issues
- Platform specifics
- Software
Metagenomics course, Thursday, 7 February 2013

Sequence alignment
Identifying regions of similarity in sequences
In NGS:
- Recovering the original nucleotide sequence
- from many short fragments
- using a known reference

Sequence alignment: pairwise alignment
Sequence alignment: multiple sequence alignment
Sequence alignment: global vs local alignment
Sequence alignment: structural alignment

Assembly vs alignment
Assembly:
- Memory hungry
- Needs high coverage
Alignment:
- Easy to do in parallel
- Restricted by the reference sequence, which is a problem for
  - highly polymorphic regions
  - large insertions

Alignment methods: Smith-Waterman
- Variation of Needleman-Wunsch for local alignment
- Guaranteed to find the optimal local alignment for the given scoring scheme

Scoring matrix for ACACACTA (columns) against AGCACACA (rows),
with match = +2, mismatch = -1, gap penalty = -1:

          A   C   A   C   A   C   T   A
      0   0   0   0   0   0   0   0   0
  A   0   2   1   2   1   2   1   0   2
  G   0   1   1   1   1   1   1   0   1
  C   0   0   3   2   3   2   3   2   1
  A   0   2   2   5   4   5   4   3   4
  C   0   1   4   4   7   6   7   6   5
  A   0   2   3   6   6   9   8   7   8
  C   0   1   4   5   8   8  11  10   9
  A   0   2   3   6   7  10  10  10  12

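The matrix on this slide can be reproduced with a short script. The following is a minimal sketch, not a production aligner: the function name `smith_waterman` and the traceback tie-breaking order (diagonal first, then up, then left) are choices made for this illustration, and a linear gap penalty is assumed, as on the slide.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment of a (columns) against b (rows) with a linear gap penalty.

    Returns the scoring matrix, the best score and the two aligned strings.
    """
    rows, cols = len(b) + 1, len(a) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores are never allowed to drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if b[i - 1] == a[j - 1] else mismatch
            h[i][j] = max(0,
                          h[i - 1][j - 1] + s,   # match or mismatch
                          h[i - 1][j] + gap,     # gap in a
                          h[i][j - 1] + gap)     # gap in b
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        s = match if b[i - 1] == a[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + s:
            top.append(a[j - 1]); bottom.append(b[i - 1])
            i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            top.append("-"); bottom.append(b[i - 1])
            i -= 1
        else:
            top.append(a[j - 1]); bottom.append("-")
            j -= 1
    return h, best, "".join(reversed(top)), "".join(reversed(bottom))


h, score, top, bottom = smith_waterman("ACACACTA", "AGCACACA")
print(score)    # 12
print(top)      # A-CACACTA
print(bottom)   # AGCACAC-A
```

Filling the matrix takes time proportional to the product of the two sequence lengths, which is why running it directly against a whole reference is impractical for millions of reads.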
Alignment methods: 2-step alignment
Step 1: Find candidate positions
- Use read seeds
- Hash table-based or Burrows-Wheeler transform-based heuristic
- Balance between speed and accuracy
Step 2: Align and report
- Complete alignment with Smith-Waterman
- Evaluate alignment(s)

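The two steps can be sketched with a hash table of k-mers. This is an illustrative toy under stated assumptions, not how any particular aligner works: the function names, the seed length `k=4` and the `max_mismatches` threshold are all hypothetical, and step 2 uses a simple mismatch count as a stand-in for a full Smith-Waterman alignment (so indels are not handled).

```python
from collections import defaultdict


def build_index(reference, k):
    """Hash table mapping every k-mer to the reference positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_positions(read, index, k):
    """Step 1: look up every k-mer of the read (a seed) and collect the
    read start positions implied by the seed hits."""
    candidates = set()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            if pos - offset >= 0:
                candidates.add(pos - offset)
    return sorted(candidates)


def align(read, reference, index, k=4, max_mismatches=1):
    """Step 2: check each candidate and report (start, mismatches) for those
    that pass. A real aligner would run Smith-Waterman here instead."""
    hits = []
    for start in candidate_positions(read, index, k):
        window = reference[start:start + len(read)]
        if len(window) < len(read):
            continue  # candidate runs off the end of the reference
        mismatches = sum(1 for x, y in zip(read, window) if x != y)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


reference = "ACGTTGCAAGGCTTAACC"
index = build_index(reference, 4)
print(align("GCAAGGCT", reference, index))  # [(5, 0)]
print(align("GCAAGCCT", reference, index))  # [(5, 1)]
```

Longer seeds give fewer candidates to check (faster) but miss reads whose every seed contains an error (less accurate), which is the balance mentioned in step 1.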
Common issues: insertions and deletions (indels)
- Local realignment around indels
- Per-Base Alignment Qualities (BAQ)

Common issues: non-unique alignment
How to report non-unique alignments?
- Discard entirely
- Choose one randomly
- Report all
  - with the best quality
  - above some quality threshold
- Depends on the tool

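The reporting options listed above can be written down as a small policy function. This is a hypothetical sketch: the policy names, the `(position, quality)` tuple layout and the `min_quality` parameter are invented for illustration and do not correspond to the options of any specific aligner.

```python
import random


def report_alignments(hits, policy, min_quality=0, rng=random):
    """Select which of a read's candidate alignments to report.

    hits is a list of (position, quality) tuples for a single read.
    """
    if policy == "discard":
        # Keep the read only if it aligns to exactly one place.
        return list(hits) if len(hits) == 1 else []
    if policy == "random":
        return [rng.choice(hits)] if hits else []
    if policy == "best":
        if not hits:
            return []
        best = max(quality for _, quality in hits)
        return [hit for hit in hits if hit[1] == best]
    if policy == "threshold":
        return [hit for hit in hits if hit[1] >= min_quality]
    raise ValueError("unknown policy: " + policy)


hits = [(100, 30), (250, 30), (900, 12)]
print(report_alignments(hits, "discard"))                     # []
print(report_alignments(hits, "best"))                        # [(100, 30), (250, 30)]
print(report_alignments(hits, "threshold", min_quality=20))   # [(100, 30), (250, 30)]
```

Which policy is appropriate depends on the downstream analysis; a random choice keeps coverage even across repeats, while discarding avoids placing reads where they may not belong.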
Common issues: structural variation
- Chromosomal translocation
- Inversion
- Large indels
- Copy-number variation
Use specialized tools

Common issues: split-read mapping
- Allow an aligned read to be split
- For example, RNA reads on a DNA reference

Common issues: circular alignment
- Circular genome (e.g. bacteria, mitochondria)
- Most aligners assume a linear reference
- Trick: extend the reference
  - copy the first N bases to the end
  - restore the alignment to the original reference

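The reference-extension trick amounts to two small operations. The sketch below uses hypothetical helper names, 0-based positions, and exact string search (`str.find`) as a stand-in for a real aligner; N is assumed to be at least the read length minus one so that any read spanning the origin fits.

```python
def extend_reference(reference, n):
    """Copy the first n bases of a circular reference to its end."""
    return reference + reference[:n]


def restore_position(pos, original_length):
    """Map a 0-based position on the extended reference back to the original."""
    return pos % original_length


reference = "ACGTTGCAAG"   # toy circular genome, length 10
read = "AAGACG"            # spans the origin: ...AAG | ACG...

print(reference.find(read))                 # -1, not found on the linear reference
extended = extend_reference(reference, len(read) - 1)
pos = extended.find(read)
print(pos)                                  # 7
print(restore_position(pos, len(reference)))  # 7
```

Note that a read lying entirely within the first N bases now has two equally good placements on the extended reference (at the start and in the appended copy), so it may be flagged as non-unique until positions are restored.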
Platform specifics: paired-end sequencing
- Align reads separately
- Choose from non-unique alignments based on pairing

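Choosing between non-unique alignments based on pairing can be illustrated as follows. This is a simplified sketch under assumptions not stated on the slide: the function name, the `expected_distance` and `tolerance` parameters are hypothetical, and only the distance between the two mates is considered (strand and orientation are ignored).

```python
def choose_pair(candidates1, candidates2, expected_distance, tolerance):
    """Pick the combination of candidate positions (one per mate) whose
    distance is closest to the expected distance between the mates.

    Returns (pos1, pos2), or None if no combination is within tolerance.
    """
    best = None
    for pos1 in candidates1:
        for pos2 in candidates2:
            deviation = abs(abs(pos2 - pos1) - expected_distance)
            if deviation <= tolerance and (best is None or deviation < best[0]):
                best = (deviation, pos1, pos2)
    return None if best is None else (best[1], best[2])


# The first mate aligns equally well to two places, the second mate too;
# only one combination is consistent with an expected distance of about 300.
print(choose_pair([1000, 52000], [1310, 80000], 300, 50))  # (1000, 1310)
print(choose_pair([1000], [9000], 300, 50))                # None
```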
Platform specifics: color-space (or SOLiD) reads
- Used by SOLiD systems (not by 454 or Solexa)
- Di-nucleotide encoding
- Needs support from alignment software

Platform specifics: color-space (or SOLiD) reads, decoding

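Di-nucleotide encoding and decoding can be sketched with a 2-bit code per base, where each color is the XOR of the codes of two adjacent bases. The mapping A=0, C=1, G=2, T=3 and the function names are assumptions of this sketch; it reproduces the usual four-color scheme in which identical neighbours (AA, CC, GG, TT) give color 0.

```python
BASES = "ACGT"  # assumed 2-bit codes: A=0, C=1, G=2, T=3


def encode(sequence):
    """Return the first base and the list of colors for a base sequence."""
    codes = [BASES.index(base) for base in sequence]
    colors = [x ^ y for x, y in zip(codes, codes[1:])]
    return sequence[0], colors


def decode(first_base, colors):
    """Rebuild the base sequence from a known first base and the colors."""
    code = BASES.index(first_base)
    decoded = [first_base]
    for color in colors:
        code ^= color
        decoded.append(BASES[code])
    return "".join(decoded)


print(encode("ATGGCA"))                # ('A', [3, 1, 0, 3, 1])
print(decode("A", [3, 1, 0, 3, 1]))    # ATGGCA
# A single wrong color changes every base after it:
print(decode("A", [3, 1, 1, 3, 1]))    # ATGTAC
```

The last line shows why decoding before alignment is risky: one color error corrupts the rest of the read, so aligners with color-space support compare colors against a color-encoded reference instead.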
Platform specifics: error profile
- Homopolymers
- GC content
- Positional (example shown)

Software: some popular aligners for NGS
- Hash table-based: Eland, MAQ
- Burrows-Wheeler transform-based: Bowtie, BWA
- Split-read alignment: TopHat, GSNAP, Mosaik

Software: viewers
- IGV, Savant, Geneious, Tablet
- samtools tview (console-based)
- UCSC Genome Browser, GBrowse (web-based)

Questions?
Acknowledgements: Jeroen Laros, Bas E. Dutilh