Next-Generation Genome Sequencing
- Miles Bates
1 Next-Generation Genome Sequencing Jarkko Salojärvi, D.Sc. (Tech) Department of Biosciences, Division of Plant Biology Department of Veterinary Biosciences, Veterinary Microbiology and Epidemiology University of Helsinki
2 Topics Today Quick recap of the relevant parts of the SOLiD and Solexa/Illumina technologies. Sequencing in practice. Properties of the data. What the raw data looks like. Re-sequencing. Sequencing de novo. Some examples throughout the presentation...
3 Where it all began: Sanger sequencing Developed in Read length up to bases. Current platforms allow 96 concurrent reads. Still needed for closing gaps, sequencing long repeated fragments Frederick Sanger twice Nobel laureate
4 High-throughput sequencing technologies - summary
5 Throughput of different technologies (most recent versions)
Platform: SOLiD 4; Solexa HiSeq; 454 Titanium
Runtime: 5-10 days; 8 days; 10 h
Read length (bp):
Raw sequence (Gb/run):
Accuracy: >99.94%; >98.5%; 99% (at 400 bp)
6 Sequencing in Practice
7 What do you get from a sequencing service? Huge text files, usually reporting two things: 1. Base calls for each read 2. Read qualities In SOLiD these are in two separate files, .csfasta and .qv.qual. In Solexa, both are reported in the same .fastq file. At Biocenter Viikki: SOLiD 4 and 454 Titanium. At FIMM Meilahti: Solexa. Commercial services are available, price thousand(s)/run.
8 Assembly task Given: A text file with lots of short reads, nucleotide sequences. Task: Align these, either with respect to each other (de novo) or a reference genome (re-sequencing). Essential: Coverage: Number of overlapping reads. Depth: Number of reads on a single nucleotide. Composition of reads (fragments/mate-pair) Is there a reference sequence of the organism?
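The depth definition above can be made concrete with a small sketch. It assumes alignments are already available as hypothetical (start, read length) pairs with 0-based starts; the function names are illustrative and not taken from any assembler.

```python
def per_base_depth(genome_length, alignments):
    """Return the number of reads covering each nucleotide.

    alignments: iterable of (start, read_length) pairs, 0-based start (assumed input format).
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, length in alignments:
        end = min(start + length, genome_length)
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for i in range(genome_length):
        running += diff[i]
        depth.append(running)
    return depth


def summarize(depth):
    """Mean depth and the fraction of positions covered by at least one read."""
    covered = sum(1 for d in depth if d > 0)
    return {"mean_depth": sum(depth) / len(depth),
            "fraction_covered": covered / len(depth)}


depth = per_base_depth(10, [(0, 4), (2, 4), (8, 4)])
summary = summarize(depth)
```

Positions with zero depth are the gaps that a reference genome, or paired reads, must bridge.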
9 Fragments vs. mate-pairs 1. Individual fragments. 2. Paired reads. Mate-pair: genomic DNA is fragmented and size-selected inserts are circularized and linked by means of an internal adaptor. Paired end: genomic DNA is fragmented into short segments, and both ends of each segment are sequenced (but not the part in between). The end result is the same: you know the reads from both ends, plus the average distance between them. A large fraction of short reads is difficult to map uniquely to the genome, and the second read of a pair can be used to find the correct location.
10 Mate-pair reads [Korbel et al. 07] Figure: paired-end mapping workflow on human genomic DNA: (i) shearing and size selection, (ii) protection and adapter ligation, (iii) circularization, (iv) random cleavage, (v) linker(+) read isolation, (vi) sequencing of >30 million paired ends with 454 technology, (vii) computational analysis and mapping of structural variants (SVs), (viii) histogram of the span of paired ends (i.e. distance between mapped ends, bp) with cutoffs I and D. Result: a snapshot of the full genome every ~3 kbp. Can be used to align contigs from a standard sequencing run.
11 Example: paired-end mapping to reveal structural variation (SV) in the human genome [Korbel et al. 07] Figure: paired ends from the individual (sample) sequence are placed at their best position in the human reference genome. Normally mapped pairs: no SV. End distance > cutoff D: deletion (region deleted from the sample genome). End distance < cutoff I: insertion (simple, mated, or unmated; region inserted in the sample genome, possibly sequence from a distant locus). Altered end orientation: inversion breakpoint (region inverted in the sample genome; one end maps in inverted orientation relative to the original sample locus).
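The decision rules on this slide can be sketched as a small classifier. This is a simplified illustration, not the procedure of Korbel et al.: the function name, the boolean orientation flag, and the cutoff values in the usage lines are all hypothetical.

```python
def classify_pair(span, orientation_ok, cutoff_i, cutoff_d):
    """Classify one mapped read pair by span and end orientation.

    span: distance between the mapped ends in the reference (bp).
    orientation_ok: True if both ends map in the expected relative orientation.
    cutoff_i, cutoff_d: lower and upper span cutoffs (hypothetical parameters).
    """
    if not orientation_ok:
        return "inversion breakpoint"
    if span > cutoff_d:
        # Ends are further apart in the reference than in the sample.
        return "deletion"
    if span < cutoff_i:
        # Ends are closer together in the reference than in the sample.
        return "insertion"
    return "no SV"


# Illustrative cutoffs only; real cutoffs come from the observed span distribution.
calls = [classify_pair(3000, True, 2000, 4000),
         classify_pair(9000, True, 2000, 4000),
         classify_pair(500, True, 2000, 4000),
         classify_pair(3000, False, 2000, 4000)]
```

In practice a single discordant pair is weak evidence; calls are normally supported by several pairs pointing at the same breakpoint.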
12 De novo vs. re-sequencing of genome In de novo assembly, reads are assembled into contigs: contiguous sequences of DNA created by assembling overlapping sequenced fragments of a chromosome. Reference assembly = re-sequencing. If the genome/template is known: reference assembly. If the genome/template is unknown: de novo assembly. Reference assembly: where there is a gap in sequence coverage, the reference genome still tells that the sequences are from the same contig or genomic region; tolerates short read length. De novo assembly: the same original contig (for example an mRNA) may be split into multiple shorter contigs (contig 1, contig 2); longer reads provide more overlap for connecting individual reads.
13 SOLiD raw data
14 SOLiD probes Probes designed for reading two nucleotides at a time. Four different colors. Resulting sequence in colorspace...but also CG,GC and TA are red?!
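15 SOLiD sequence decoding Key to decoding: the known last base of the adapter oligo sequence. Example: known base at position 0 = A. Color at 0-1 allows CA, AC, GT or TG; color at 1-2 allows CA, AC, GT or TG; color at 2-3 allows AA, CC, GG or TT. Decoded bases: A C A A. Petri Auvinen, DNA Sequencing and Genomi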
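A minimal sketch of the decoding step, assuming the standard SOLiD two-base encoding in which bases A, C, G, T are numbered 0 to 3 and each color is the XOR of two adjacent base numbers (so color 0 = AA/CC/GG/TT and color 1 = AC/CA/GT/TG). The function name is illustrative.

```python
BASES = "ACGT"


def decode_colorspace(first_base, colors):
    """Translate a color string into bases, given the known base that precedes it.

    first_base: the known last base of the adapter (not part of the returned read).
    colors: string of digits 0-3, one per dinucleotide transition.
    """
    decoded = []
    previous = BASES.index(first_base)
    for color in colors:
        # Each color plus the previous base determines the next base.
        current = previous ^ int(color)
        decoded.append(BASES[current])
        previous = current
    return "".join(decoded)


read = decode_colorspace("A", "110")
```

Because every base depends on the one before it, decoding has to start from the known adapter base.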
16 SOLiD raw data files: raw reads in .csfasta SOLiD in general gives out two files: the reads in color space (.csfasta) and the read qualities (.qv.qual). <filename>.csfasta Overall format: last base of the adapter oligo sequence + color space presented as numbers. (Table: color code for each combination of 1st nucleotide A C G T and 2nd nucleotide A C G T.) Example records: >1_88_1830_R3 G >1_89_1562_R3 G Each record is a >TAG_ID line followed by the color-space read.
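17 SOLiD raw data files - qualities in QV.qual Quality values are in <filename>.qv.qual: a phred-like score for each position of each read. Score q = -10*log10(p), where p is the probability that the call is wrong. Each record is a >TAG_ID line followed by the quality values, for example records >97_2040_1850_F and >97_2040_1898_F. (Table: p vs. q.)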
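The relation between error probability p and score q can be checked numerically with a short sketch; the function names are illustrative.

```python
import math


def phred_from_error(p):
    """Phred-like score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def error_from_phred(q):
    """Error probability implied by a phred-like score q."""
    return 10 ** (-q / 10)


# q = 10, 20, 30 correspond to 1 error in 10, 100, 1000 calls.
table = [(q, error_from_phred(q)) for q in (10, 20, 30)]
```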
18 Benefit: Complementation in color space One benefit of color space is that it is self-complementing: a base sequence and its complement give the same color-space sequence. (Table: color code by 1st and 2nd nucleotide.) Sequence, base space: G T A G C T C G T C G T G C A G. Complemented, base space: C A T C G A G C A G C A C G T C. Both have the same color-space representation.
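The self-complementing property can be verified with a short sketch, again assuming the standard SOLiD encoding (bases A, C, G, T numbered 0 to 3, color = XOR of adjacent base numbers). Function names are illustrative.

```python
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def encode_colorspace(sequence):
    """Return the color string for a base sequence (one color per adjacent base pair)."""
    codes = [BASES.index(base) for base in sequence]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))


def complement(sequence):
    """Base-wise complement (not reversed)."""
    return sequence.translate(COMPLEMENT)


sequence = "GTAGCTCGTCGTGCAG"
same_colors = encode_colorspace(sequence) == encode_colorspace(complement(sequence))
```

Complementing a base flips both bits of its number, so the XOR of two neighbours is unchanged.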
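19 Downside One incorrect color call can corrupt the rest of the read in decoding. Correct colors: 0-1 (CA AC GT TG), 1-2 (CA AC GT TG), 2-3 (AA CC GG TT) decode to A C A A. With an error in the color at 1-2: 0-1 (CA AC GT TG), 1-2 (TA GC CG AT), 2-3 (AA CC GG TT) decode to A C G G, so every base after the error is wrong. In colorspace there is still only one error -> Alignment MUST be done in colorspace!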
20 Solexa raw data
21 Solexa pipeline
22 Solexa output in fastq file Solexa raw data comes in one text file, by default named after the flowcell lane and read direction, for example s_7_1_sequence.txt. Four lines per read: 1. @sequence identifier 2. Raw sequence letters (A, T, C, G, N) 3. +same sequence identifier 4. Read quality codes, Phred-like.
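The four-line record layout can be read with a few lines of code. This is a minimal sketch that assumes strictly four lines per read with no wrapped sequences; the record in the example is made up.

```python
import io


def read_fastq(handle):
    """Yield (identifier, sequence, quality_string) from a four-line-per-read FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], sequence, quality


# Made-up record for illustration.
example = io.StringIO("@read_1\nACGTN\n+read_1\nhhhgB\n")
records = list(read_fastq(example))
```

For real files, replace the StringIO object with an open file handle.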
23 Fastq quality scores Quality scores are reported in ASCII Saves disk space Example:
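Each quality character stores one score as its ASCII code minus a fixed offset. The sketch below treats the offset as a parameter, because it depends on the pipeline version: Sanger-style FASTQ uses 33, while Solexa/Illumina pipelines of this period used 64. Which one applies to a given file is an assumption to check against the pipeline documentation.

```python
def decode_qualities(quality_string, offset=64):
    """Convert an ASCII-encoded quality string into a list of integer scores.

    offset: 64 for older Solexa/Illumina files, 33 for Sanger-style FASTQ (assumed default: 64).
    """
    return [ord(character) - offset for character in quality_string]


illumina_scores = decode_qualities("hhB", offset=64)
sanger_scores = decode_qualities("I5", offset=33)
```

One character per base instead of a multi-digit number plus separator is what saves the disk space.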
24 Re-sequencing
25 Requirements All alignment programs are designed for Unix/Linux platforms; Windows is too slow. Written in C, some parts in Perl. Need a lot of memory: for human-sized genomes, at least 8 GB of RAM. Need a lot of disk space: data files are now ~5 GB. Runs take from tens of minutes to hours to complete. An account at CSC or some other computation facility is recommended. No graphical user interfaces. Command line interface example: >assemble.pl reads_in.csfasta read_qualities_f3_qv.qual ref_file TAIR9_chr.fas -ref_type nt -NO_CORRECTION
26 Re-sequencing pipeline Proceeds in a similar manner for all platforms: 1. Create an index to be used for searching the reference genome. 2. Using the index, align reads to reference. 3. Form a consensus sequence. 4. Identify SNPs etc.
27 Short read alignment Because of the huge amount of data, BLAST is too slow, and faster alignment methods have been developed. The faster methods use shortcuts based on indexing, so that only a small part of the sequences is searched: hashing-based indexing and the Burrows-Wheeler transform. Progress is rapid: methods published 2 years ago are now old.
28 Hashing-based aligners First generation of read aligners. Extend the idea of BLAST. Indexing: divide reads of length L into bins based on their first n nucleotides (n is roughly 20). Alignment: for each position p in the reference genome: 1. take the next L nucleotides as the reference sequence, 2. find the appropriate bin, 3. match the remaining reference sequence to the reads in the bin. Software: no gaps: Eland, Maq. Gaps allowed: Elandv2, SOAP, GenomeMapper (part of SHORE).
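The indexing and alignment loop above can be sketched as follows. This is a toy illustration of the idea, not the implementation of Eland or Maq: it assumes all reads have the same length, requires the seed to match exactly, allows mismatches (no gaps) only in the rest of the read, and uses a very short seed so the example stays readable.

```python
from collections import defaultdict


def build_read_index(reads, seed_length):
    """Bin read ids by the first seed_length nucleotides of each read."""
    bins = defaultdict(list)
    for read_id, read in enumerate(reads):
        bins[read[:seed_length]].append(read_id)
    return bins


def align_reads(reference, reads, seed_length, max_mismatches=0):
    """Return {read_id: [reference positions]} for ungapped hits."""
    read_length = len(reads[0])  # assumption: all reads share one length L
    bins = build_read_index(reads, seed_length)
    hits = defaultdict(list)
    for position in range(len(reference) - read_length + 1):
        window = reference[position:position + read_length]
        # Only the reads in the bin matching the window's seed are compared.
        for read_id in bins.get(window[:seed_length], []):
            read = reads[read_id]
            mismatches = sum(1 for a, b in zip(window[seed_length:], read[seed_length:])
                             if a != b)
            if mismatches <= max_mismatches:
                hits[read_id].append(position)
    return dict(hits)


reference = "ACGTACGTTAGC"
reads = ["ACGTT", "TTAGC", "GGGGG"]
exact_hits = align_reads(reference, reads, seed_length=3)
loose_hits = align_reads(reference, reads, seed_length=3, max_mismatches=1)
```

The speed-up comes from the bin lookup: most reads are never compared against most reference positions.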
29 Burrows-Wheeler transform Next generation of sequence aligners. Searches for sequence matches using a prefix trie. Results in a fast read alignment method. Requires less memory. Small gaps allowed. Software: Bowtie (no gaps), BWA, SOAP2 (2-way BWT). Example: reference ^GOOGOL. Task: find a match to LOL, allowing at most one mismatch.
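The GOOGOL / LOL task can be reproduced with a small backward-search sketch over the Burrows-Wheeler transform. It is a teaching example under simplifying assumptions: the suffix array is built by naive sorting, the full occurrence table is stored, and mismatches are handled by trying every substitution. Production aligners such as BWA or Bowtie use far more compact indexes and pruning; all names here are illustrative.

```python
def build_index(reference):
    """Build suffix array, cumulative counts, and occurrence table for reference + '$'."""
    text = reference + "$"
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)
    alphabet = sorted(set(text))
    # counts[c]: number of characters in the text that sort before c.
    counts, total = {}, 0
    for c in alphabet:
        counts[c] = total
        total += text.count(c)
    # occ[c][i]: occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return suffix_array, counts, occ


def search(pattern, index, max_mismatches=0):
    """Return sorted start positions where pattern matches with at most max_mismatches."""
    suffix_array, counts, occ = index
    hits = set()

    def extend(i, lo, hi, budget):
        if i < 0:
            hits.update(suffix_array[k] for k in range(lo, hi))
            return
        for c in counts:
            if c == "$":
                continue
            cost = 0 if c == pattern[i] else 1
            if cost > budget:
                continue
            # Backward-search step: narrow the suffix-array interval by prepending c.
            new_lo = counts[c] + occ[c][lo]
            new_hi = counts[c] + occ[c][hi]
            if new_lo < new_hi:
                extend(i - 1, new_lo, new_hi, budget - cost)

    extend(len(pattern) - 1, 0, len(suffix_array), max_mismatches)
    return sorted(hits)


index = build_index("GOOGOL")
exact = search("LOL", index)
one_mismatch = search("LOL", index, max_mismatches=1)
```

With one mismatch allowed, LOL matches GOL at position 3 (0-based); no exact match exists.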
30 SHort Read Mapping Package = SHRiMP A k-mer hashing step + a very efficient implementation of the Smith-Waterman algorithm. Can be used for letter-space and color-space reads. Slower than the others, but gives the optimal local alignment.
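For reference, the core Smith-Waterman recurrence is small enough to sketch. This version computes only the best local alignment score with a linear gap penalty; the scoring values are arbitrary illustrative choices, and it is in no way SHRiMP's optimized implementation.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b (linear gap penalty)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below zero, which is what makes the alignment local.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


score = smith_waterman_score("TTACGTT", "ACG")
```

The cost is proportional to the product of the two sequence lengths, which is why a hashing step is used first to limit where the full dynamic programming is run.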
31 Performance comparison Homer N, Merriman B, Nelson SF (2009) BFAST: An Alignment Tool for Large Scale Genome Resequencing. PLoS ONE 4(11): e7767. doi: /journal.pone
32 Re-sequencing in colorspace Colorspace has its own pros and cons, which can be taken into account in sequence alignment. Translation into nucleotides as the last step after alignment! Use software that supports colorspace. In practice, this is just an option you give to the alignment program.
33 Using read qualities in alignment? Not all programs use them! (Check the manuals.) Most new methods use read qualities. Out of the old ones: Maq. Let's look at how the assembly goes with Maq... It has been used a lot in early papers and is a benchmark for new methods regarding speed and alignment.
34 Maq - workflow
maq fasta2bfa ref.fasta ref.bfa    # Convert the reference sequences to the binary fasta format
maq fastq2bfq reads.fastq reads-1.bfq    # Convert the reads to the binary fastq format
maq match reads-1.map ref.bfa reads-1.bfq    # Align the reads to the reference
maq mapcheck ref.bfa reads-1.map >mapcheck.txt    # Statistics from the alignment
maq assemble consensus.cns ref.bfa reads-1.map 2>assemble.log    # Build the mapping assembly
maq cns2fq consensus.cns >cns.fq    # Extract consensus sequences and qualities
maq cns2snp consensus.cns >cns.snp    # Extract list of SNPs
35 Maq - workflow
36 Aligning reads - analysing results Software for visualization: Maqviewer, SHOREmap, IGV, Tablet Visualization is VERY important! See the real quality of the data, alignment, coverage etc. Helps to identify errors Helps to evaluate SNP calls, identify gaps etc. There are wings, a propeller, and a pilot - it must be...
37 Maqviewer Only basic functionality.
38 Tablet viewer Coded in JAVA Graphical interface May be slow
39 SNP calling After aligning the short reads to reference genome, identify nucleotides that differ from reference. Make a consensus sequence of the reads Simplest: choose the most common one. Better: Use quality values in the voting Reference Consensus Individual reads SNP
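The difference between simple majority voting and quality-weighted voting can be shown on a single position. This is a deliberately simple heuristic (summing phred scores as vote weights), not the statistical model used by Maq or any other caller; the pileup values are made up.

```python
from collections import defaultdict


def majority_base(pileup):
    """Most common base among (base, quality) observations at one position."""
    counts = defaultdict(int)
    for base, _quality in pileup:
        counts[base] += 1
    return max(counts, key=counts.get)


def quality_weighted_base(pileup):
    """Base with the largest summed quality among (base, quality) observations."""
    votes = defaultdict(float)
    for base, quality in pileup:
        votes[base] += quality
    return max(votes, key=votes.get)


# Made-up pileup: three low-quality A calls against two high-quality G calls.
pileup = [("A", 5), ("A", 5), ("A", 5), ("G", 40), ("G", 40)]
by_count = majority_base(pileup)
by_quality = quality_weighted_base(pileup)
```

A position is then reported as a SNP candidate when the consensus base differs from the reference base.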
40 SNP calling Software for SNP detection: SOAPsnp, Maq, probhd, SHOREmap, MUMmer. Maq computes a phred-type score for the SNPs. SNPs are hard to define; usually some thresholds are given, based on depth and the number of differing nucleotides.
41 Example: Sequencing of A.thaliana genome, mutations induced by EMS Two different mutants sequenced with two platforms: SOLiD + Maq: SNPs Solexa + Maq: SNPs Roughly 1 SNP per every 10,000 bases.
42 Further analysis Which SNP is responsible for the mutant phenotype? Usually some window of the genome is known. To identify the SNP, combine the SNP locations with the genome annotation: Is the SNP in a coding sequence, exon, intron, promoter, 3'/5' UTR, junk? Is the SNP disruptive? Does translation result in a stop codon or an altered amino acid sequence? Each splice variant can be different, and the variants are not known. Location in the protein? If done properly, this would require protein structure prediction (very hard).
43 Sequencing de novo
44 How long do reads need to be for de novo genome assembly? Key problem: repeats in the genome that are longer than the read length. Theoretical analysis: E. coli: 30 bp read length, 75% of the genome is covered with contigs >10,000 bp. C. elegans: 50 bp read length, 51% of the genome is covered with contigs >10,000 bp. Human: 50 bp read length, ~15% is covered with contigs >10,000 bp (chromosome 1). Re-sequencing and de novo sequencing of the majority of a bacterial genome is theoretically possible with read lengths of bp. With longer genomes significant proportions are left uncovered.
45 Percentage of the E.coli genome covered by contigs greater than a threshold length as a function of read length Whiteford, N. et al. Nucl. Acids Res :e171; doi: /nar/gni170
46 Figure: the same analysis as a function of read length l (nt) for C. elegans and human. Use paired-end mapping to connect the longer contigs.
47 Genome assembly using paired end reads Figure 1: An illustration of the Paired End assembly process. Paired End reads are used to order and orient the contigs derived from the Newbler assembly. The large blue lines represent contigs generated from the whole genome shotgun sequencing and assembly. The multiple blue and grey lines represent Paired End information. The blue segments represent the two 20 nucleotide regions that were sequenced while the dotted grey line represents the distance between those two sequenced regions. [454 Technical note 1]
48 Software for de novo genome assembly Software: ALLPATHS, Edena, ABySS, Velvet and SOAPdenovo. Based on de Bruijn graphs. Fujimoto et al.: Illumina/Solexa sequencing of human genome, 200-bp insert libraries, 12 runs, read length 51-76 nt, 40x coverage. De novo assembly of unmapped reads, comparison of contigs longer than 100 bp: ABySS 6,535 (violet), SOAPdenovo 4,826 (yellow), Velvet 6,617 (green). Contigs that were aligned with more than 90% identity were considered shared contigs. Fujimoto et al. (2010) Whole-genome sequencing and comprehensive variant analysis of a Japanese individual using massively parallel sequencing. Nature Genetics 42,
49 Velvet assembly - de Bruijn graphs Split reads into k-mers. Align all k-mers in the reads (here 5-mers). de Bruijn graph: each node represents a series of overlapping k-mers. The final nucleotides of the k-mers make up the sequence of the node. The last k-mer of an arc's origin overlaps with the first k-mer of its destination. Reads are mapped as paths through the graph. D. R. Zerbino and E. Birney (2008) Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18:
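The k-mer idea can be illustrated with a textbook de Bruijn graph, in which nodes are (k-1)-mers and every k-mer in a read adds one edge. Note that this is a simplification and not Velvet's data structure (Velvet merges chains of overlapping k-mers into single nodes and tracks reverse complements); the reads and function names are made up for the example.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous_path(graph, start):
    """Extend a contig from start while each node has exactly one successor."""
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        (next_node,) = graph[node]
        if next_node in visited:
            break  # stop at cycles, which repeats in the genome would create
        contig += next_node[-1]
        visited.add(next_node)
        node = next_node
    return contig


# Three overlapping made-up reads from the sequence ATGGCGTGCAA.
reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
graph = de_bruijn_graph(reads, k=4)
contig = walk_unambiguous_path(graph, "ATG")
```

Branches in the graph (nodes with more than one successor) are where repeats and sequencing errors force the assembler to stop or to use extra information such as paired reads.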
50 Example: de novo fragment assembly of one SOLiD run of A. thaliana Data: 68,073,401 reads. Read length 50 nt. 12x coverage. Assembly: at least 3x coverage: 29 M reads. Number of contigs: 295,203. Median contig length: 276 nt. Longest contig: 3399 nt. Shortest contig: 82 nt. Sum of contig lengths: 98,457,405 nt, ~62.7% of the genome. (Figure: contig length (nt) by index.)
51 SNP analysis of de novo vs. re-sequencing The sequenced genome was A. thaliana, Cvi ecotype. De novo, Velvet + MUMmer: 2,371,409 SNPs. Re-sequencing, TAIR9 + Maq: 183,811 SNPs. Published SNP list: 810,205 SNPs. Warning: the indices of the published list of Cvi SNPs do not match the TAIR9 reference genome! The consensus sequence of Cvi is not published (released).
52 Example: Sequencing of the giant panda genome Li et al. (2010) The sequence and de novo assembly of the giant panda genome. Nature 463,
53 Genome assembly using paired end reads v2 Several libraries with different insert lengths Can use the same sequencing technology for whole assembly. Strategy: Join reads with short insert lengths into contigs Make into scaffolds by mapping unpaired ends to other contigs Scaffold = set of contigs with spaces between. Use longer insert libraries for arranging contigs
54 Sequencing setup for Giant Panda 37 paired-end insert libraries with insert sizes of 150 bp, 500 bp, 2 kb, 5 kb and 10 kb. Illumina Genome Analyser platform. 176 Gb of usable sequence, 73x coverage. Average read length of 52 bp.
55 Summary of Assembly Final contig size 2.24 Gb Estimated genome size 2.40 Gb.
56 Genome annotation de novo Gene finding: align known genes of model species against the new genome. Hidden Markov Model-based prediction of genes: Genscan, Augustus, HMMgene. Gene annotation: the functions of genes that can be aligned to the new genome give some clue. Gene orthologues: InParanoid, Multiparanoid.
57 Structure of the umami receptor T1R1 gene Heterodimer T1R1/T1R3 may be the sole receptor for umami taste. Umami: detection of the carboxylate anion of glutamic acid, a naturally occurring amino acid common in meats, cheese, broth, stock and other protein-heavy foods. In panda T1R1 is a pseudogene. Recent mutation, may explain the diet?
58 Example application: nucleosome positioning
59 Chromatin structure Chromatin=combination of DNA, RNA, and protein that makes up chromosomes. Functions: Package DNA, strengthen the DNA to allow mitosis and meiosis Serves as a mechanism to control expression and DNA replication. Changes in chromatin structure are affected by chemical modifications of histone proteins such as methylation (DNA and proteins) and acetylation (proteins), and by non-histone, DNA-binding proteins.
60 Predicting nucleosome positions Separate DNA into nucleosome vs. linker DNA parts. Sequence these with 454. Nucleosome ~146 bp, linker DNA ~ bp. Construct a model to predict nucleosome positions. [Field et al. 08]
61 Computational model Nucleosomes: estimate a (position-specific) di-nucleotide model P_N over all nucleotide sequences. Linker DNA: estimate a 5-mer model P_L for linker DNA vs. nucleosome.
\[
\mathrm{Score}(S) = \log\frac{P_N(S)}{P_L(S)} = \log\frac{P_{N,1}(S[1])\,\prod_{i=2}^{147} P_{N,i}(S[i]\mid S[i-1])}{P_L(S[1])\,\prod_{i=2}^{147} P_L(S[i]\mid S[\max(1,i-4)],\ldots,S[i-1])}
\]
Estimate the score for the whole DNA, taking into account all legal configurations of nucleosome positioning. Normalize to get probabilities:
\[
P(W_c[S]) = \frac{W_c[S]}{\sum_{c' \in C} W_{c'}[S]}, \quad c \in C
\]
62 Result Nucleosome localization can be predicted from DNA sequence. Two different types of regulation by chromatin in yeast promoters: Nucleosome-depleted areas: genes showing relatively low cell-to-cell expression variability, or transcriptional noise. Nucleosome-rich areas: transcription factors need to compete with nucleosomes for access to the DNA => variability in gene expression.
63 Further uses for high-throughput sequencing? Cataloging sequences and their variation: between individuals and species; SNPs, quantitative trait loci; copy number variations; mutations and genome rearrangements; metagenomics; evolution at an individual level; phylogeny. Epigenetics: DNA methylation (using ChIP-seq); chromatin structure. Transcriptome: digital gene expression; ChIP-seq; splice variants; microRNA; cell-specific gene expression.
64 What can high-throughput sequencing do for you? [Kahvejian et al. 08]
65 References
Li, H., Homer, N. (2010) A survey of sequence alignment algorithms for next-generation sequencing. Briefings in Bioinformatics 11(5):
Fujimoto et al. (2010) Whole-genome sequencing and comprehensive variant analysis of a Japanese individual using massively parallel sequencing. Nature Genetics 42,
Li et al. (2010) The sequence and de novo assembly of the giant panda genome. Nature 463,
Magi, A. et al. (2010) Bioinformatics for Next Generation Sequencing Data. Genes 1:
Vera, J.C., Wheat, C.W., Fescemyer, H.W., Frilander, M.J., Crawford, D.L., Hanski, I., and Marden, J.H. (2008) Rapid transcriptome characterization for a nonmodel organism using 454 pyrosequencing. Molecular Ecology 17:
Field, Y., Kaplan, N., Fondufe-Mittendorf, Y., Moore, I.K., Sharon, E., et al. (2008) Distinct Modes of Regulation by Chromatin Encoded through Nucleosome Positioning Signals. PLoS Comput Biol 4(11): e doi: / journal.pcbi
Kahvejian, A., Quackenbush, J., Thompson, J.F. (2008) What would you do if you could sequence everything? Nature Biotechnology 26(10):
Zerbino, D.R. and Birney, E. (2008) Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18:
Korbel et al. (2007) Paired-end mapping reveals extensive structural variation in the human genome. Science 318:
Whole Genome Assembly using Paired End Reads in E. coli, B. licheniformis, and S. cerevisiae. 454 Application note 1,
Whiteford, N. et al. (2005) An analysis of the feasibility of short read sequencing. Nucleic Acids Res. 33, e171.
RNA-Sequencing analysis Markus Kreuz 25. 04. 2012 Institut für Medizinische Informatik, Statistik und Epidemiologie Content: Biological background Overview transcriptomics RNA-Seq RNA-Seq technology Challenges
More informationIntroductory Next Gen Workshop
Introductory Next Gen Workshop http://www.illumina.ucr.edu/ http://www.genomics.ucr.edu/ Workshop Objectives Workshop aimed at those who are new to Illumina sequencing and will provide: - a basic overview
More informationNext Gen Sequencing. Expansion of sequencing technology. Contents
Next Gen Sequencing Contents 1 Expansion of sequencing technology 2 The Next Generation of Sequencing: High-Throughput Technologies 3 High Throughput Sequencing Applied to Genome Sequencing (TEDed CC BY-NC-ND
More informationGENETICS - CLUTCH CH.15 GENOMES AND GENOMICS.
!! www.clutchprep.com CONCEPT: OVERVIEW OF GENOMICS Genomics is the study of genomes in their entirety Bioinformatics is the analysis of the information content of genomes - Genes, regulatory sequences,
More informationBIOINFORMATICS ORIGINAL PAPER
BIOINFORMATICS ORIGINAL PAPER Vol. 27 no. 21 2011, pages 2957 2963 doi:10.1093/bioinformatics/btr507 Genome analysis Advance Access publication September 7, 2011 : fast length adjustment of short reads
More informationAnalysis of RNA-seq Data
Analysis of RNA-seq Data A physicist and an engineer are in a hot-air balloon. Soon, they find themselves lost in a canyon somewhere. They yell out for help: "Helllloooooo! Where are we?" 15 minutes later,
More informationNext Generation Sequencing: An Overview
Next Generation Sequencing: An Overview Cavan Reilly November 13, 2017 Table of contents Next generation sequencing NGS and microarrays Study design Quality assessment Burrows Wheeler transform Next generation
More informationStructural variation analysis using NGS sequencing
Structural variation analysis using NGS sequencing Victor Guryev NBIC NGS taskforce meeting April 15th, 2011 Scale of genomic variants Scale 1 bp 10 bp 100 bp 1 kb 10 kb 100 kb 1 Mb Variants SNPs Short
More informationMachine Learning. HMM applications in computational biology
10-601 Machine Learning HMM applications in computational biology Central dogma DNA CCTGAGCCAACTATTGATGAA transcription mrna CCUGAGCCAACUAUUGAUGAA translation Protein PEPTIDE 2 Biological data is rapidly
More informationDe novo assembly in RNA-seq analysis.
De novo assembly in RNA-seq analysis. Joachim Bargsten Wageningen UR/PRI/Plant Breeding October 2012 Motivation Transcriptome sequencing (RNA-seq) Gene expression / differential expression Reconstruct
More informationEucalyptus gene assembly
Eucalyptus gene assembly ACGT Plant Biotechnology meeting Charles Hefer Bioinformatics and Computational Biology Unit University of Pretoria October 2011 About Eucalyptus Most valuable and widely planted
More informationThe Ensembl Database. Dott.ssa Inga Prokopenko. Corso di Genomica
The Ensembl Database Dott.ssa Inga Prokopenko Corso di Genomica 1 www.ensembl.org Lecture 7.1 2 What is Ensembl? Public annotation of mammalian and other genomes Open source software Relational database
More informationA shotgun introduction to sequence assembly (with Velvet) MCB Brem, Eisen and Pachter
A shotgun introduction to sequence assembly (with Velvet) MCB 247 - Brem, Eisen and Pachter Hot off the press January 27, 2009 06:00 AM Eastern Time llumina Launches Suite of Next-Generation Sequencing
More informationDifferential gene expression analysis using RNA-seq
https://abc.med.cornell.edu/ Differential gene expression analysis using RNA-seq Applied Bioinformatics Core, March 2018 Friederike Dündar with Luce Skrabanek & Paul Zumbo Day 1: Introduction into high-throughput
More information02 Agenda Item 03 Agenda Item
01 Agenda Item 02 Agenda Item 03 Agenda Item SOLiD 3 System: Applications Overview April 12th, 2010 Jennifer Stover Field Application Specialist - SOLiD Applications Workflow for SOLiD Application Application
More informationDeep Sequencing technologies
Deep Sequencing technologies Gabriela Salinas 30 October 2017 Transcriptome and Genome Analysis Laboratory http://www.uni-bc.gwdg.de/index.php?id=709 Microarray and Deep-Sequencing Core Facility University
More informationSupplementary Materials for De-novo transcript sequence reconstruction from RNA-Seq: reference generation and analysis with Trinity
Supplementary Materials for De-novo transcript sequence reconstruction from RNA-Seq: reference generation and analysis with Trinity Sections: S1. Evaluation of transcriptome assembly completeness S2. Comparison
More informationNext-generation sequencing and quality control: An introduction 2016
Next-generation sequencing and quality control: An introduction 2016 s.schmeier@massey.ac.nz http://sschmeier.com/bioinf-workshop/ Overview Typical workflow of a genomics experiment Genome versus transcriptome
More informationSupplementary Figures
Supplementary Figures A B Supplementary Figure 1. Examples of discrepancies in predicted and validated breakpoint coordinates. A) Most frequently, predicted breakpoints were shifted relative to those derived
More informationMapping Next Generation Sequence Reads. Bingbing Yuan Dec. 2, 2010
Mapping Next Generation Sequence Reads Bingbing Yuan Dec. 2, 2010 1 What happen if reads are not mapped properly? Some data won t be used, thus fewer reads would be aligned. Reads are mapped to the wrong
More informationAnalysing genomes and transcriptomes using Illumina sequencing
Analysing genomes and transcriptomes using Illumina uencing Dr. Heinz Himmelbauer Centre for Genomic Regulation (CRG) Ultrauencing Unit Barcelona The Sequencing Revolution High-Throughput Sequencing 2000
More informationGap Filling for a Human MHC Haplotype Sequence
American Journal of Life Sciences 2016; 4(6): 146-151 http://www.sciencepublishinggroup.com/j/ajls doi: 10.11648/j.ajls.20160406.12 ISSN: 2328-5702 (Print); ISSN: 2328-5737 (Online) Gap Filling for a Human
More informationChIP-seq analysis 2/28/2018
ChIP-seq analysis 2/28/2018 Acknowledgements Much of the content of this lecture is from: Furey (2012) ChIP-seq and beyond Park (2009) ChIP-seq advantages + challenges Landt et al. (2012) ChIP-seq guidelines
More informationNext Generation Sequences & Chloroplast Assembly. 8 June, 2012 Jongsun Park
Next Generation Sequences & Chloroplast Assembly 8 June, 2012 Jongsun Park Table of Contents 1 History of Sequencing Technologies 2 Genome Assembly Processes With NGS Sequences 3 How to Assembly Chloroplast
More informationWelcome to the NGS webinar series
Welcome to the NGS webinar series Webinar 1 NGS: Introduction to technology, and applications NGS Technology Webinar 2 Targeted NGS for Cancer Research NGS in cancer Webinar 3 NGS: Data analysis for genetic
More informationDe novo genome assembly with next generation sequencing data!! "
De novo genome assembly with next generation sequencing data!! " Jianbin Wang" HMGP 7620 (CPBS 7620, and BMGN 7620)" Genomics lectures" 2/7/12" Outline" The need for de novo genome assembly! The nature
More informationOutline. The types of Illumina data Methods of assembly Repeats Selecting k-mer size Assembly Tools Assembly Diagnostics Assembly Polishing
Illumina Assembly 1 Outline The types of Illumina data Methods of assembly Repeats Selecting k-mer size Assembly Tools Assembly Diagnostics Assembly Polishing 2 Illumina Sequencing Paired end Illumina
More informationNGS in Pathology Webinar
NGS in Pathology Webinar NGS Data Analysis March 10 2016 1 Topics for today s presentation 2 Introduction Next Generation Sequencing (NGS) is becoming a common and versatile tool for biological and medical
More informationAssemblytics: a web analytics tool for the detection of assembly-based variants Maria Nattestad and Michael C. Schatz
Assemblytics: a web analytics tool for the detection of assembly-based variants Maria Nattestad and Michael C. Schatz Table of Contents Supplementary Note 1: Unique Anchor Filtering Supplementary Figure
More informationExperimental Design. Sequencing. Data Quality Control. Read mapping. Differential Expression analysis
-Seq Analysis Quality Control checks Reproducibility Reliability -seq vs Microarray Higher sensitivity and dynamic range Lower technical variation Available for all species Novel transcript identification
More informationSupplement to: The Genomic Sequence of the Chinese Hamster Ovary (CHO)-K1 cell line
Supplement to: The Genomic Sequence of the Chinese Hamster Ovary (CHO)-K1 cell line Table of Contents SUPPLEMENTARY TEXT:... 2 FILTERING OF RAW READS PRIOR TO ASSEMBLY:... 2 COMPARATIVE ANALYSIS... 2 IMMUNOGENIC
More informationTruSPAdes: analysis of variations using TruSeq Synthetic Long Reads (TSLR)
tru TruSPAdes: analysis of variations using TruSeq Synthetic Long Reads (TSLR) Anton Bankevich Center for Algorithmic Biotechnology, SPbSU Sequencing costs 1. Sequencing costs do not follow Moore s law
More informationde novo Transcriptome Assembly Nicole Cloonan 1 st July 2013, Winter School, UQ
de novo Transcriptome Assembly Nicole Cloonan 1 st July 2013, Winter School, UQ de novo transcriptome assembly de novo from the Latin expression meaning from the beginning In bioinformatics, we often use
More informationBiol 478/595 Intro to Bioinformatics
Biol 478/595 Intro to Bioinformatics September M 1 Labor Day 4 W 3 MG Database Searching Ch. 6 5 F 5 MG Database Searching Hw1 6 M 8 MG Scoring Matrices Ch 3 and Ch 4 7 W 10 MG Pairwise Alignment 8 F 12
More informationSCIENCE CHINA Life Sciences. Comparative analysis of de novo transcriptome assembly
SCIENCE CHINA Life Sciences SPECIAL TOPIC February 2013 Vol.56 No.2: 156 162 RESEARCH PAPER doi: 10.1007/s11427-013-4444-x Comparative analysis of de novo transcriptome assembly CLARKE Kaitlin 1, YANG
More informationCOMPUTER RESOURCES II:
COMPUTER RESOURCES II: Using the computer to analyze data, using the internet, and accessing online databases Bio 210, Fall 2006 Linda S. Huang, Ph.D. University of Massachusetts Boston In the first computer
More informationHigh Throughput Sequencing the Multi-Tool of Life Sciences. Lutz Froenicke DNA Technologies and Expression Analysis Cores UCD Genome Center
High Throughput Sequencing the Multi-Tool of Life Sciences Lutz Froenicke DNA Technologies and Expression Analysis Cores UCD Genome Center Complementary Approaches Illumina Still-imaging of clusters (~1000
More informationIntroduction to the MiSeq
Introduction to the MiSeq 2011 Illumina, Inc. All rights reserved. Illumina, illuminadx, BeadArray, BeadXpress, cbot, CSPro, DASL, Eco, Genetic Energy, GAIIx, Genome Analyzer, GenomeStudio, GoldenGate,
More informationGenomic Technologies. Michael Schatz. Feb 1, 2018 Lecture 2: Applied Comparative Genomics
Genomic Technologies Michael Schatz Feb 1, 2018 Lecture 2: Applied Comparative Genomics Welcome! The primary goal of the course is for students to be grounded in theory and leave the course empowered to
More informationBioinformatics Course AA 2017/2018 Tutorial 2
UNIVERSITÀ DEGLI STUDI DI PAVIA - FACOLTÀ DI SCIENZE MM.FF.NN. - LM MOLECULAR BIOLOGY AND GENETICS Bioinformatics Course AA 2017/2018 Tutorial 2 Anna Maria Floriano annamaria.floriano01@universitadipavia.it
More informationIntroduction to RNA sequencing
Introduction to RNA sequencing Bioinformatics perspective Olga Dethlefsen NBIS, National Bioinformatics Infrastructure Sweden November 2017 Olga (NBIS) RNA-seq November 2017 1 / 49 Outline Why sequence
More informationDe novo meta-assembly of ultra-deep sequencing data
De novo meta-assembly of ultra-deep sequencing data Hamid Mirebrahim 1, Timothy J. Close 2 and Stefano Lonardi 1 1 Department of Computer Science and Engineering 2 Department of Botany and Plant Sciences
More informationDe novo whole genome assembly
De novo whole genome assembly Qi Sun Bioinformatics Facility Cornell University Sequencing platforms Short reads: o Illumina (150 bp, up to 300 bp) Long reads (>10kb): o PacBio SMRT; o Oxford Nanopore
More informationAnalysis of Biological Sequences SPH
Analysis of Biological Sequences SPH 140.638 swheelan@jhmi.edu nuts and bolts meet Tuesdays & Thursdays, 3:30-4:50 no exam; grade derived from 3-4 homework assignments plus a final project (open book,
More informationThe Expanded Illumina Sequencing Portfolio New Sample Prep Solutions and Workflow
The Expanded Illumina Sequencing Portfolio New Sample Prep Solutions and Workflow Marcus Hausch, Ph.D. 2010 Illumina, Inc. All rights reserved. Illumina, illuminadx, Solexa, Making Sense Out of Life, Oligator,
More informationGenome 373: High- Throughput DNA Sequencing. Doug Fowler
Genome 373: High- Throughput DNA Sequencing Doug Fowler Tasks give ML unity We learned about three tasks that are commonly encountered in ML Models/Algorithms Give ML Diversity Classification Regression
More informationDNA concentration and purity were initially measured by NanoDrop 2000 and verified on Qubit 2.0 Fluorometer.
DNA Preparation and QC Extraction DNA was extracted from whole blood or flash frozen post-mortem tissue using a DNA mini kit (QIAmp #51104 and QIAmp#51404, respectively) following the manufacturer s recommendations.
More informationde novo paired-end short reads assembly
1/54 de novo paired-end short reads assembly Rayan Chikhi ENS Cachan Brittany Symbiose, Irisa, France 2/54 THESIS FOCUS Graph theory for assembly models Indexing large sequencing datasets Practical implementation
More informationModern Epigenomics. Histone Code
Modern Epigenomics Histone Code Ting Wang Department of Genetics Center for Genome Sciences and Systems Biology Washington University Dragon Star 2012 Changchun, China July 2, 2012 DNA methylation + Histone
More informationIllumina s Suite of Targeted Resequencing Solutions
Illumina s Suite of Targeted Resequencing Solutions Colin Baron Sr. Product Manager Sequencing Applications 2011 Illumina, Inc. All rights reserved. Illumina, illuminadx, Solexa, Making Sense Out of Life,
More informationSNP calling and VCF format
SNP calling and VCF format Laurent Falquet, Oct 12 SNP? What is this? A type of genetic variation, among others: Family of Single Nucleotide Aberrations Single Nucleotide Polymorphisms (SNPs) Single Nucleotide
More informationThe New Genome Analyzer IIx Delivering more data, faster, and easier than ever before. Jeremy Preston, PhD Marketing Manager, Sequencing
The New Genome Analyzer IIx Delivering more data, faster, and easier than ever before Jeremy Preston, PhD Marketing Manager, Sequencing Illumina Genome Analyzer: a Paradigm Shift 2000x gain in efficiency
More informationAuthors: Vivek Sharma and Ram Kunwar
Molecular markers types and applications A genetic marker is a gene or known DNA sequence on a chromosome that can be used to identify individuals or species. Why we need Molecular Markers There will be
More informationIntroduction to transcriptome analysis using High Throughput Sequencing technologies. D. Puthier 2012
Introduction to transcriptome analysis using High Throughput Sequencing technologies D. Puthier 2012 A typical RNA-Seq experiment Library construction Protocol variations Fragmentation methods RNA: nebulization,
More informationRNA-Seq data analysis course September 7-9, 2015
RNA-Seq data analysis course September 7-9, 2015 Peter-Bram t Hoen (LUMC) Jan Oosting (LUMC) Celia van Gelder, Jacintha Valk (BioSB) Anita Remmelzwaal (LUMC) Expression profiling DNA mrna protein Comprehensive
More informationBST227 Introduction to Statistical Genetics. Lecture 8: Variant calling from high-throughput sequencing data
BST227 Introduction to Statistical Genetics Lecture 8: Variant calling from high-throughput sequencing data 1 PC recap typical genome Differs from the reference genome at 4-5 million sites ~85% SNPs ~15%
More information