High throughput biological sequence collection, analysis and its application in analysis of oral microbiota


1 High throughput biological sequence collection, analysis and its application in analysis of oral microbiota Chaochun Wei ( 韦朝春 ) Department of Bioinformatics and Biostatistics Shanghai Jiao Tong University Fall 2013

2 Contents Background A brief history of genomics technology developments Introduction to metagenomics Oral metagenomics Summary

3 "The ultimate goal is for sequencing to become so simple and inexpensive that it can be routinely deployed as a general-purpose tool throughout biomedicine. Research applications will include characterizing genomes, epigenomes and transcriptomes of humans and other species, as well as using sequencing as a proxy to probe diverse molecular interactions." Eric S. Lander, 2011, Initial impact of the sequencing of the human genome, Nature

4 Genomics Technology developments

5 Milestones of Genomics Technology (timeline). Affy launches gene expression microarrays; first microarray publication, on Arabidopsis; Affy and ILMN both launch 100K genotyping arrays; the sequencing shake-up; ABI commercializes the first automated DNA sequencer; HapMap project launched; ILMN launches gene expression arrays; ILMN buys Solexa and launches the GA; Roche GS FLX launched; ILMN HiSeq 2000 launched; in the coming future. Low-hanging fruit: cystic fibrosis mutation identified; rise of GenBank databases from DNA sequencing; 3700 DNA Analyzer in the Human Genome Project, DNA sequencing goes industrial; Human Genome Project and Celera Genomics complete the first draft genome; HapMap 1st phase data release; ABI SOLiD 1.0 launched; rise of genome-wide association studies (GWAS); SOLiD 3.0: 100GB out of the box; the 3rd generation of sequencing will be launched.

7 Milestone of Genomics Technology: In 1986, ABI created the first automated DNA sequencer.

8 Milestone of Genomics Technology: In 1994, NCBI created the national DNA database called GenBank.

9 Milestone of Genomics Technology: In 1998, the ABI 3700 DNA sequencer was launched to the market and the HGP became a large-scale effort. The intense competition between the scientific community and industry greatly accelerated the HGP.

10 Milestone of Genomics Technology: The HGP and a private company, Celera Genomics, published drafts of the human genome simultaneously. China finished 1% of the human genome.

11 Milestone of Genomics Technology: In 2002, the human haplotype map (HapMap) project started. It finished in ... China finished 10%.

12 Milestone of Genomics Technology: In 2006, Illumina launched the GA, the first NGS sequencer.

13 Milestone of Genomics Technology: In 2007, the Roche GS FLX (454) and ABI SOLiD 1.0 were launched to the market.

14 Milestone of Genomics Technology: The 3rd generation of sequencers may be launched to the market in 2014?

15 The cost per base and per human genome has dropped dramatically. [Figure: innovation in NGS throughput; plotted quantities: cost per human genome and throughput (Gb), compared with Moore's law.] Deep sequencing technology is becoming more and more popular in translational medicine research because of its lower price and high throughput.

16 The revolution caused by Next-Generation Sequencing (NGS). Sanger sequencing requires an industrialized lab and many staff. NGS (2nd generation): one staff member, one machine. 3rd generation: Oxford Nanopore. Source: Nature Methods, Vol. 5, No. 1, January.

17 NGS platforms: Applied Biosystems ABI 3730XL; Roche/454 Genome Sequencer FLX; Illumina/Solexa Genome Analyzer; Applied Biosystems SOLiD; HeliScope Single Molecule Sequencer. 3rd generation: Oxford Nanopore MinION.

18 Comparison of NGS and traditional sequencing platforms
Platform           Sanger    454       Solexa      SOLiD
Read length (bp)   ...       ...       ...         ...
# of reads/run     ...       ...       ...         ...
Error rate         10^-3     <10^-2    ~10^-2      ~10^-2
Cost ($/Mbp)       5000      ~5        ~0.6        ~0.2
Time/run           ~3 h      ~7 h      2-14 days   3-14 days
Throughput         100 Kb    ~1 Gb     ~600 Gb     ... Gb

19 Metagenomics

20 Microbes in a natural environment live in communities. Complex system. Structure and function diversity: organism composition, gene composition, function composition. <1% can be isolated and cultured.

21 Metagenome and Metagenomics. Metagenome: all genetic material in an environmental sample. Metagenomics: directly sequencing and analyzing metagenomic sequences. It can be used to study the composition and structure of microbial communities in natural environments. Thousands of new organisms have been found, changing our view of the world of life. It provides new viewpoints and methods for environment, energy, and health related research areas.

22 Typical Sources of Metagenomes Soil samples Sea water samples Seabed samples Air samples Medical samples Ancient bones Human microbiome

23 A human contains more than the human genome. In a healthy adult, microbial cells outnumber human cells by about 10 to 1. Most of them are in the gut.

24 Three basic questions in metagenomics. Who is in there? (Metagenome binning.) What are they doing? (Function annotation.) How do they compare? (Comparative metagenomics.)

25 Metagenome sequence analysis Sequencing data quality control Sequencing simulation Finding community structure Metagenome sequence classification Finding the functional composition structure How do they compare

26 Raw metagenome data Input: Sponge metagenome sequence dataset 164,421 reads (23.2M) Sequence length: Input file in FASTA format (our pipeline accepts FASTQ file too)
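As a minimal sketch of what "reads" and "sequence length" mean for a FASTA input like the one on this slide, the following Python counts records and summarizes their lengths. The function names and the toy reads are illustrative assumptions; this is not the pipeline mentioned on the slide.

```python
from io import StringIO

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def summarize(handle):
    """Return (number of reads, total bases, shortest read, longest read)."""
    lengths = [len(seq) for _, seq in read_fasta(handle)]
    if not lengths:
        return 0, 0, 0, 0
    return len(lengths), sum(lengths), min(lengths), max(lengths)

# Toy input standing in for a real metagenome file (hypothetical reads).
demo = StringIO(">read1\nACGTAC\nGT\n>read2\nGGCC\n")
print(summarize(demo))  # prints (2, 12, 4, 8)
```

Multi-line sequences are joined before measuring, so wrapped FASTA files give the same lengths as unwrapped ones.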

27 Sequencing quality control. Before quality control: low quality on both ends of reads (top); a large number of reads have a very low average quality score (bottom). FastQC: fast quality control for sequence data

28 Sequence quality control. After quality control: low-quality bases eliminated (top); reads with an average quality score less than 20 have been removed (bottom). FastQC: fast quality control for sequence data
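The two filtering steps described here (trimming low-quality read ends, then dropping reads whose average quality is below 20) can be sketched as follows. This is only an illustration: the slides name FastQC for the reports but not the trimming tool, and the Phred+33 quality encoding, the end-trimming threshold, and all function names are assumptions.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]

def trim_ends(seq, qual, min_q=20):
    """Remove bases from both ends while their score is below min_q."""
    scores = phred_scores(qual)
    start, end = 0, len(scores)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]

def passes_mean_quality(qual, min_mean=20):
    """Keep a read only if its average Phred score is at least min_mean."""
    scores = phred_scores(qual)
    return bool(scores) and sum(scores) / len(scores) >= min_mean

# Hypothetical read: '#' encodes Q2, '5' encodes Q20, 'I' encodes Q40.
seq, qual = trim_ends("ACGTACGT", "#5IIII5#")
print(seq, qual, passes_mean_quality(qual))  # prints CGTACG 5IIII5 True
```

A read made entirely of low-quality bases is trimmed to an empty string and then rejected by the mean-quality check.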

29 Metagenome sequence analysis Sequencing data quality control Metagenome sequencing simulation Finding community structure Metagenome sequence classification Finding the functional composition structure How do they compare

30 NGS sequencing bias. Figure 2. The distribution of quality values at each base. X axis: the coordinates of reads (0-based); Y axis: the PHRED scores. The blue dots represent the average quality values and the red dots represent the median. (A) Illumina; and (B) 454 sequencing platform. Jia B, Xuan L, Cai K, Hu Z, Ma L, et al. (2013) NeSSM: A Next-Generation Sequencing Simulator for Metagenomics. PLoS ONE 8(10): e75448.

31 Metagenome sequencing simulation The comparison of sequencing coverage before and after simulation. X axis: the coordinate of the genome of Acinetobacter baumannii ATCC Each interval contains 100 bases and only the first 3,000 intervals are shown; Y axis: the read numbers mapped in each interval. A: the sequencing coverage in the Dataset F; B: NeSSM; C: MetaSim ; D: GemSIM ; E: Grinder; and F: pirs. Jia B, Xuan L, Cai K, Hu Z, Ma L, et al. (2013) NeSSM: A Next-Generation Sequencing Simulator for Metagenomics. PLoS ONE 8(10): e75448.
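The coverage plot described in this caption bins mapped reads into 100-base intervals and shows the first 3,000 intervals. A minimal sketch of that binning, assuming the input is a list of 0-based mapped start coordinates (the function name and the example positions are hypothetical):

```python
from collections import Counter

def reads_per_interval(read_starts, interval=100, n_intervals=3000):
    """Count mapped reads per fixed-width genome interval.

    read_starts: 0-based start coordinates of mapped reads (assumed input).
    Only the first n_intervals intervals are reported, as in the figure.
    """
    counts = Counter(pos // interval for pos in read_starts)
    return [counts.get(i, 0) for i in range(n_intervals)]

# Hypothetical start positions; a real run would take them from an alignment file.
starts = [0, 99, 100, 250, 1_000_000]
print(reads_per_interval(starts, n_intervals=4))  # prints [2, 1, 1, 0]
```

Reads that map beyond the reported window are counted internally but not returned, which mirrors showing only the first intervals.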

32 The distributions of read lengths. X axis: the lengths of reads. Each interval is 10 bps; Y axis: the number of reads with lengths in a certain interval. Jia B, Xuan L, Cai K, Hu Z, Ma L, et al. (2013) NeSSM: A Next-Generation Sequencing Simulator for Metagenomics. PLoS ONE 8(10): e75448.

33 Comparison of the speed of NeSSM and existing tools on HC metagenome simulation.
Software    Platform   Read number   Read length   Time (s)
NeSSM_CPU   Illumina   90 million    ...           ...
NeSSM_GPU   Illumina   90 million    ...           ...
MetaSim     Illumina   90 million    36            3,821
GemSIM*     Illumina   90 million    36            90,600*
Grinder*    Illumina   90 million    36            2,143,078*
NeSSM_CPU   ...        ... million   250           5,560
NeSSM_GPU   ...        ... million   ...           ...
MetaSim     ...        ... million   ...           ...,968
GemSIM*     ...        ... million   ...           ...,359*
Grinder*    ...        ... million   250           2,236,412*
*: predicted by a linear extension of the times for a series of small datasets.
Jia B, Xuan L, Cai K, Hu Z, Ma L, et al. (2013) NeSSM: A Next-Generation Sequencing Simulator for Metagenomics. PLoS ONE 8(10): e75448.

34 Metagenome sequence analysis Sequencing data quality control Metagenome sequencing simulation Finding community structure Metagenome sequence classification How do they compare

35 Q1:WHO IS IN THERE?

36 Two types of metagenomics methods: 1. Marker gene sequencing: 16S rRNA, 18S rRNA, ... 2. Whole genome shotgun sequencing

37 Variant regions of the 16S rRNA gene. "We find that different taxonomic assignment methods vary radically in their ability to recapture the taxonomic information in full-length 16S rRNA sequences: most methods are sensitive to the region of the 16S rRNA gene that is targeted for sequencing, but many combinations of methods and rRNA regions produce consistent and accurate results. To process large datasets of partial 16S rRNA sequences obtained from surveys of various microbial communities, including those from human body habitats, we recommend the use of Greengenes or RDP classifier with fragments of at least 250 bases, starting from one of the primers R357, R534, R798, F343 or F517." Accurate taxonomy assignments from 16S rRNA sequences produced by highly parallel pyrosequencers. Liu Z, DeSantis TZ, Andersen GL, Knight R. Nucleic Acids Res. 2008 Oct;36(18):e120. Epub 2008 Aug 22.

38 Limitations of 16S classification: the copy number of the 16S rRNA gene can vary by an order of magnitude between bacterial species; PCR-induced biases.

39 Environmental Shotgun Sequencing. Steps: (1) sampling from the habitat; (2) filtering particles, typically by size; (3) DNA extraction and lysis; (4) cloning and library construction; (5) sequencing the clones; (6) computational analysis. Source: A primer on metagenomics. Wooley JC, Godzik A, Friedberg I. PLoS Comput Biol Feb 26;6(2):e Review.

40 Metagenome sequence analysis Sequencing data quality control Finding community structure Metagenome sequence classification Finding the functional composition structure How do they compare

41 Background Metagenomic sequence classification Assign sequences to groups representing the same or similar taxa A prerequisite for genome assembly and biological diversity finding for an environment Two types of methods 1. Alignment based 2. Sequence composition feature based

42 Methods based on alignment ~5% <1%

43 Methods based on alignment. Accuracy: not accurate for sequences without similar genomes available. Computational complexity? BLAST: too slow. Bowtie: fast, but can't align reads with indels or many mismatches.

44 Sequence composition based methods Advantages: Alignment free Can deal with sequences from unknown genomes Existing methods Phymm (Brady A, Salzberg SL, 2009, 2011, Nat Methods) Computation complexity? Fast, but still takes weeks for Phymm to analyze one run of 454 sequencing

45 Graphics Processing Unit (GPU). A GPU devotes more transistors to computing and has many cores (a few hundred to a few thousand). [Figure: CPU layout (control, cache, a few ALUs, DRAM) compared with GPU layout.] Example GPU: Nvidia Tesla 2050, 448 cores, 3GB memory.

46 GPU computing power Floating-Point Operations per Second of the CPU and GPU

47 GPUs for Bioinformatics Applications. 3D display acceleration. Parallel computing (especially for tasks with little I/O and heavy computation). Bioinformatics applications (speedup): molecular structure computing and molecular dynamics simulation (~200); sequence alignment (HMMER, ~100); database search (?); high-throughput sequencing data analysis (~20); metagenomic sequence classification (~20).

48 Goal: classify short sequences. Databases: kMMs (k-th order Markov models) for genomes; NCBI Taxonomy table. Input: short sequences. Output: taxonomy of each sequence. Available at: b/software/metabing/metabing.php MetaBinG: Using GPUs to accelerate metagenomic sequence classification

49 MetaBinG: Method. $P_i(X_j \mid \mathrm{Kmer}_k) = \dfrac{F_i(\mathrm{Kmer}_k, X_j)}{F_i(\mathrm{Kmer}_k)}$

50 MetaBinG: Method. $S_i = \sum_{j=0}^{l-k-1} \ln\bigl(p_i(X_{k+j} \mid \mathrm{Kmer}_j)\bigr)$
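The two method slides describe a k-th order Markov model: for each genome i, the probability of a nucleotide X following a k-mer is estimated from frequency counts F_i, and a read of length l is scored by summing log probabilities over its l-k positions. The Python below is a minimal CPU-only sketch of that idea, not the MetaBinG implementation. The pseudocount smoothing, the choice to pick the genome with the largest log-likelihood (the sign convention of the score), the function names, and the toy genomes are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def train_kmm(genome, k):
    """Count F(Kmer) and F(Kmer, X) over one genome string."""
    kmer_counts = defaultdict(int)
    pair_counts = defaultdict(int)
    for j in range(len(genome) - k):
        kmer, nxt = genome[j:j + k], genome[j + k]
        kmer_counts[kmer] += 1
        pair_counts[(kmer, nxt)] += 1
    return kmer_counts, pair_counts

def log_likelihood(read, model, k, pseudocount=1.0):
    """Sum ln p(X | Kmer) over positions j = 0 .. l-k-1 of the read.

    The pseudocount (an assumption, not from the slides) keeps unseen
    transitions from producing log(0).
    """
    kmer_counts, pair_counts = model
    score = 0.0
    for j in range(len(read) - k):
        kmer, nxt = read[j:j + k], read[j + k]
        p = (pair_counts.get((kmer, nxt), 0) + pseudocount) / (
            kmer_counts.get(kmer, 0) + 4 * pseudocount)
        score += math.log(p)
    return score

def classify(read, models, k):
    """Return the name of the genome model that best explains the read."""
    return max(models, key=lambda name: log_likelihood(read, models[name], k))

# Toy genomes (hypothetical); real models would be trained on reference genomes.
k = 2
models = {
    "AT_rich": train_kmm("AT" * 10, k),
    "GC_rich": train_kmm("GC" * 10, k),
}
print(classify("ATATAT", models, k), classify("GCGCGC", models, k))
```

Each read is scored against every genome independently, which is why this kind of classifier maps well onto parallel hardware: the per-genome scores can be computed at the same time.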

51 Training and test datasets 1212 bacterial genomes 390 genomes were removed 468 for training, 354 for test Simulation dataset 6640 test reads for each of the ten different lengths Compare with Phymm *A. Brady and S. L. Salzberg. Phymm and PhymmBL. Nature Methods Vol. 6, No. 9, pp (September 2009)

52 Result comparison: Phymm* vs. MetaBinG, with accuracy at phylum level. Table columns: sequence length (bp); Phymm accuracy (%) and time (s); MetaBinG accuracy (%) and time (s); speedup. Jia, P., Xuan, L., Liu, L., Wei, C.*, 2011 PLoS ONE, 6(11): e25353 * Brady A, Salzberg SL, 2009, Nat Methods, 6: * Brady A, Salzberg SL, 2011, Nat Methods, 8: 367.

53 MetaBinG for a real dataset. Biogas reactor dataset (Schluter, et al., 2008): 616, reads, average length 230 bp. All 1212 genomes are used for classification. Computing time: MetaBinG, 248 seconds; Phymm*, 4 days 5 hours and 56 seconds. Speedup: ~1500. Jia, P., Xuan, L., Liu, L., Wei, C.*, 2011 PLoS ONE, 6(11): e25353 * Brady A, Salzberg SL, 2009, Nat Methods, 6: * Brady A, Salzberg SL, 2011, Nat Methods, 8: 367.
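The quoted speedup can be checked directly from the two computing times on this slide. A short worked calculation, with variable names chosen here for illustration:

```python
# Computing times as stated on the slide.
phymm_seconds = 4 * 86400 + 5 * 3600 + 56   # 4 days, 5 hours and 56 seconds
metabing_seconds = 248

speedup = phymm_seconds / metabing_seconds
print(phymm_seconds, round(speedup))  # prints 363656 1466
```

The ratio is about 1466, which the slide rounds to roughly 1500.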

54 MetaBinG for a real dataset. 14 of the top 15 phyla generated by Phymm were in the list of the top 15 produced by MetaBinG. The relative ranks of these phyla vary by at most two.

55 Contribution of GPUs. Speedup by MetaBinG version: single-threaded CPU version, 1; parallel CPU version**, ~24; GPU version, ~600. The contribution of GPUs: ~25 times speedup*. * The tests were on the biogas dataset; the real values may vary. ** The parallel CPU version was implemented with the BLAS library from Intel MKL.

56 MetaBinG for 60GB of sequencing data. Dataset: 0.48 billion Solexa/Illumina reads, 100 bp on average; total size: 60GB. Computing time: Phymm, 5 CPU years (estimated); MetaBinG, ~30 hours.

57 Summary. MetaBinG's accuracy is close to Phymm's; MetaBinG is at least two orders of magnitude faster than Phymm. GPUs give a ~25-fold speedup for metagenome sequence classification. GPUs can be applied to more bioinformatics areas, e.g. sequence alignment.

58 Q2:WHAT ARE THEY DOING?

59 Metagenome sequence analysis Sequencing data quality control Metagenome sequencing simulation Finding community structure Metagenome sequence classification Finding the functional composition structure How do they compare

60 Gene calling. Homology search (BLAST): sensitivity is low, specificity is high. Ab initio gene prediction: Markov models.

61 Gene prediction methods used in metagenomic projects

62 Gene prediction on unassembled single reads and assembled contigs

63 Blast Megan MG-RAST OR Skip the gene calling step.

64

65 Method: iPath. Compare pathways of different parts: only A (here, Bacteria); only B (here, Pathogen); both A and B. (Since we use a small random dataset here, the aligned bacterial sequences alone are enough for downstream analysis, e.g. KEGG ortholog searching, while the pathogen DB includes all bacterial data too. So here A and B are actually the same thing.)

66 Method: iPath output. iPath input file (generated by ipath.pl): KEGG Pathway ID; color; width = 2*log(amount); opacity between 0 and 1.
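As a rough sketch of how one line of such an input file could be produced, the Python below applies the width rule from this slide, width = 2*log(amount). Several things here are assumptions: the natural logarithm (the slide does not state the base), the exact field layout of the output line, the function name, and the example identifier and color. It is not the ipath.pl script.

```python
import math

def ipath_line(kegg_id, amount, color="#ff0000", opacity=1.0):
    """Format one element line: ID, color, width = 2*log(amount), opacity.

    The field layout is a hypothetical illustration, not a documented format.
    """
    if amount <= 0:
        raise ValueError("amount must be positive to take its logarithm")
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0 and 1")
    width = 2 * math.log(amount)  # natural log assumed
    return f"{kegg_id} {color} W{width:.1f} {opacity}"

# Example identifier and abundance are made up for the demonstration.
print(ipath_line("K00001", 100))  # prints K00001 #ff0000 W9.2 1.0
```

The logarithm compresses large abundance differences so that very abundant pathways do not dominate the map; an amount of 1 maps to width 0 under this rule.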

67 Bacteria: pathway Results KEGG pathway

68 Interactions between a host and its environment microbiota Red is for host only, green is for environment microbiota only, and blue is for both. The thickness of the lines is for the relative abundance.

69 Metagenome sequence analysis Sequencing data quality control Metagenome sequencing simulation Finding community structure Metagenome sequence classification Finding the functional composition structure How do they compare

70 Q3:How Do They Compare?

71 MEGAN MG-RAST JGI IMG CAMERA Tools for Metagenomics

72 Metagenomics and pathogen genomics. Metagenomic data collection and analysis system. Meta-All: using all bacterial genomes available. Community composition structure analysis; metagenome sequence classification; metagenome sequence assembly; functional element finding in metagenomes; unknown pathogen identification; metagenomics-based pathogen identification.

73 16S-rRNAs in bacterial genomes

74 Meta-All vs. Megan* *Mitra S, Klar B, Huson DH (2009), Bioinformatics 25:

75 Metagenomics for oral microbiota

76 Materials and Methods. Samples: 60 children aged 3 to 6 (34 boys, 26 girls): MN (boy, no caries, n=17), MC (boy, with caries, n=17), FN (girl, no caries, n=11), FC (girl, with caries, n=15). Sample collection and preparation: saliva and plaque. Sequencing: 186,787 reads of the V3 region of the 16S rRNA gene.

77 The diversity of oral microbiota. Ling et al., Microbial Ecology, 2010, 60(3):677-90. Genera: Streptococcus, Prevotella, Neisseria, Pasteurella, Veillonella, Porphyromonas, Leptotrichia, Rothia, Capnocytophaga, Granulicatella, Fusobacterium, Thiomonas, Gemella, Actinomyces, Hallella, Corynebacterium, Kingella, Anaerosporobacter, Campylobacter, Planobacterium, Oribacterium, Tannerella, Peptostreptococcus, Cardiobacterium, TM7_genera_incertae_sedis, Atopobium, SR1_genera_incertae_sedis, Chryseobacterium, Catonella, Abiotrophia, Solobacterium, Dysgonomonas, Eubacterium, Actinobaculum, Lactobacillus, Tessaracoccus, Moraxella, Coprococcus, Butyricimonas, Demetria, Johnsonella, Parvimonas, Butyrivibrio

78 The relative abundance of bacterial V3 tags (at Phylum level) F:Female, M:Male, C:with caries, N: no caries S:Saliva, P: plaque Ling et al., Microbial Ecology, 2010, 60(3):677-90
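Relative abundance at the phylum level is the share of a sample's V3 tags assigned to each phylum. A minimal sketch of that normalization, with hypothetical tag counts that are not taken from the study:

```python
def relative_abundance(tag_counts):
    """Convert per-phylum tag counts into fractions of the sample total."""
    total = sum(tag_counts.values())
    if total == 0:
        return {phylum: 0.0 for phylum in tag_counts}
    return {phylum: count / total for phylum, count in tag_counts.items()}

# Hypothetical counts for one sample group, for illustration only.
sample_counts = {"Firmicutes": 60, "Bacteroidetes": 30, "Proteobacteria": 10}
for phylum, fraction in relative_abundance(sample_counts).items():
    print(f"{phylum}: {fraction:.0%}")
```

Normalizing each group separately makes groups with different sequencing depths comparable, which is what allows the eight sample groups to be shown side by side.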

79 Clustering of the 8 oral bacterial communities. Saliva and dental plaque harbored distinct bacterial communities.

80 Heatmap of 72 predominant bacterial genera

81 Six genera are significantly associated with dental caries (p < 0.05). Ling et al., Microbial Ecology, 2010, 60(3):677-90

82 Acknowledgement: Zhiqiang Hu; Kaiye Cai; Peng Jia; Ben Jia; Prof. Liping Zhao, Shanghai Jiao Tong University; Prof. Chunsheng Xiang, Zhejiang University; Zongxin Lin, Zhejiang University

Chapter 7. Motif finding (week 11) Chapter 8. Sequence binning (week 11)

Chapter 7. Motif finding (week 11) Chapter 8. Sequence binning (week 11) Course organization Introduction ( Week 1) Part I: Algorithms for Sequence Analysis (Week 1-11) Chapter 1-3, Models and theories» Probability theory and Statistics (Week 2)» Algorithm complexity analysis

More information

Bioinformatics for Microbial Biology

Bioinformatics for Microbial Biology Bioinformatics for Microbial Biology Chaochun Wei ( 韦朝春 ) ccwei@sjtu.edu.cn http://cbb.sjtu.edu.cn/~ccwei Fall 2013 1 Outline Part I: Visualization tools for microbial genomes Tools: Gbrowser Part II:

More information

Experimental Design Microbial Sequencing

Experimental Design Microbial Sequencing Experimental Design Microbial Sequencing Matthew L. Settles Genome Center Bioinformatics Core University of California, Davis settles@ucdavis.edu; bioinformatics.core@ucdavis.edu General rules for preparing

More information

CBC Data Therapy. Metagenomics Discussion

CBC Data Therapy. Metagenomics Discussion CBC Data Therapy Metagenomics Discussion General Workflow Microbial sample Generate Metaomic data Process data (QC, etc.) Analysis Marker Genes Extract DNA Amplify with targeted primers Filter errors,

More information

Matthew Tinning Australian Genome Research Facility. July 2012

Matthew Tinning Australian Genome Research Facility. July 2012 Next-Generation Sequencing: an overview of technologies and applications Matthew Tinning Australian Genome Research Facility July 2012 History of Sequencing Where have we been? 1869 Discovery of DNA 1909

More information

Metagenomics Computational Genomics

Metagenomics Computational Genomics Metagenomics 02-710 Computational Genomics Metagenomics Investigation of the microbes that inhabit oceans, soils, and the human body, etc. with sequencing technologies Cooperative interactions between

More information

Using New ThiNGS on Small Things. Shane Byrne

Using New ThiNGS on Small Things. Shane Byrne Using New ThiNGS on Small Things Shane Byrne Next Generation Sequencing New Things Small Things NGS Next Generation Sequencing = 2 nd generation of sequencing 454 GS FLX, SOLiD, GAIIx, HiSeq, MiSeq, Ion

More information

Next Generation Sequencing. Tobias Österlund

Next Generation Sequencing. Tobias Österlund Next Generation Sequencing Tobias Österlund tobiaso@chalmers.se NGS part of the course Week 4 Friday 13/2 15.15-17.00 NGS lecture 1: Introduction to NGS, alignment, assembly Week 6 Thursday 26/2 08.00-09.45

More information

DNBseq TM SERVICE OVERVIEW Plant and Animal Whole Genome Re-Sequencing

DNBseq TM SERVICE OVERVIEW Plant and Animal Whole Genome Re-Sequencing TM SERVICE OVERVIEW Plant and Animal Whole Genome Re-Sequencing Plant and animal whole genome re-sequencing (WGRS) involves sequencing the entire genome of a plant or animal and comparing the sequence

More information

Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH. BIOL 7210 A Computational Genomics 2/18/2015

Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH. BIOL 7210 A Computational Genomics 2/18/2015 Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH BIOL 7210 A Computational Genomics 2/18/2015 The $1,000 genome is here! http://www.illumina.com/systems/hiseq-x-sequencing-system.ilmn Bioinformatics bottleneck

More information

Mapping Microbiomes at the Micron Scale

Mapping Microbiomes at the Micron Scale Mapping Microbiomes at the Micron Scale Whitehead Institute Gary Borisy Partnership for Science Education June 05, 2017 Microbes live in communities Microbiome the ecological community of commensal, symbiotic,

More information

TECHNIQUES FOR STUDYING METAGENOME DATASETS METAGENOMES TO SYSTEMS.

TECHNIQUES FOR STUDYING METAGENOME DATASETS METAGENOMES TO SYSTEMS. TECHNIQUES FOR STUDYING METAGENOME DATASETS METAGENOMES TO SYSTEMS. Ian Jeffery I.Jeffery@ucc.ie What is metagenomics Metagenomics is the study of genetic material recovered directly from environmental

More information

Molecular methods to characterize the microbiota in the mouse tissues

Molecular methods to characterize the microbiota in the mouse tissues Molecular methods to characterize the microbiota in the mouse tissues Olivier Bouchez, GeT-PlaGe, INRA Toulouse @GeT_Genotoul Who are we? Genomic and transcriptomic core facility spreads on 5 sites GeT

More information

Practical Bioinformatics for Life Scientists. Week 14, Lecture 27. István Albert Bioinformatics Consulting Center Penn State

Practical Bioinformatics for Life Scientists. Week 14, Lecture 27. István Albert Bioinformatics Consulting Center Penn State Practical Bioinformatics for Life Scientists Week 14, Lecture 27 István Albert Bioinformatics Consulting Center Penn State No homework this week Project to be given out next Thursday (Dec 1 st ) Due following

More information

Contact us for more information and a quotation

Contact us for more information and a quotation GenePool Information Sheet #1 Installed Sequencing Technologies in the GenePool The GenePool offers sequencing service on three platforms: Sanger (dideoxy) sequencing on ABI 3730 instruments Illumina SOLEXA

More information

Human genome sequence

Human genome sequence NGS: the basics Human genome sequence June 26th 2000: official announcement of the completion of the draft of the human genome sequence (truly finished in 2004) Francis Collins Craig Venter HGP: 3 billion

More information

CBC Data Therapy. Metatranscriptomics Discussion

CBC Data Therapy. Metatranscriptomics Discussion CBC Data Therapy Metatranscriptomics Discussion Metatranscriptomics Extract RNA, subtract rrna Sequence cdna QC Gene expression, function Institute for Systems Genomics: Computational Biology Core bioinformatics.uconn.edu

More information

Applications of Next Generation Sequencing in Metagenomics Studies

Applications of Next Generation Sequencing in Metagenomics Studies Applications of Next Generation Sequencing in Metagenomics Studies Francesca Rizzo, PhD Genomix4life Laboratory of Molecular Medicine and Genomics Department of Medicine and Surgery University of Salerno

More information

choose MBL-REGISTER user: dm00834 password: dm00834 http://register.mbl.edu/ stamps.mbl.edu this uses the username and password on your STAMPS name badge Strategies for Analysis of Microbial Population

More information
