Next Generation Sequencing Software for Data Management, Analysis, and Visualization. Session W14
- Rosamund Hawkins
- 5 years ago
Transcription
1 Next Generation Sequencing Software for Data Management, Analysis, and Visualization. Session W14
2 Tools for Next Generation Sequencing Data Analysis. Kip Lord Bodi, Genomics Core Director, Tufts University Core Facility, genomics.med.tufts.edu
3 Tufts University Core Facility. Established 1987. Services: DNA Synthesis and Sequencing; Peptide Synthesis and Protein Sequencing; Next-Generation Sequencing (2008): Illumina Genome Analyzer IIx (HiSeq 2000 arriving March 2011), Roche 454 GS FLX. http://genomics.med.tufts.edu
4 NGS: Major Applications at Tufts. Whole Genome Sequencing (up to 12 microbial genomes per lane on a GAIIx); Probe genetic networks (Tn-Seq); Transcriptome Profiling (RNA-Seq); Chromatin-IP Sequencing (ChIP-Seq), including pulldown experiments; Metagenomics
5 TUCF Next-Gen Sequencing Services. Consult and Design Experiments; Sample Preparation (Genomic DNA, RNA; Others: Will assist); Actual Sequencing (Single End, Paired End, Bases); Data Management; Bioinformatics Analysis
7 Illumina Genome Analyzer IIx. Short Read Technology; Bases; Paired-End also available; 240M Reads / Run; 30M Reads / Lane
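The per-lane figure on this slide, combined with the "up to 12 microbial genomes per lane" figure quoted earlier, allows a rough capacity estimate. The sketch below is a back-of-the-envelope calculation and not something shown in the talk: the read length and genome size are placeholder assumptions (the slide's read length did not survive transcription), and reads are assumed to split evenly across samples.

```python
def expected_coverage(reads_per_lane, samples_per_lane, read_length_bp, genome_size_bp):
    """Average sequencing depth per sample, assuming an even split of reads."""
    reads_per_sample = reads_per_lane / samples_per_lane
    return reads_per_sample * read_length_bp / genome_size_bp


READS_PER_LANE = 30_000_000                      # from the slide: 30M reads / lane
LANES_PER_RUN = 240_000_000 // READS_PER_LANE    # from the slide: 240M reads / run

# Assumed values for illustration only (not from the talk).
ASSUMED_READ_LENGTH_BP = 50
ASSUMED_GENOME_SIZE_BP = 2_000_000

depth = expected_coverage(
    READS_PER_LANE, 12, ASSUMED_READ_LENGTH_BP, ASSUMED_GENOME_SIZE_BP
)
print(f"{LANES_PER_RUN} lanes per run, ~{depth:.1f}x average depth per genome")
```

Changing the two assumed constants to match the actual read length and organism gives a quick check on whether 12 samples per lane leaves enough depth.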
8 Run Statistics
9 GAIIx advances: Reagents alone
10 Informatics Crisis. Many life science researchers may not be able to make use of their data. Most software is Unix/Linux based: Bowtie, MAQ, Samtools; Tophat, Cufflinks; MACS, ERANGE Peak Caller. Many analyses do not fit these tools and will require a custom script or software to be written
11 Options for Analysis. Dedicated bioinformatician; Bioinformatics core facility; Commercial software (CLC Genomics Workbench, GenomeQuest); The ambitious postdoc (willing to learn, and has the time)
12 Back to Custom Analyses: Example. Department of Molecular Biology and Microbiology, Camilli Laboratory at TUSM. Studies two microbial pathogens: Vibrio cholerae and Streptococcus pneumoniae. Interested in gene networks and function in bacteria. Many genes are not annotated
14 [Figure; surviving labels: sp_0963, Wi.1, Wi.2, Wi.3, Wi, Wi.16, Wsp_]
20 Scene Missing?
22 Tn-Seq Bioinformatics Workflow. Align short reads to microbial reference; Get read counts for each insertion site (TA); Compare two time points to generate fitness score for every insertion; Aggregate fitness scores per gene; Use query genes to look for genetic interactions and create a gene network
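The counting, fitness, and per-gene aggregation steps of this workflow can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the scripts used in the talk: the slides do not spell out the fitness formula, so the one below is a commonly cited Tn-Seq formulation taken as an assumption, and the read positions, gene coordinates, and expansion factor are made-up toy values. Alignment and the gene-network step are left out.

```python
import math
from collections import Counter, defaultdict


def count_insertions(aligned_positions):
    """Count reads per insertion site from a list of aligned start positions."""
    return Counter(aligned_positions)


def insertion_fitness(count_t1, count_t2, total_t1, total_t2, expansion):
    """Fitness of one insertion mutant relative to the rest of the population.

    Assumed formulation: W = ln(f2 * d / f1) / ln((1 - f2) * d / (1 - f1)),
    where f1 and f2 are the mutant's read frequencies at the two time points
    and d is the expansion factor of the culture between them.
    """
    f1 = count_t1 / total_t1
    f2 = count_t2 / total_t2
    return math.log(f2 * expansion / f1) / math.log((1 - f2) * expansion / (1 - f1))


def fitness_per_gene(site_fitness, genes):
    """Average insertion fitness over all sites that fall inside each gene.

    site_fitness: {position: fitness}; genes: {name: (start, end)}, inclusive.
    """
    per_gene = defaultdict(list)
    for position, fitness in site_fitness.items():
        for name, (start, end) in genes.items():
            if start <= position <= end:
                per_gene[name].append(fitness)
    return {name: sum(values) / len(values) for name, values in per_gene.items()}


# Toy data: aligned read positions at the two time points (hypothetical).
t1 = count_insertions([100] * 10 + [250] * 10 + [900] * 80)
t2 = count_insertions([100] * 10 + [250] * 5 + [900] * 85)
total_t1, total_t2 = sum(t1.values()), sum(t2.values())

site_fitness = {
    pos: insertion_fitness(t1[pos], t2[pos], total_t1, total_t2, expansion=100.0)
    for pos in t1
    if t2[pos] > 0  # sites that vanish at the second time point need separate handling
}
genes = {"geneA": (50, 300), "geneB": (800, 1000)}  # hypothetical coordinates
print(fitness_per_gene(site_fitness, genes))
```

Under this formulation an insertion whose frequency is unchanged between time points scores 1, one that is depleted scores below 1, and one that is enriched scores above 1.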
24 Making Computational Analyses Accessible. Web-based, Visual Platform; Analysis tool; No need to download or install software; Workflows for Genomics Research; NGS Toolset: QC, manipulation, mapping, peak calling, RNA-Seq; Can be installed on your own computer (or on a local server); Integrate Perl, Python, Shell scripts. http://usegalaxy.org
25 Tn-Seq Analyses. Issue: Every time a researcher needs to change one parameter in the workflow, our bioinformatics team has to re-run the entire analysis. Solution: Integrate the workflow into Galaxy. Now the researcher can run it without our help! Also, others can start using Tn-Seq through a shared Tn-Seq workflow
26 Tool Integration: Add TnSeq scripts as custom tools in Galaxy. Sharing: Can publish tools, histories, workflows to all or selected users
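Galaxy runs a custom tool by calling a command line described in an XML wrapper, so a script mostly needs to take its input path, output path, and parameters as arguments. The sketch below shows what the script side might look like. It is not one of the Tn-Seq scripts from the talk: the tab-separated file layout (gene, fitness), the option names, and the `--max-fitness` parameter are all hypothetical, and the XML wrapper itself is not shown.

```python
import argparse
import csv
import sys


def filter_genes(rows, max_fitness):
    """Keep (gene, fitness) rows whose fitness is at or below the threshold."""
    return [(gene, w) for gene, w in rows if w <= max_fitness]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Filter per-gene fitness values.")
    parser.add_argument("--input", required=True, help="TSV file: gene<TAB>fitness")
    parser.add_argument("--output", required=True, help="path for the filtered TSV")
    parser.add_argument("--max-fitness", type=float, default=0.9,
                        help="report genes with fitness at or below this value")
    args = parser.parse_args(argv)

    # Read (gene, fitness) pairs, skipping blank lines.
    with open(args.input, newline="") as handle:
        rows = [(row[0], float(row[1]))
                for row in csv.reader(handle, delimiter="\t") if row]

    with open(args.output, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerows(filter_genes(rows, args.max_fitness))
    return 0


# Run as a command-line tool only when arguments are actually supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main())
```

Exposing the threshold as a command-line option is what would let a researcher change that one parameter from a tool form and re-run the step without involving the bioinformatics team.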
27 Tn-Seq Galaxy Workflow: Alignment; Fitness per Insertion Site; Download Results; Fitness per Gene
30 Basic Workflow for NGS Analyses. Sequence Data (FASTQ format); Read mapping (Bowtie, MAQ, etc.); Visualization: GenomeView, IGV, CLC Genomics Workbench, UCSC Genome Browser; Analysis: SNP / Indel Detection, Peak calling, Differential Expression
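Every step in this workflow starts from FASTQ records, which store each read as four lines: a header beginning with `@`, the sequence, a `+` separator, and a quality string. The sketch below is a minimal reader for that layout, written for illustration. It assumes unwrapped four-line records, and the quality offset is left as a parameter because it depends on the encoding (33 or 64); the example reads are made up.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip()
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], sequence, quality


def mean_quality(quality, offset=33):
    """Mean Phred score of one read; offset is 33 or 64 depending on encoding."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


# Two made-up reads in an in-memory file.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!II\n")
for name, seq, qual in read_fastq(example):
    print(name, seq, round(mean_quality(qual), 1))
```

A per-read mean like this is the kind of value a QC step might use to drop low-quality reads before mapping.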
31 RNA-Seq with Tophat & Cufflinks. Tophat: spliced read mapper for RNA-Seq; takes unmapped reads and looks for splice junctions; output in SAM format. SAMtools: call variations and filter by depth and quality. Cufflinks: Differential Expression Analysis Tool; output per gene and transcript in FPKM; reports significant changes in expression
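FPKM stands for fragments per kilobase of transcript per million mapped fragments, and the unit itself is simple arithmetic. The sketch below shows only that basic definition with made-up counts; Cufflinks estimates abundances with its own statistical model, so this should not be read as a description of how Cufflinks computes its output.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical counts: gene -> (fragments mapped to it, transcript length in bp).
counts = {"geneA": (500, 2_000), "geneB": (500, 500)}
total_mapped = 10_000_000  # hypothetical library size

for gene, (fragments, length_bp) in counts.items():
    print(gene, fpkm(fragments, length_bp, total_mapped))
```

The two toy genes have the same fragment count, but the shorter one gets the higher FPKM, which is the length normalization the unit is designed to provide.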
32 Cufflinks: Differential Expression; Tophat: Alignment
35 Summary. A small NGS core can still assist end users with bioinformatics analysis. Tools that make analyses accessible to the end user help make this possible: Galaxy, GenomeQuest, CLC Genomics Workbench, Genomatix, etc.
36 Acknowledgments. Tufts Core Facility: Michael Berne, Tom Phimmasen, James Schiemer, Anna Wong, Cho Low, Paula Nguyen, Christy Le. Galaxy Team: Anton Nekrutenko, James Taylor, Greg von Kuster. Broad Institute: Thomas Abeel. Camilli Laboratory: Tim van Opijnen, David Lazinski, Heather Kamp. Yelick Laboratory: Viktoria Andreeva. CTRC Core Facility: Albert Tai, Jennifer Curcuru, Fang Liu