Next Generation Sequencing Software for Data Management, Analysis, and Visualization. Session W14

1 Next Generation Sequencing Software for Data Management, Analysis, and Visualization. Session W14

2 Tools for Next Generation Sequencing Data Analysis. Kip Lord Bodi, Genomics Core Director, Tufts University Core Facility, genomics.med.tufts.edu

3 Tufts University Core Facility. Established 1987. Services: DNA Synthesis and Sequencing; Peptide Synthesis and Protein Sequencing; Next-Generation Sequencing (2008): Illumina Genome Analyzer IIx (HiSeq 2000 arriving March 2011), Roche 454 GS FLX. http://genomics.med.tufts.edu

4 NGS: Major Applications at Tufts. Whole Genome Sequencing: up to 12 microbial genomes per lane on a GAIIx; probing genetic networks (Tn-Seq). Transcriptome Profiling (RNA-Seq). Chromatin-IP Sequencing (ChIP-Seq), including pulldown experiments. Metagenomics.

5 TUCF Next-Gen Sequencing Services. Consult and design experiments. Sample preparation: genomic DNA, RNA; others: will assist. Actual sequencing: Single End, Paired End, Bases. Data management. Bioinformatics analysis.

7 Illumina Genome Analyzer IIx. Short Read Technology: Bases; Paired-End also available. 240M Reads / Run; 30M Reads / Lane.
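The two throughput figures are consistent with an 8-lane flow cell (240M / 30M = 8). Combined with slide 4's "up to 12 microbial genomes per lane", they also give a rough per-genome depth through the standard mean-coverage formula C = N·L/G (reads × read length / genome size, often attributed to Lander-Waterman). This is a minimal sketch: the even split across 12 samples is an idealization, and the 36 bp read length and 2.0 Mb genome size are hypothetical placeholders, since the slide's read length did not survive extraction.

```python
def lanes_per_run(reads_per_run: float, reads_per_lane: float) -> int:
    """Number of lanes implied by the per-run and per-lane read counts."""
    return round(reads_per_run / reads_per_lane)


def reads_per_sample(reads_per_lane: float, samples_per_lane: int) -> float:
    """Idealized even split of one lane's reads across multiplexed samples."""
    return reads_per_lane / samples_per_lane


def mean_coverage(num_reads: float, read_length_bp: int, genome_size_bp: float) -> float:
    """Expected mean depth C = N * L / G."""
    return num_reads * read_length_bp / genome_size_bp


lanes = lanes_per_run(240e6, 30e6)            # 8
per_genome = reads_per_sample(30e6, 12)       # 2.5M reads per genome
# Hypothetical inputs, NOT from the slides: 36 bp reads, 2.0 Mb genome.
depth = mean_coverage(per_genome, 36, 2.0e6)  # 45.0
print(lanes, per_genome, depth)
```

Swapping in the real read length and genome size changes only the last call; the depth scales linearly with both read length and reads per sample.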

8 Run Statistics

9 GAIIx advances: Reagents alone

10 Informatics Crisis. Many life science researchers may not be able to make use of their data. Most software is Unix/Linux based: Bowtie, MAQ, Samtools; Tophat, Cufflinks; MACS, ERANGE Peak Caller. Many analyses do not fit these tools and will require a custom script or software to be written.

11 Options for Analysis. Dedicated bioinformatician. Bioinformatics core facility. Commercial software: CLC Genomics Workbench, GenomeQuest. The ambitious postdoc: willing to learn, and has the time.

12 Back to Custom Analyses: Example. Department of Molecular Biology and Microbiology, Camilli Laboratory at TUSM. Studies two microbial pathogens: Vibrio cholerae and Streptococcus pneumoniae. Interested in gene networks and function in bacteria. Many genes are not annotated.

14 [Figure; surviving labels: sp_0963, Wi.1, Wi.2, Wi.3, Wi, Wi.16, Wsp_]

20 Scene Missing?

22 Tn-Seq Bioinformatics Workflow. (1) Align short reads to the microbial reference. (2) Get read counts for each insertion site (TA). (3) Compare two time points to generate a fitness score for every insertion. (4) Aggregate fitness scores per gene. (5) Use query genes to look for genetic interactions and create a gene network.
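Steps 2 to 5 of this workflow can be sketched in Python. This is a minimal illustration under stated assumptions, not the Camilli laboratory's actual TnSeq scripts: alignment (step 1) is assumed done upstream, read positions are assumed already converted to TA-site coordinates, and genes are hypothetical half-open intervals. The fitness formula is an assumption modeled on the definition in van Opijnen, Bodi and Camilli (Nature Methods, 2009), W = ln(f2·d/f1) / ln((1 - f2)·d/(1 - f1)), where f1 and f2 are a mutant's read frequencies at the two time points and d is the population expansion factor between them. Genetic interactions are scored as the deviation of double-mutant fitness from a multiplicative expectation (W_ab - W_a·W_b), one common model. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from statistics import mean


def ta_sites(reference: str) -> list[int]:
    """0-based positions of every 'TA' dinucleotide (candidate insertion sites)."""
    return [i for i in range(len(reference) - 1) if reference[i:i + 2] == "TA"]


def count_reads_per_site(read_positions, sites) -> Counter:
    """Step 2: count reads whose (pre-converted) position is a TA site."""
    site_set = set(sites)
    return Counter(p for p in read_positions if p in site_set)


def fitness(n1, n2, total1, total2, expansion):
    """Step 3 (assumed formula): W = ln(f2*d/f1) / ln((1-f2)*d/(1-f1))."""
    f1, f2 = n1 / total1, n2 / total2
    return math.log(f2 * expansion / f1) / math.log((1 - f2) * expansion / (1 - f1))


def site_fitness(counts_t1, counts_t2, expansion):
    """Fitness for every site with reads at both time points; other sites are skipped."""
    total1, total2 = sum(counts_t1.values()), sum(counts_t2.values())
    return {
        site: fitness(counts_t1[site], counts_t2[site], total1, total2, expansion)
        for site in counts_t1.keys() & counts_t2.keys()
    }


def fitness_per_gene(site_w, genes):
    """Step 4: mean site fitness within each gene's half-open [start, end)."""
    per_gene = defaultdict(list)
    for pos, w in site_w.items():
        for name, (start, end) in genes.items():
            if start <= pos < end:
                per_gene[name].append(w)
    return {name: mean(ws) for name, ws in per_gene.items()}


def interaction(w_ab, w_a, w_b):
    """Step 5: double-mutant fitness minus the multiplicative expectation."""
    return w_ab - w_a * w_b


# Toy example with made-up data.
ref = "GGTACCTAGGTA"
sites = ta_sites(ref)  # [2, 6, 10]
t1 = count_reads_per_site([2, 2, 6, 6, 10, 10, 5], sites)
t2 = count_reads_per_site([2, 2, 2, 6, 10, 10], sites)
w_site = site_fitness(t1, t2, expansion=100.0)
w_gene = fitness_per_gene(w_site, {"geneA": (0, 7), "geneB": (8, 12)})
print(sorted(w_site.items()), w_gene)
```

In the toy data the mutant at position 2 gains read share between time points (W above 1), the one at position 6 loses share (W below 1), and the one at position 10 keeps the same share, which this formula scores as neutral (about 1.0). A site holding every read (f = 1) or no reads at one time point is undefined under this formula, which is why the sketch only scores sites seen at both times.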

24 Making Computational Analyses Accessible. Web-based, visual platform analysis tool; no need to download or install software. Workflows for genomics research. NGS toolset: QC, manipulation, mapping, peak calling, RNA-Seq. Can be installed on your own computer (or on a local server). Integrates Perl, Python, and shell scripts. http://usegalaxy.org

25 Tn-Seq Analyses. Issue: every time the researcher needs to change one parameter in his workflow, our bioinformatics team has to re-run the entire analysis. Solution: integrate the workflow into Galaxy. Now the researcher can run it himself! Others can also start using Tn-Seq through a shared Tn-Seq workflow.

26 Tool Integration: add the TnSeq scripts as custom tools in Galaxy. Sharing: tools, histories, and workflows can be published to all users or to selected users.
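Galaxy runs a custom tool by filling in a command line, defined in the tool's XML wrapper (not shown here), from the form the user sees. A script that reads and writes plain tab-separated files and exposes each tunable value as a command-line option is therefore straightforward to wrap, and lets the researcher change that value from the Galaxy form instead of asking the bioinformatics team to re-run the analysis. This is a minimal sketch; the file layouts, the per-gene averaging, and the `--min-sites` option are hypothetical illustrations, not the actual TnSeq tools described in the talk.

```python
import argparse
import csv
from statistics import mean


def aggregate(site_rows, gene_rows, min_sites=1):
    """Mean per-site fitness for each gene, skipping genes with too few sites.

    site_rows: iterable of (position, fitness) string rows.
    gene_rows: iterable of (gene, start, end) string rows, half-open [start, end).
    """
    genes = [(r[0], int(r[1]), int(r[2])) for r in gene_rows if r]
    hits = {name: [] for name, _, _ in genes}
    for row in site_rows:
        if not row:
            continue  # tolerate blank lines in the TSV
        pos, w = int(row[0]), float(row[1])
        for name, start, end in genes:
            if start <= pos < end:
                hits[name].append(w)
    return {name: mean(ws) for name, ws in hits.items() if ws and len(ws) >= min_sites}


def build_parser():
    """Each user-tunable value is a flag, so a Galaxy form field can map onto it."""
    parser = argparse.ArgumentParser(description="Aggregate Tn-Seq site fitness per gene")
    parser.add_argument("sites", help="TSV input: position<TAB>fitness")
    parser.add_argument("genes", help="TSV input: gene<TAB>start<TAB>end")
    parser.add_argument("output", help="TSV output: gene<TAB>mean fitness")
    parser.add_argument("--min-sites", type=int, default=1,
                        help="minimum insertion sites required to report a gene")
    return parser


def main(argv=None):
    """Read the two TSV inputs, aggregate, and write one row per reported gene."""
    args = build_parser().parse_args(argv)
    with open(args.sites, newline="") as s, open(args.genes, newline="") as g:
        result = aggregate(csv.reader(s, delimiter="\t"),
                           csv.reader(g, delimiter="\t"), args.min_sites)
    with open(args.output, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        for name, w in sorted(result.items()):
            writer.writerow([name, f"{w:.4f}"])
    return 0
```

Saved as a standalone script, it would end with the usual `if __name__ == "__main__":` guard calling `main()`; the wrapper's command line would then pass the Galaxy dataset paths as the three positional arguments and the form's value as `--min-sites`.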

27 Tn-Seq Galaxy Workflow [diagram steps: Alignment; Fitness per Insertion Site; Download Results; Fitness per Gene]

30 Basic Workflow for NGS Analyses. Sequence data (FASTQ format). Read mapping (Bowtie, MAQ, etc.). Visualization: GenomeView, IGV, CLC Genomics Workbench, UCSC Genome Browser. Analysis: SNP / indel detection, peak calling, differential expression.
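The FASTQ input at the start of this workflow stores each read as four lines: `@` plus the read name, the bases, a `+` separator line, and a quality string with one ASCII character per base. Each character encodes a Phred score Q = -10·log10(P_error) as `ord(char) - offset`; the offset is 33 for Sanger-style files (and Illumina pipeline 1.8+) but 64 for Illumina pipeline versions 1.3 to 1.7, the GAIIx era, so check which encoding your files use. This is a minimal sketch of reading FASTQ and applying a simple mean-quality QC filter; the filter rule and threshold are illustrative assumptions, not what any particular QC tool does, and the parser assumes sequences are not wrapped across lines.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from 4-line FASTQ text (sequences not wrapped across lines)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # skip blank lines between records
        try:
            seq = next(it).rstrip("\r\n")
            plus = next(it).rstrip("\r\n")
            qual = next(it).rstrip("\r\n")
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode quality characters to Phred scores (Q = ord(char) - offset)."""
    return [ord(c) - offset for c in quality]


def mean_quality(record: FastqRecord, offset: int = 33) -> float:
    """Average Phred score over the read (0.0 for an empty read)."""
    scores = phred_scores(record.quality, offset)
    return sum(scores) / len(scores) if scores else 0.0


def filter_by_quality(records, min_mean_q: float, offset: int = 33):
    """Keep reads whose mean Phred score is at least min_mean_q."""
    return [r for r in records if mean_quality(r, offset) >= min_mean_q]


example = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "!!!!"]
reads = list(read_fastq(example))
kept = filter_by_quality(reads, min_mean_q=20)
print([r.name for r in kept])  # ['r1']
```

With offset 33, `I` decodes to Q40 and `!` to Q0, so only the first read passes the Q20 threshold. Reading a Phred+64 file with the default offset would inflate every score by 31, which is why the offset is a parameter rather than a constant.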

31 RNA-Seq with Tophat & Cufflinks. Tophat: spliced read mapper for RNA-Seq; takes unmapped reads and looks for splice junctions; output in SAM format. SAMtools: call variations and filter by depth and quality. Cufflinks: differential expression analysis tool; outputs FPKM per gene and per transcript; reports significant changes in expression.
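FPKM (Fragments Per Kilobase of transcript per Million mapped fragments) normalizes a transcript's fragment count by both its length and the sequencing depth: FPKM = fragments × 10^9 / (transcript length in bp × total mapped fragments). A minimal sketch of that basic definition follows, with a log2 fold change between two conditions. Cufflinks itself assigns fragments to transcripts with a statistical model and uses effective transcript lengths, so its values will not exactly match this naive count-based version, and a fold change alone does not establish significance; Cufflinks' differential expression step (cuffdiff) performs the statistical test. The counts below are made up.

```python
import math


def fpkm(fragments: int, length_bp: int, total_mapped: int) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped)


def log2_fold_change(fpkm_a: float, fpkm_b: float, pseudocount: float = 0.0) -> float:
    """log2(B / A); a small pseudocount avoids division by zero for unexpressed genes."""
    return math.log2((fpkm_b + pseudocount) / (fpkm_a + pseudocount))


# Made-up counts for one 2,000 bp transcript in two conditions, 10M mapped fragments each.
a = fpkm(500, 2000, 10_000_000)   # 25.0
b = fpkm(1000, 2000, 10_000_000)  # 50.0
print(a, b, log2_fold_change(a, b))  # 25.0 50.0 1.0
```

Doubling the fragment count at fixed length and depth doubles the FPKM, giving a log2 fold change of exactly 1; doubling the transcript length instead would halve it.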

32 [Figure panels: Cufflinks: Differential Expression; Tophat: Alignment]

35 Summary. A small NGS core can still assist end users with bioinformatics analysis. Tools that make analyses accessible to the end user help make this possible: Galaxy, GenomeQuest, CLC Genomics Workbench, Genomatix, etc.

36 Acknowledgments. Tufts Core Facility: Michael Berne, Tom Phimmasen, James Schiemer, Anna Wong, Cho Low, Paula Nguyen, Christy Le. Galaxy Team: Anton Nekrutenko, James Taylor, Greg von Kuster. Broad Institute: Thomas Abeel. Camilli Laboratory: Tim van Opijnen, David Lazinski, Heather Kamp. Yelick Laboratory: Viktoria Andreeva. CTRC Core Facility: Albert Tai, Jennifer Curcuru, Fang Liu.