EMP, MG-RAST and DOE KBase


1 EMP, MG-RAST and DOE KBase Folker Meyer, PhD Argonne National Laboratory and University of Chicago

2 Microbial community science
Environmental clone libraries ("functional metagenomics"): Sanger sequencing of BAC clones carrying environmental DNA → low throughput, but supports in vitro screens ⇒ novel proteins (the 1990s)
Amplicon studies (single-gene studies, 16S rDNA): sequencing of PCR-amplified ribosomal genes → sequence quality limited (→ rare biosphere debate) → often can't distinguish between individual strains ⇒ community composition, EMP (Earth Microbiome Project)
Shotgun metagenomics: random shotgun DNA sequencing applied directly to environmental samples ⇒ novel proteins, novel genomes, evolution
What are they doing? Who are they?

3 Why is metagenomics important?
Cultivation barrier: we study << 10% of all bacteria; cultivation in the laboratory is impossible for 90+%.
Bacteria are vastly understudied (→ Nikos Kyrpides, Head Genome Biology, DOE JGI) ⇒ metagenomics unlocks access to the vast majority of life on the planet.
Genomic data, and more recently complete genomes, give us a starting point to study non-cultivated organisms.
Global climate is created and maintained by bacteria (→ Falkowski et al., Science 2008).
Bacteria play an important role in human health (→ NIH HMP project, Nature 2012; EU + Chinese MetaHIT, Science 2011).
Many metagenomics projects will create a massive number of data sets.

4 One large scale project studying the global microbiome

5 Earth Microbiome Project
Common reference framework to drive understanding of microbial communities
Techniques are accessible to high school students
Individual science projects united by a tool chain
Allows large scale comparisons
Reference: Gilbert et al. 2010, Standards in Genomic Sciences 3(3), 229
Source: Greg Caporaso

6 EMP Jack A. Gilbert, Folker Meyer, Rick Stevens, Janet Jansson, Rob Knight, Jonathan A. Eisen, Jed Fuhrman, Jeff Gordon, Norman Pace, James Tiedje, Ruth Ley, Noah Fierer, Dawn Field, Nikos Kyrpides, Frank-Oliver Glöckner, Hans-Peter Klenk, K. Eric Wommack, Elizabeth M. Glass, Kathryn Docherty, Rachel Gallery, George Kowalchuk, Mark Bailey, Dion Antonopoulos, Pavan Balaji, C. Titus Brown, Narayan Desai, Dirk Evers, Wu Feng, Daniel Huson, James Knight, Eugene Kolker, Kostas Konstantinidis, Joel Kostka, Rachel Mackelprang, Alice McHardy, Christopher Quince, Jeroen Raes, Alexander Sczyrba, Ashley Shade

7 Large scale analytical and comparison engine

8 MG-RAST metagenomics portal
13,000 users; >100,000 data sets; >~1.5 terabases per month
Meyer et al., BMC Bioinformatics, 2008

9 Large scale analysis and comparison
1500 HMP samples. Computing takes a long time, so this is a preliminary analysis showing PCoA of normalized Euclidean distance on subsystem abundance.
First insight: analysis can't be technology and quality agnostic.
[Figure: PCA painted with instrument type (sequencing technology): blue = 454, green = Illumina]
[Figure: PCA painted with body location]
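The PCoA step named on this slide can be sketched in a few lines. This is a minimal illustration and not the MG-RAST implementation. The toy counts are invented. "Normalized" is assumed here to mean per-sample relative abundance, which the slide does not specify. All function names are hypothetical, and only the first ordination axis is extracted (by power iteration) to keep the sketch dependency-free.

```python
import math

def normalize(counts):
    """Scale each sample (row) to relative abundances summing to 1 (assumed normalization)."""
    return [[value / sum(row) for value in row] for row in counts]

def euclidean_distances(profiles):
    """Pairwise Euclidean distance matrix between sample profiles."""
    return [[math.dist(a, b) for b in profiles] for a in profiles]

def gower_center(dist):
    """Double-center -0.5 * D^2; the eigenvectors of the result are the PCoA axes."""
    n = len(dist)
    a = [[-0.5 * d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    # The matrix is symmetric, so column means equal row means.
    return [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
            for i in range(n)]

def first_axis(b, iterations=500):
    """Leading eigenpair of b by power iteration, returned as sample coordinates."""
    n = len(b)
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    eigenvalue = 0.0
    for _ in range(iterations):
        nxt = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
        eigenvalue = norm  # converges to the largest eigenvalue
    return [x * math.sqrt(eigenvalue) for x in vec]

# Invented toy data: 4 samples (rows) x 3 subsystems (columns).
toy_counts = [
    [90, 5, 5],
    [80, 10, 10],
    [10, 45, 45],
    [5, 50, 45],
]

coords = first_axis(gower_center(euclidean_distances(normalize(toy_counts))))
for index, coordinate in enumerate(coords):
    print(f"sample {index}: PCoA axis 1 = {coordinate:+.4f}")
```

In this toy input the first two samples and the last two samples fall on opposite sides of the first axis. The slide's plots show the same kind of separation, there painted by instrument type and by body location. A real analysis of 1500 samples would use a linear algebra library for the full eigendecomposition.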

10 Where are we?

11 One bottleneck removed
From: NHGRI web site
Computing cost (blue) dominates sequencing cost (red)
- Cost on Amazon EC2 Cloud, September
- Pure run-time for BLASTX, no storage or data transfer
From: Wilkening et al., IEEE Cluster09, 2009

12 Computing is now the major bottleneck

13 DOE KBase platforms enable analysis and comparison
Knowledgebase enabling predictive systems biology.
Powerful modeling framework.
Open source infrastructure for integration.
Resource to enable experimental design and interpretation of results.

14 Big data analytics in Bio → compare many environmental samples
28k shotgun metagenomes, normalized, subsystem level 3
Red = foreground
Using KBase matr client

15 NGS provides very high resolution data
We can now observe ongoing evolution.
[Figure labels: new strain, new strain]
From: Handley et al., AEM 2014