Genomics and High Performance Computing. Folker Meyer Argonne National Laboratory and University of Chicago
1 Genomics and High Performance Computing Folker Meyer and University of Chicago
2 Brief intro: I am a computer scientist turned computational biologist. My CS friends tell me I am a biologist; my BIO friends tell me I am a CS person. So clearly take my comments with a grain of salt. My research interests: metagenomics to study the microbial role in geochemical cycling (climate, remediation) and human health (human microbiome). Strong emphasis on technology development to allow that study: algorithms, integrations, tools, genomics tech.
3 Why microbes? Source: Rob Knight, U. Colorado
4 Biology changed. From this... (These are: biology.png, biology.gif and biology.jpg.)
5 to this. (in ~2003)
6 DNA technology advances FAST
Technology | Read length | Mbp/hr | Mbp/run | Cost/run
1977 Sanger: ~100 ~ ~0.001?
1987 ABI 370: ~ ~$1,000
~2001 ABI 3700: ~ $
pyroseq: $13,
Solexa: bp (1) ~ , ,000 $15,
ABI Solid: 35-50bp ,000-20,000 ~$5,
Ion Torrent (2): ~ $ ?
Helicos: ~50? $20, ?
PacBio: $100
1) Solexa paired-end now bp
2) Machine cost ~$50k
7 Ongoing change
8 1) Democratization of data generation: from factory to bench-top. And >70% of Illumina machines go to small customers (1). 1) From Illumina at the 2010 GIA meeting.
9 2) Data set size: instrument output went from <<1 GB to 200 GB in <5 years. Month trend looks interesting. To put this into perspective: all genomics knowledge (RefSeq) was 57 GB when I last checked, and the large Global Ocean Survey study by Craig Venter in 2004 was 600 megabases. When biologists speak of big data, it frequently fits on an iPod.
10 3) Computing costs dominate. [Figure: bioinformatics vs. sequencing cost for GAIIx and HiSeq2000.] 95GB == 195,600 node hours (on Nehalem 8-core, 16GB). Illumina HiSeq2000 = 2x100GB/run, available today. Cost is purely BLASTX (no storage or transfer cost) on Amazon EC2. Source: Wilkening et al., Proceedings IEEE Cluster09, 2009. Note: 10x or 100x improvements over BLASTX will help, but not solve the problem.
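To make the slide's arithmetic concrete, here is a minimal sketch that scales the quoted figure (95 GB needing 195,600 node hours of BLASTX) to one HiSeq2000 run of 2x100 GB. Linear scaling with data size is an assumption, and the function names and the `price_per_node_hour` parameter are hypothetical; the slide gives no hourly price, so none is filled in.

```python
# Figures quoted on the slide.
NODE_HOURS_MEASURED = 195_600    # BLASTX node hours (Nehalem 8-core, 16GB)
GB_MEASURED = 95                 # data volume behind that measurement
HISEQ2000_GB_PER_RUN = 2 * 100   # "2x100GB/run"


def node_hours_for(gigabytes, speedup=1.0):
    """Scale the measured cost linearly with data size (an assumption)."""
    per_gb = NODE_HOURS_MEASURED / GB_MEASURED
    return gigabytes * per_gb / speedup


def dollars_for(node_hours, price_per_node_hour):
    """Compute cost for a caller-supplied (hypothetical) hourly price."""
    return node_hours * price_per_node_hour


if __name__ == "__main__":
    for speedup in (1, 10, 100):
        hours = node_hours_for(HISEQ2000_GB_PER_RUN, speedup)
        print(f"{speedup:>3}x faster than BLASTX: {hours:,.0f} node hours per run")
```

Even with a 100x faster search, one run still costs thousands of node hours under this assumption, which is the sense in which faster tools help but do not solve the problem.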
11 What does genomics look like?
12 Example: Analysis of pathogen: B. subtilis
13 Sequencing a genome. Where are we? Closing gaps between contigs; validating the sequence via a map.
14 Genome sequencing is now routine Figures: Nikos Kyrpides DOE Joint Genome Institute, G.O.L.D.
15 Genome sequencing: a success story. Genome assembly used to require large machines and significant manual effort. More data and novel codes make this much easier: Velvet, Newbler, ALLPATHS, ... Bacterial genome sequencing and assembly is sometimes done in a day. While some quality issues still exist, complete automation on a low-cost machine is likely to happen in the next months.
16 A genome: CGGGGGAGCCCTCCAGAATACCCATCATATAGCCCCTGAGGTGGCATGGGATGTCTCCATGAGGGAACCCCTTCCCACTTCATACTGTC ACGTATATCATAGTGTTCTTGACTGGGCCATTCATCTAAGATGGGATTTACCCTGTGAAACAGGGAGAAGACTTATGGACCCCAAGCATCAT TTCAAGTTGAAGTTGAGTTTTTAAAAGCCATCCATGCAAAGTTCCTTTGCTTTGGACCCTCTGCATTATTAAAGCTGCTGTATTGCTAACCC AGAACTGCTCCAGTGTCTTGACTGATCATCATGGCTTCAGTTTGGAAGAGACTGCAGCGTGTGGGAAAACATGCATCCAAGTTCCAGTTTGT GGCCTCCTACCAGGAGCTCATGGTTGAGTGTACGAAGAAATGGTAACCAGATAAACTGGTGGTAGATGAAGACATGCAAAGTTTGGCTAGTT TGGTGAGTATGAAGCAGGCTGACATTGGCAATTTAGATGACTTCGAAGAAGATAATGAAGATGATGATGAGAACAGAGTGAACCAAGAAGAA AAGGCAGCTAAAATTACAGAGCTTATCAACAAACTTAACTTTTTGGATGAAGCAGAAAAGGACTTGGCCACCGTGAATTCAAATCCATTTGA TGATCCTGATGCTGCAGAATTAAATCCATTTGGAGATCCTGACTCAGAAGAACCTATCACTGAAACAGCTTCACCTAGAAAAACAGAAGACT CTTTTTATAATAACAGCTATAATCCCTTTAAAGAGGTGCAGACTCCACAGTATTTGAACCCATTCGATGAGCCAGAAGCATTTGTGACCATA AAGGATTCTCCTCCCCAGTCTACAAAAAGAAAAAATATAAGACCTGTGGATATGAGCAAGTACCTCTATGCTGATAGTTCTAAAACTGAAGC AGAGCTTAGTGATCTGAAGCGGGAGCCTGAACTACAACAGCCTATCAGCGGAGCGTGACAGGTACGTGATGCTAGCTTTTATCAGGCAGCGG TATGCGCGATCAATGCGCGCGGCTATATGATCTGCATGCGGCGCGATTACTCTTCGGAGCTTATTTCTGCGGCGGGCCGGGGGAGCCCTCCA GAATACCCATCATATAGCCCCTGAGGTGGCATGGGATGTCTCCATGAGGGAACCCCTTCCCACTTCATACTGTCACGTATATCATAGTGTTC TTGACTGGGCCATTCATCTAAGATGGGATTTACCCTGTGAAACAGGGAGAAGACTTATGGACCCCAAGCATCATTTCAAGTTGAAGTTGAGT TTTTAAAAGCCATCCATGCAAAGTTCCTTTGCTTTGGACCCTCTGCATTATTAAAGCTGCTGTATTGCTAACCCAGAACTGCTCCAGTGTCT TGACTGATCATCATGGCTTCAGTTTGGAAGAGACTGCAGCGTGTGGGAAAACATGCATCCAAGTTCCAGTTTGTGGCCTCCTACCAGGAGCT CATGGTTGAGTGTACGAAGAAATGGTAACCAGATAAACTGGTGGTAGATGAAGACATGCAAAGTTTGGCTAGTTTGGTGAGTATGAAGCAGG CTGACATTGGCAATTTAGATGACTTCGAAGAAGATAATGAAGATGATGATGAGAACAGAGTGAACCAAGAAGAAAAGGCAGCTAAAATTACA GAGCTTATCAACAAACTTAACTTTTTGGATGAAGCAGAAAAGGACTTGGCCACCGTGAATTCAAATCCATTTGATGATCCTGATGCTGCAGA ATTAAATCCATTTGGAGATCCTGACTCAGAAGAACCTATCACTGAAACAGCTTCACCTAGAAAAACAGAAGACTCTTTTTATAATAACAGCT ATAATCCCTTTAAAGAGGTGCAGACTCCACAGTATTTGAACCCATTCGATGAGCCAGAAGCATTTGTGACCATAAAGGATTCTCCTCCCCAG 
TCTACAAAAAGAAAAAATATAAGACCTGTGGATATGAGCAAGTACCTCTATGCTGATAGTTCTAAAACTGAAGCAGAGCTTAGTGATCTGAA GCGGGAGCCTGAACTACAACAGCCTATCAGCGGAGCGTGACAGGTACGTGATGCTAGCTTTTATCAGGCAGCGGTATGCGCGATCAATGCGC GCG
17 From genome to information: Annotation Bioinformatics Source: A. Becker, U of Freiburg,Germany
18 Annotation. In the old days: find every possible gene; run every tool known to mankind (BLAST, HMMER, ...) against every known database (RefSeq, Pfam, InterPro, KEGG, COG, ...); have humans interpret the results. Several drawbacks: computationally expensive (fixable with $$); requires lots of FTEs (fixable with $$$$); subjective factors come into play (fixable with standards? Still an open debate). HTGA
19 Resulting compute requirements. Bacterial genomes contain ~1000 genes per megabase. A BLAST vs. NCBI NR search takes >10 min per gene. Annotations often require EC (Enzyme) numbers; protein domains (Pfam) help gain confidence; so does genomic context comparison ("Clusters"). [Table: time in minutes per annotation step.] Clusters and Pfam have highest confidence; BLAST is viewed as error prone. CPU investment and quality are correlated: more computing helps, but most groups cannot pay the price. Source: informal survey of ~20 manual annotators.
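The two numbers on this slide multiply out quickly. The sketch below takes ~1000 genes per megabase and the lower bound of 10 minutes per BLAST search and estimates CPU time for the BLAST step alone; the function name and the 5 Mbp example genome are illustrative assumptions, not figures from the talk.

```python
GENES_PER_MEGABASE = 1000     # "~1000 genes per Megabase"
BLAST_MINUTES_PER_GENE = 10   # slide says ">10min per gene", so a lower bound


def blast_cpu_hours(genome_megabases):
    """Lower-bound CPU hours to BLAST every gene of one genome against NR."""
    genes = genome_megabases * GENES_PER_MEGABASE
    return genes * BLAST_MINUTES_PER_GENE / 60


if __name__ == "__main__":
    size_mbp = 5  # hypothetical bacterial genome size
    hours = blast_cpu_hours(size_mbp)
    print(f"{size_mbp} Mbp genome: at least {hours:,.0f} CPU hours "
          f"(about {hours / 24:.0f} days on one core)")
```

Under these assumptions a single 5 Mbp genome needs more than a month of single-core time before any Pfam or cluster analysis is added.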
20 Things change. More sequences are being annotated; the database grows; human annotator expertise shrinks relatively. [Figure: DNA space vs. relative annotator expertise over time (Sanger, WGS, pyrosequencing), with marks at 100%, 50% and 1%.]
21 The ecosystem is unsustainable. As sequencing becomes so cheap, analysis is the bottleneck. The community needs a scalable solution. Many view standards as the solution; it is unclear to me how standards alone can help with this problem. As we get more data, computes take longer and everything becomes more complex.
22 Annotation: another success story. Many expensive solutions exist, but there is also a novel approach: the RAST server (Aziz et al., BMC Genomics, 2008). A team of CS and Bio experts developed a novel approach, subsystem technology, combining domain knowledge from both areas. It integrates data curation and annotation using novel approaches, requiring far fewer resources with better accuracy. Annotate several genomes in a day on a laptop. The server has processed over 12k genomes since 2008. Note: works only for bacteria; extension to other areas is possible. Try it:
23 The future... Bacterial genomics has become easy; larger genomes remain harder, but plummeting sequencing cost will help traditional genomics. Sequencer output is compressed to a few contigs via assembly, to a fraction of the size. The human genome only has 20,000 genes. Imagine what would happen w/o assembly: every sequence a gene. The next big thing: metagenomics.
24 Cost per base Source: Rob Knight, UColorado
25 Metagenomics needs the magic wand... Shotgun metagenomics == shotgun genomics applied directly to various environments. != sequencing of BAC clones with env. DNA (functional metagenomics). != sequencing single genes such as 16S rDNA (gene surveys). What are they doing? Who are they? data
26 Community Structure and Metabolism. Gene W. Tyson, Jarrod Chapman, Philip Hugenholtz, Eric E. Allen, Rachna J. Ram, Paul M. Richardson, Victor V. Solovyev, Edward M. Rubin, Daniel S. Rokhsar & Jillian F. Banfield. NATURE VOL MARCH
27 Size of flagship projects
Date | Metagenome | Size | Type
2004 | Acid Mine Drainage | 76 Mbp | Sanger
2004 | GOS-I | 700 Mbp | Sanger
2009 | ANL-Subsurface | 12 Gbp | Solexa 2x75 PE
2009 | JGI-Cow-rumen | 17 Gbp | Solexa
2010 | JGI-Cow-rumen | 255 Gbp | Solexa
2010 | MetaHIT | 500 Gbp | Solexa
2010 | DeepSoil | 70 Gbp | Solexa 180bp (125x2)
2010 | HMP | 5.7 Tbp | Solexa 100x2
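The flagship project sizes can be turned into a rough growth rate. The sketch below compares the smallest 2004 entry with the largest 2010 entry; the dates have whole-year resolution only, so the doubling time is a coarse estimate, and the helper name `growth` is illustrative.

```python
import math

# (year, project, size in base pairs) as listed on the slide.
PROJECTS = [
    (2004, "Acid Mine Drainage", 76e6),
    (2004, "GOS-I", 700e6),
    (2009, "ANL-Subsurface", 12e9),
    (2009, "JGI-Cow-rumen", 17e9),
    (2010, "JGI-Cow-rumen", 255e9),
    (2010, "MetaHIT", 500e9),
    (2010, "DeepSoil", 70e9),
    (2010, "HMP", 5.7e12),
]


def growth(first, last):
    """Return (fold increase, doubling time in months) between two entries."""
    (year0, _, size0), (year1, _, size1) = first, last
    fold = size1 / size0
    months = (year1 - year0) * 12
    return fold, months / math.log2(fold)


if __name__ == "__main__":
    fold, doubling = growth(PROJECTS[0], PROJECTS[-1])
    print(f"{fold:,.0f}-fold larger, doubling roughly every {doubling:.1f} months")
```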
28 Metagenome tasks: metabolic functions; community members; who is doing what; binning genes to genomes.
29 Sync. We are here: 8492 metagenomes from >500 groups. Over 10 GB per week (rapid growth). Many centers produce data.
30 MG-RAST: metagenomics RAST server. Data growing over time. Open access, web based; users upload data sets; >3500 data submitters. Web UI.
31 Scaling up MG-RAST v2: ~3,500 users (data submitters); ~200 daily users (>10 minutes). V2.0 was a typical bioinformatics app (next slide). Throughput was becoming a major problem, approaching 1 GB per week in late 2008. Need a mechanism allowing more throughput.
32 Technology choices (typical for BIO): tightly integrated system; pleasantly parallel code; NFS for data movement; central database server; workflow management via SGE. Running on ~50 machines locally; a 40-node dual-PPC cluster shares an NFS filesystem with all systems.
33 Performance analysis. [Figure: run time in hours of MG-RAST jobs, large (avg. ~0.1 Gb) vs. small; a snapshot with little wait time.] Most time is spent in SIMS. Short jobs spend a long time in create jobs. Careful analysis of all computations: the IO to CPU ratio determines suitability of platforms.
34 Redesign workflow. Enable use of distributed computing platforms, including e.g. BG/P, EC2, Azure and local clusters. Enable users to contribute resources. Be robust, scalable, fault tolerant. Enable replacement of algorithms with more efficient ones. Enable support for staged database updates. Built a prototype workflow engine: Argonne Workflow Environment (A.W.E.).
35 A.W.E. [Diagram: an AWE server (web server plus db, Facebook's Tornado + SQLAlchemy) answers work requests from diverse RESTful AWE clients A, B and C, handing out fna files and collecting results.] A single analysis operation results in a series of work units. A client requests a lease on work, with a timeout. Results are reported to the server, with failures resulting in lease expiration. The REST interface scales well and minimizes prerequisites. A client can size requests to local computational capacity. Tested up to ~500 clients.
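The lease idea on this slide can be sketched in a few lines. What follows is a toy in-memory model, not the A.W.E. implementation: the class `LeaseQueue` and its methods are hypothetical names, there is no REST layer or database, and the injectable clock exists only to make the behaviour easy to demonstrate. A client asks for as many units as it can handle; any unit whose lease times out goes back into the pool for another client.

```python
import time
import uuid


class LeaseQueue:
    """Toy model of lease-based work distribution (hypothetical, in memory)."""

    def __init__(self, work_units, clock=time.monotonic):
        self._pending = list(work_units)   # units waiting for a client
        self._leased = {}                  # lease_id -> (unit, expiry time)
        self.results = {}                  # unit -> reported result
        self._clock = clock

    def _expire(self):
        """Return units whose lease has timed out to the pending pool."""
        now = self._clock()
        for lease_id, (unit, expiry) in list(self._leased.items()):
            if expiry <= now:
                del self._leased[lease_id]
                self._pending.append(unit)

    def request_work(self, max_units, timeout_s):
        """Lease up to max_units units; the client sizes the request itself."""
        self._expire()
        batch = []
        while self._pending and len(batch) < max_units:
            unit = self._pending.pop(0)
            lease_id = uuid.uuid4().hex
            self._leased[lease_id] = (unit, self._clock() + timeout_s)
            batch.append((lease_id, unit))
        return batch

    def report_result(self, lease_id, result):
        """Accept a result only while the lease is still valid."""
        self._expire()
        entry = self._leased.pop(lease_id, None)
        if entry is None:
            return False  # lease expired or unknown; the unit will be redone
        unit, _expiry = entry
        self.results[unit] = result
        return True

    def done(self):
        self._expire()
        return not self._pending and not self._leased
```

A client that crashes simply never reports back; the server needs no failure detector beyond the timeout, which is one plausible reason such a design tolerates hundreds of loosely managed clients.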
36 What will the future bring?
37 I lost my crystal ball, BUT: a lot more data seems a safe bet. A lot more computing is certain. The computing will be non-coupled codes (pleasantly parallel). I hope for better algorithms. More standards, ah, best practice.
38 M5 -- a metagenomics data sharing infrastructure for a democratized sequencing world M5 = metagenomics, metadata, meta-analysis, models, and metainfrastructure
39 M5: the initial goal. Enable similarities in one non-redundant database against GenBank, RefSeq, KEGG, UniProt, SEED, IMG, eggNOG. One computation can be used for all tools. Store and transport similarities to avoid recomputing. Proposed MTF (Metagenome Transport Format) v0.1. Benefit: compute once, use in ALL tools.
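One way to picture "compute once, use in ALL tools" is a non-redundant table in which identical protein sequences from different source databases collapse to a single entry. The sketch below is an assumption about how that could work, not a description of M5 or MTF: keying sequences by an MD5 checksum is a choice made here for illustration, and the function names and record layout are hypothetical.

```python
import hashlib
from collections import defaultdict


def normalize(sequence):
    """Strip whitespace and upper-case so identical proteins compare equal."""
    return "".join(sequence.split()).upper()


def sequence_key(sequence):
    """Checksum used as the non-redundant identifier (an assumption)."""
    return hashlib.md5(normalize(sequence).encode("ascii")).hexdigest()


def build_nonredundant(records):
    """records: iterable of (source_db, source_id, sequence) tuples.

    Returns (sequences, sources): one stored sequence per key, plus every
    (source_db, source_id) pair that carries that sequence.
    """
    sequences = {}
    sources = defaultdict(list)
    for source_db, source_id, sequence in records:
        key = sequence_key(sequence)
        sequences.setdefault(key, normalize(sequence))
        sources[key].append((source_db, source_id))
    return sequences, dict(sources)


def expand_hit(hit_key, sources):
    """One similarity hit maps to every database holding that sequence."""
    return sources.get(hit_key, [])
```

A similarity search then runs once against `sequences`, and each hit is expanded per tool through `sources`, so adding another source database does not multiply the search cost for sequences already present.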
40 M5: the long-term goal. Establish community best practice. It will also lead to the ability to outsource some computing tasks for the community. Maintaining large metagenomic data sets will overwhelm all bio computing centers I know. Searches against published metagenomes will become impossible if we don't establish a curated body of metagenomes.
41 (A part of) the proposed M5 platform. [Diagram labels: define; HPC (e.g. OLCF, ALCF, TeraGrid, OSG); cloud; reference data set; export MTF; standards in genomics; SOP/workflow repository; IMG, MG-RAST, CAMERA, Xyz...] Users (large scale): environmental metagenomics (GOS, TerraGenome); microbiota in human health (HMP). Many smaller user groups.
42 Back to Summary
43 Summary. Overall, genomics is not limited by lack of cycles. Lack of good codes and best practice is more limiting. Adjusting to large data. It will become important for the HPC community to recognize good use of cycles and help avoid stupid computes. Remember bioinformatics.
44 Interesting novel questions. Which microbes are where on the planet? (Microbial weather.) Which genes are where? (Gene migration patterns.) Combinations of genes: where do pathways or clusters travel? Integration of climate data: predictive models. How will X impact the microbes?
45 Evolution of computing infrastructure for BIO. [Figure: abundance of machines before genomics, in early genomics, and in the genomics era (2010).]
46 Why should I care? Microbes determine the climate on the planet! (e.g. Falkowski et al., Science 2008). Microbes impact human health (e.g. obesity: Turnbaugh et al., Nature 2006). Computations are pleasantly parallel, but there are a lot of them. Example: oil spill, integrate gene inventory data with oil spill patterns.
Best Practices for Implementing SAP BusinessObjects Mobile in Your Organization SESSION CODE: 1106 Viswanathan Ramakrishnan (Vishu) - Oct, 2011 2011 SAP AG. All rights reserved. 1 Agenda INTRODUCTION TO
More informationHTCaaS: Leveraging Distributed Supercomputing Infrastructures for Large- Scale Scientific Computing
HTCaaS: Leveraging Distributed Supercomputing Infrastructures for Large- Scale Scientific Computing Jik-Soo Kim, Ph.D National Institute of Supercomputing and Networking(NISN) at KISTI Table of Contents
More informationIntegrating MATLAB Analytics into Enterprise Applications
Integrating MATLAB Analytics into Enterprise Applications David Willingham 2015 The MathWorks, Inc. 1 Run this link. http://bit.ly/matlabapp 2 Key Takeaways 1. What is Enterprise Integration 2. What is
More informationGlossary of Commonly used Annotation Terms
Glossary of Commonly used Annotation Terms Akela a general use server for the annotation group as well as other groups throughout TIGR. Annotation Notebook a link from the gene list page that is associated
More informationThe Rise of Engineering-Driven Analytics
The Rise of Engineering-Driven Analytics Roy Lurie, Ph.D. Vice President Engineering, MATLAB Products 2015 The MathWorks, Inc. 1 The Rise of Engineering-Driven Analytics 2 The Rise of Engineering-Driven
More informationGene Regulation Solutions. Microarrays and Next-Generation Sequencing
Gene Regulation Solutions Microarrays and Next-Generation Sequencing Gene Regulation Solutions The Microarrays Advantage Microarrays Lead the Industry in: Comprehensive Content SurePrint G3 Human Gene
More informationGenome Sequencing. I: Methods. MMG 835, SPRING 2016 Eukaryotic Molecular Genetics. George I. Mias
Genome Sequencing I: Methods MMG 835, SPRING 2016 Eukaryotic Molecular Genetics George I. Mias Department of Biochemistry and Molecular Biology gmias@msu.edu Sequencing Methods Cost of Sequencing Wetterstrand
More informationHigh Throughput Sequencing Technologies. UCD Genome Center Bioinformatics Core Monday 15 June 2015
High Throughput Sequencing Technologies UCD Genome Center Bioinformatics Core Monday 15 June 2015 Sequencing Explosion www.genome.gov/sequencingcosts http://t.co/ka5cvghdqo Sequencing Explosion 2011 PacBio
More informationThe world leader in serving science. DataSafe Solutions. Protect your valuable laboratory data
The world leader in serving science DataSafe Solutions Protect your valuable laboratory data Central and secure storage of laboratory data is critical to the success of your organization. The ability to
More informationBerkeley Data Analytics Stack (BDAS) Overview
Berkeley Analytics Stack (BDAS) Overview Ion Stoica UC Berkeley UC BERKELEY What is Big used For? Reports, e.g., - Track business processes, transactions Diagnosis, e.g., - Why is user engagement dropping?
More informationIntroduction to Next Generation Sequencing (NGS)
Introduction to Next eneration Sequencing (NS) Simon Rasmussen Assistant Professor enter for Biological Sequence analysis Technical University of Denmark 2012 Today 9.00-9.45: Introduction to NS, How it
More informationFunctional analysis using EBI Metagenomics
Functional analysis using EBI Metagenomics Contents Tutorial information... 2 Tutorial learning objectives... 2 An introduction to functional analysis using EMG... 3 What are protein signatures?... 3 Assigning
More informationFinding the LIMS of Your Dreams
Finding the LIMS of Your Dreams A Practical Guide for the Next Generation Sequencing Lab Today, the technologies and methods pioneered during the Human Genome Project have revolutionized the life-science
More informationWhy learn sequence database searching? Searching Molecular Databases with BLAST
Why learn sequence database searching? Searching Molecular Databases with BLAST What have I cloned? Is this really!my gene"? Basic Local Alignment Search Tool How BLAST works Interpreting search results
More informationUltrasequencing: Methods and Applications of the New Generation Sequencing Platforms
Ultrasequencing: Methods and Applications of the New Generation Sequencing Platforms Laura Moya Andérico Master in Advanced Genetics Genomics Class December 16 th, 2015 Brief Overview First-generation
More informationCloudLCA: finding the lowest common ancestor in metagenome analysis using cloud computing
Protein Cell 2012, 3(2): 148 152 DOI 10.1007/s13238-012-2015-8 RESEARCH ARTICLE CloudLCA: finding the lowest common ancestor in metagenome analysis using cloud computing Guoguang Zhao 1,4*, Dechao Bu 1,4*,
More informationINTRODUCTION A clear cultivation bias exists in microbial phylogenetics. As of 2010, half of all
Rachel L. Harris 1 Insights into the phylogeny and coding potential of microbial dark matter: a replication of phylogenetic anchoring methods described by Rinke et al., 2013 Rachel Harris, David Zhao,
More informationMicroSEQ Rapid Microbial Identification System
MicroSEQ Rapid Microbial Identification System Giving you complete control over microbial identification using the gold-standard genotypic method The MicroSEQ ID microbial identification system, based
More informationA Cloud Migration Checklist
A Cloud Migration Checklist WHITE PAPER A Cloud Migration Checklist» 2 Migrating Workloads to Public Cloud According to a recent JP Morgan survey of more than 200 CIOs of large enterprises, 16.2% of workloads
More informationDe Novo Assembly of High-throughput Short Read Sequences
De Novo Assembly of High-throughput Short Read Sequences Chuming Chen Center for Bioinformatics and Computational Biology (CBCB) University of Delaware NECC Third Skate Genome Annotation Workshop May 23,
More informationApplication for Automating Database Storage of EST to Blast Results. Vikas Sharma Shrividya Shivkumar Nathan Helmick
Application for Automating Database Storage of EST to Blast Results Vikas Sharma Shrividya Shivkumar Nathan Helmick Outline Biology Primer Vikas Sharma System Overview Nathan Helmick Creating ESTs Nathan
More informationNext Generation Sequencing Lecture Saarbrücken, 19. March Sequencing Platforms
Next Generation Sequencing Lecture Saarbrücken, 19. March 2012 Sequencing Platforms Contents Introduction Sequencing Workflow Platforms Roche 454 ABI SOLiD Illumina Genome Anlayzer / HiSeq Problems Quality
More informationOracle Financial Services Revenue Management and Billing V2.3 Performance Stress Test on Exalogic X3-2 & Exadata X3-2
Oracle Financial Services Revenue Management and Billing V2.3 Performance Stress Test on Exalogic X3-2 & Exadata X3-2 O R A C L E W H I T E P A P E R J A N U A R Y 2 0 1 5 Table of Contents Disclaimer
More information