Next Generation Bioinformatics on the Cloud http://www.easygenomics.com Sifei He Director of BGI Cloud hesifei@genomics.cn Xing Xu, Ph.D Senior Product Manager EasyGenomics BGI xuxing@genomics.cn Contact Us info@easygenomics.com
Agenda Vision and Strategy Problems and Solutions Product Introduction LIVE Demo Future Roadmap Q&A
Trend of Volume and Cost [Figure: cost of DNA sequence ($/Mb); human genomes sequenced] Figures adapted from Sboner A, et al.: The real cost of sequencing: higher than you think! Genome Biology 2011, 12:125. Numbers and images from private research and the open Internet.
Geographical side of the problem + Sequencing is a COMMODITY and happens EVERYWHERE. BGI Images from omicsmaps.com
Interpretation is the KEY Analysis and Interpretation is the KEY Application is the Silver Bullet
Difficulties of Analysis
Primary Analysis: Base calling. Data throughput, Data storage
Secondary Analysis: Mapping. Computation intensive, Data storage
Tertiary Analysis: Variant Calling. Complicated Algorithms, Computation intensive
Post Tertiary Analysis: In-depth Annotation. Lack of knowledge
Problems and Solutions Problems: Big genomic data, Geographical distribution, Algorithm integration, Computational demand. Solutions: Cloud, High Speed Data Exchange, Workflows, Resource Management
EasyGenomics EasyGenomics is the bioinformatics platform for research and applications on the cloud
EasyGenomics Database, Data management Computational Resources Algorithms, Workflows, Reports High speed connection Web portal, Simple UI EasyGenomics is the bioinformatics platform for research and applications on the cloud
Bioinformatics Core Algorithms: Carefully chosen, tested and optimized Workflows: Whole genome resequencing, exome resequencing, RNA-Seq, small RNA, de novo Assembly
Enabling Technology
Hadoop-based Flexible Computing
Best Practice Award for IT Infrastructure
Human Genome      SOAPdenovo     EasyGenomics TM (192 cores)
Genome Coverage   86%            86%
Assembly Time     70h            55h
No. of Servers    1              15
Memory Size       500 GB x 1     24 GB x 15
Mode              Centralized    Distributed
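As a rough reading of the comparison on this slide, the figures can be turned into a speedup ratio and an aggregate memory footprint. The sketch below only restates the slide's numbers (70 h vs. 55 h, 1 x 500 GB vs. 15 x 24 GB); the variable and function names are illustrative and not part of EasyGenomics.

```python
# Figures copied from the slide; all names here are illustrative only.
centralized = {"hours": 70, "servers": 1, "mem_gb_per_server": 500}
distributed = {"hours": 55, "servers": 15, "mem_gb_per_server": 24}


def total_memory_gb(cfg):
    """Aggregate memory across all servers in a configuration."""
    return cfg["servers"] * cfg["mem_gb_per_server"]


# Wall-clock speedup of the distributed run over the centralized run.
speedup = centralized["hours"] / distributed["hours"]

print(f"speedup: {speedup:.2f}x")
print(f"centralized memory: {total_memory_gb(centralized)} GB")
print(f"distributed memory: {total_memory_gb(distributed)} GB")
```

On these numbers the distributed run finishes about 1.27 times faster at the same 86% coverage, uses 360 GB of memory in total against 500 GB, and needs no single machine larger than 24 GB.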
Data Management Sample A Analysis I Analysis II Raw Data Sample B Analysis X Project I Sample, Analysis, Project Mimicking real research procedure Automatic management of underlying data structure
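One way to picture the Sample, Analysis, Project organization on this slide is as a small nested data model. The sketch below is an assumption about how the three concepts relate (a project groups samples, a sample carries its raw data and analyses); the class names, fields, and the file name are hypothetical and do not describe the actual EasyGenomics schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Analysis:
    name: str


@dataclass
class Sample:
    name: str
    raw_data: List[str] = field(default_factory=list)      # uploaded read files
    analyses: List[Analysis] = field(default_factory=list)


@dataclass
class Project:
    name: str
    samples: List[Sample] = field(default_factory=list)

    def all_analyses(self) -> List[Analysis]:
        """Collect every analysis attached to any sample in the project."""
        return [a for s in self.samples for a in s.analyses]


# Names mirror the slide; the nesting and "reads_A.fq" are assumptions.
sample_a = Sample("Sample A", raw_data=["reads_A.fq"],
                  analyses=[Analysis("Analysis I"), Analysis("Analysis II")])
sample_b = Sample("Sample B", analyses=[Analysis("Analysis X")])
project = Project("Project I", samples=[sample_a, sample_b])

print([a.name for a in project.all_analyses()])
```

Keeping the relationships explicit like this is what would let a platform manage the underlying files automatically while the user thinks only in terms of samples, analyses, and projects.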
High Speed Data Exchange Aspera's patented fasp high-speed file transfer technology 10~100X faster than FTP
Resource Management Managed Multitenancy Workspace Data Structure Managed Task Safe Backup
Security Access Multitenancy Isolation Compliance Username/Password Biometric access HTTPS, Aspera fasp TM Trusted database connection ACL, Data encryption Physical isolation Virtual isolation ISO27000
Introduction to EasyGenomics TM Xing Xu, Ph.D Senior Product Manager
Homepage Navigation Tabs Three task portals Status of recent works Warning and Logging
Project Table Add/Remove Project Project list table Filter and search box Operation shortcuts
Analysis Table
Sample Table
Read Upload Read Upload Portal
Upload Raw Data Create a Sample
Upload Raw Reads (Aspera Connect server)
Create a sample Create a Sample
Create a Sample Sequencing information Add Read Group Filter settings Mapping settings
Add read groups Create a Sample
Sample Page Individual report for each lane Summarized report for all lanes
Sequencing Quality Report
Mapping Report
Data Analysis Portal Create an Analysis
Create an Analysis
Create an Analysis Selected sample(s) One selected sample => Single Analysis Multiple selected samples => Batch Analyses
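The selection rule on this slide (one sample gives a single analysis, several samples give batch analyses) can be sketched as a small dispatch function. This is only an illustration: `plan_analyses`, the returned dictionary, and the `analysis:` labels are hypothetical, and treating a batch as one analysis per selected sample is an assumption.

```python
from typing import Dict, List


def plan_analyses(selected_samples: List[str]) -> Dict[str, object]:
    """Return a hypothetical submission plan for the selected samples."""
    if not selected_samples:
        raise ValueError("select at least one sample")
    # One selected sample => single analysis; several => batch analyses.
    mode = "single" if len(selected_samples) == 1 else "batch"
    # Assumed: a batch creates one analysis per selected sample.
    return {"mode": mode,
            "analyses": [f"analysis:{s}" for s in selected_samples]}


print(plan_analyses(["Sample A"])["mode"])
print(plan_analyses(["Sample A", "Sample B"])["mode"])
```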
Create an Analysis Selectable modules Predefined Settings Shortcut
Create an Analysis
Create an Analysis Customizable
Create an Analysis
Create an Analysis
Data Harvest Portal Data Management
Upload Management
Download Management
LIVE DEMO
Sifei He Director of BGI Cloud
Applications Complex -Omics research Genetic testing Diagnostics
One More Thing FREE Ref: BOSTON Please Visit BGI Booth @ 213 Subject to T&C
Q & A
BACKUP