Creating an Enterprise-class Hadoop Platform
  Joey Jablonski
  Practice Director, Analytic Services
  DataDirect Networks, Inc. (DDN)
Who am I?
  Practice Director, Analytic Services at DataDirect Networks, Inc.
  3+ years with Hadoop, 12+ with HPC
  Contact Details
    @jrjablo
    jjablonski@ddn.com / jrjablo@gmail.com
    www.linkedin.com/in/joeyjablonski
Why Hadoop?
  Scalable Performance & Capacity
  Growing Ecosystem (Flexibility)
  Established APIs & Interfaces
  Location on the adoption curve
  Proven base to create Analytical Platforms
What is Enterprise Class?
  Scalable
    OPEX & CAPEX
  Manageable
    Integration with existing tools
  Flexible
    Workflow Process Integration
    No Rip & Replace
  Metrics to manage towards
    Business Driven, Technological Capabilities
The Big Data Challenge
  The Big Data Equation: Volume + Velocity + Variety
  Volume
    Petabytes of Data
    Trillions of Objects
  Velocity
    GB/s, TB/s
    Millions of IO/s
    Object Operations
  Variety
    Structured
    Unstructured
    Streams & Batches
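To make the interaction between Volume and Velocity concrete, the following is a minimal back-of-the-envelope sketch. It estimates how long a single full pass over a data set takes at a sustained aggregate throughput. The 1 PB volume and the 10 GB/s and 1 TB/s rates are hypothetical values chosen for illustration, not figures from the slides, and the function name `scan_time_hours` is likewise an assumption.

```python
# Hypothetical figures for illustration only; none come from the slides.
PB = 10**15  # bytes in a decimal petabyte
GB = 10**9   # bytes in a decimal gigabyte


def scan_time_hours(volume_bytes: float, throughput_bytes_per_s: float) -> float:
    """Hours needed to read the whole data set once at a sustained throughput."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return volume_bytes / throughput_bytes_per_s / 3600


# One full pass over 1 PB at two assumed aggregate rates.
one_pb_at_10_gbs = scan_time_hours(1 * PB, 10 * GB)
one_pb_at_1_tbs = scan_time_hours(1 * PB, 1000 * GB)

print(f"1 PB at 10 GB/s: {one_pb_at_10_gbs:.1f} hours")
print(f"1 PB at 1 TB/s:  {one_pb_at_1_tbs * 60:.1f} minutes")
```

Under these assumptions, moving from GB/s-class to TB/s-class throughput changes a full scan from roughly a day to well under an hour, which is why volume and velocity have to be planned together.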
Analytics
  Looking for Actionable Information
  Billions of Data Points to Consider
    Consumer purchasing trends
    Product perception
    Drug Discovery
    Genomics
    Surveillance
    Financial Analysis
Data Gravity
  [Diagram: Applications, DATA, Services]
Why is Data Analytics so hard?
  [Diagram: Technical, Business, Hacking Skills, Business Acumen, Math & Statistics knowledge, Data Science, Traditional Research, Substantive Expertise, Communications, Analytics, Poor Decisioning, Curiosity]
What is Hadoop missing today?
  Active-Active high availability
  Established management tools
  Enterprise integration mindset
  Enterprise-class hardware
  Consistent version compatibility & deployment
  Efficient CAPEX & OPEX scaling
  Resource management / SLAs / QoS
  Security
Hadoop Operational Considerations
  [Diagram: Deploy, Upgrade, Manage, Respond, Monitor; Software Platform; Hardware Platform]
Today's Enterprise Picture
  [Diagram: The Cloud]
Getting there.
  [Diagram: Improved Results, Insight, Modify Behavior]
Hadoop Architectural Considerations
Planning for Growth
  [Chart: Adoption, Higher is Better, Goal for Human Costs, Capacity, Performance, Scalability, User Growth]
Shared v. Commodity
  Shared Component Approach
    Lower Operational Costs
    Efficient operational resource scaling
    Shared resources with other IT platforms
    Efficiency in computing, connectivity & service placement
  Commodity Server Approach
    Lower Entry Costs
    Shorter MTBF
    Inefficient scaling of tools and processes
    Mismatch with traditional IT operations models
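The "Shorter MTBF" point can be illustrated with a short calculation. This is a minimal sketch, assuming independent nodes with a constant failure rate; the per-node MTBF of 50,000 hours and the node counts are hypothetical values chosen for illustration, not figures from the slides, and the function names are assumptions.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def expected_failures_per_year(node_count: int, mtbf_hours: float) -> float:
    """Expected node failures per year, assuming independent nodes
    with a constant failure rate of 1 / mtbf_hours each."""
    if mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be positive")
    return node_count * HOURS_PER_YEAR / mtbf_hours


def hours_between_failures(node_count: int, mtbf_hours: float) -> float:
    """Mean time between failures anywhere in the cluster, same assumptions."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return mtbf_hours / node_count


# Hypothetical per-node MTBF; real values depend on the hardware.
NODE_MTBF_HOURS = 50_000

for nodes in (100, 1000):
    per_year = expected_failures_per_year(nodes, NODE_MTBF_HOURS)
    gap = hours_between_failures(nodes, NODE_MTBF_HOURS)
    print(f"{nodes} nodes: {per_year:.1f} failures/year, one every {gap:.0f} hours")
```

Under these assumptions the failure count grows linearly with node count, so an operations process that copes at 100 nodes faces ten times the incident volume at 1,000 nodes. That is one way to read the "inefficient scaling of tools and processes" bullet.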
Ethernet v. InfiniBand
  InfiniBand
    100% Storage Management Offload
    End-to-End InfiniBand Networking with RDMA Acceleration
    Real-Time Data Delivery to Provide MapReduce Process Consistency
    Smaller Compute, Compact Storage to Minimize Data Center Impact
  Ethernet
    Compatibility, ensured connectivity
    Limitations in traffic types and bandwidth availability
    High CPU/Overhead cost
    Minimal options for offloading with Linux environments
Analytic User Types
  Empowered Users
  Aware Users
  Enabled Users
Hadoop Enterprise Integration
  Monitoring & Response
  Extract, Transform, Load
  APIs
  Integration
  [Diagram: Data, Information, Insight, Results]
And finally, Hadoop is more than just hardware. It is:
  about an ecosystem of hardware & software
  about integrating with existing systems
  a toolkit to build Analytical Platforms
  a component of the larger corporate processes and mandates
  a component of the wider business KPIs
Q&A