In search of the Holy Grail?


1 In search of the Holy Grail? Our Clients' Journey to the Data Lake
André De Locht, Sr Business Consultant, Data Lake, Information Integration and Governance
andre.de.locht@be.ibm.com

2 Data Requirements (should) evolve in line with Business Requirements
Maturity: Operations, BI & Data Warehouse, Self Service Analytics, New Business Models
Business Value:
Cost Reduction: Lower the cost of data management for back-office and operational systems.
Modernization: Accelerate adoption of open source and cloud to take advantage of innovative data technologies.
Insight Driven: Use self-service to access data and apply advanced analytics to exploit opportunities in every part of the business.
Transformation: Become data-driven, build new applications, and launch entirely new business models.

3 The IBM Approach to accelerate Business Value: Achieve Data Maturity
Incorporating new data sources. Embracing open source and cloud. Leveraging existing investments.
IBM DataWorks: Our next generation of data and analytics technology that serves all data professionals to help organizations realize the full promise of data.
DataFirst Method: Our expertise is packaged as repeatable patterns to ensure successful client outcomes. It lets clients put data to work, advancing from where they are now on the Maturity Model to where they need to be...
... so you can create a data-driven culture and drive new value for your organization.

4 Choice of Collaborative User Experiences, Solution Blueprints, and Individual Services: IBM DataWorks
User Experiences: Business Analyst, Data Scientist, Data Engineer, App Developer (Find, Share, Collaborate)
Solution Blueprints: Self-Service Analytics, Internet of Things, Data Lake, Mobile Applications
Individual Services: Data Access, Data Recognition, Advanced Analytics
Powered by:
Access & Ingest: IoT, Streaming, Data Preparation, ETL/ELT
Store: Hadoop, NoSQL/SQL, Object Store
Governance
Analyze & Build: Descriptive, Predictive, Prescriptive, Dev. environment
Deploy: Apps/APIs, Data Pipelines, Reports, Models

5 The Data Lake is a reference architecture that balances the desire for easy access to data with information governance and security.
The Data Lake reference architecture describes the technical capabilities necessary for a system of insight, while being independent of specific technologies. Being technology independent is important, because most organizations already have investments in data platforms that they want to incorporate in their solution. In addition, technology is continually improving, and the choice of technology is often dictated by the volume, variety, and velocity of the data being managed.
A system of insight needs more than technology to succeed. The Data Lake reference architecture includes descriptions of governance and management processes and definitions to ensure that the human and business systems around the technology support a collaborative, self-service, and safe environment for data use.

6 What is Data Lake?
Infuse Forceful Analytics
End-to-End value proposition on our Analytic Vision
Introducing all capabilities from Databases to Cognitive
On premise and Cloud
From Architects to analysts & scientists to business users

7 Hadoop is not a Data Lake!
"We see customers creating big data graveyards, dumping everything into HDFS and hoping to do something with it down the road. But then they just lose track of what's there." (Sean Martin, CTO of Cambridge Semantics)
1. Hadoop lacks the governance and security capability that organizations expect from an enterprise-ready platform.
2. Hadoop can do large-scale batch processing, but is inadequate for interactive questioning.
3. Hadoop is difficult: very specialized skills are needed to do the flexible questioning analysts typically perform.
[Diagram labels: Business Intelligence, Data Warehouse, Discovery, Hadoop Data Lake (ETL), Operational Data (OLTP)]

8 Governance is the first and critical step in building up your Data Lake, because governance and change management ensure that information is protected and managed efficiently
Key Considerations:
Data Catalog
Fine-grained authorization controls (protect sensitive data)
Business Glossary
Data Lineage information (understand the source of data; often a regulatory requirement)
Understand all your data (internal, 3rd party, IoT): its origin, purpose, & usage
Ensure easy access to and usage of all data with a user-friendly approach
Discover what you need, when it's needed, in context (understand the nature of the data)
Ensure compliance, security, and privacy
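The governance considerations on this slide (catalog, glossary, lineage, fine-grained authorization) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `CatalogEntry`, `Catalog`, `lineage`, and `readable_columns`, and the sample data sets, are hypothetical and do not represent any IBM product API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical catalog record for one data set in the lake."""
    name: str
    glossary_term: str                                    # business meaning of the data set
    sources: list[str] = field(default_factory=list)      # direct upstream data sets
    sensitive_columns: set[str] = field(default_factory=set)


class Catalog:
    """Toy in-memory catalog: entries plus per-user grants on sensitive columns."""

    def __init__(self) -> None:
        self.entries: dict[str, CatalogEntry] = {}
        self.grants: dict[tuple[str, str], set[str]] = {}

    def register(self, entry: CatalogEntry) -> None:
        self.entries[entry.name] = entry

    def grant(self, user: str, dataset: str, columns: set[str]) -> None:
        """Allow a user to read the given sensitive columns of a data set."""
        self.grants.setdefault((user, dataset), set()).update(columns)

    def lineage(self, name: str) -> list[str]:
        """Return all upstream data sets of `name`, nearest first."""
        seen: list[str] = []
        queue = list(self.entries[name].sources)
        while queue:
            source = queue.pop(0)
            if source in seen:
                continue
            seen.append(source)
            # Sources that are not registered in the catalog are not expanded further.
            if source in self.entries:
                queue.extend(self.entries[source].sources)
        return seen

    def readable_columns(self, user: str, dataset: str, columns: list[str]) -> list[str]:
        """Drop sensitive columns unless the user holds an explicit grant."""
        entry = self.entries[dataset]
        allowed = self.grants.get((user, dataset), set())
        return [c for c in columns
                if c not in entry.sensitive_columns or c in allowed]


catalog = Catalog()
catalog.register(CatalogEntry("crm_customers", "Customer", sensitive_columns={"email"}))
catalog.register(CatalogEntry("customer_360", "Customer",
                              sources=["crm_customers", "web_clicks"],
                              sensitive_columns={"email"}))
catalog.grant("steward", "customer_360", {"email"})

print(catalog.lineage("customer_360"))
print(catalog.readable_columns("analyst", "customer_360",
                               ["customer_id", "email", "segment"]))
```

In this sketch an analyst without a grant sees only the non-sensitive columns, while lineage answers the "where did this data come from" question that regulators often ask. A real deployment would back both with a shared metadata repository rather than in-memory dictionaries.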

9 Lessons learned (sometimes the hard way)
Gather, Organise & Manage the data: Identify what data is required
Analyse the data to gain insight: Prioritise the insight needed
Build processes to exploit data: Prioritise exploitation projects
1. Information governance should start small and focus on the most important information, and then expand out as it demonstrates its business value.
2. It must evolve with the business, be responsive and accountable, while seeking to communicate and educate people in the appropriate management of information.
3. Most important, it needs senior stakeholders and visible consequences for those who ignore the requirements.

10 What is your Use Case for Data Lake?

11 Use Cases from Technical Perspective
Data Lake & Logical Warehouse: Modernize warehouse architecture through the Data Lake, improving efficiency (TCO) and extending analytics. Components: warehouse, BigIntegrate & BigQuality, HDFS.
Warehouse Offloading: Improve efficiency of existing warehouse investments by offloading dark data or augmenting it with sandboxes. Components: warehouse, BigIntegrate (& BigQuality), HDFS.
Enhanced 360º view: Enhance insight of key business entities (e.g. customer) by integrating and correlating new data sources and building an integrated view. Components: MDM, BigIntegrate & BigQuality, HDFS.
Exploratory Analysis: Discover & explore new insights more rapidly and in a more agile & iterative manner. Components: BigIntegrate & BigQuality, HDFS.

12 The Data Lake was born in a bank and has since been applied in all other industries.
Business Model Innovation
Digital Business & Client engagement
Regulatory & Compliance

14 Start your Journey!
The IBM DataFirst Method is our expertise to ensure & accelerate your success in getting from where you are now to where you want (need?) to be.
The more you put data to work in your organization, the better the outcome.
Ensures client success through repeatable and actionable use cases & engagements.
Provides an entry point and a roadmap to the future for your journey.
Start Anywhere: Focus on your biggest business opportunity.
Fill the Gaps: Strategy. Expertise. Skills. No more and no less.
Build Value at Every Step: Achieve a data-driven culture, one initiative at a time.

15 Engagement Flow
Briefing: Open for Data, DataWorks and DataFirst Method
The Holy Grail
Discovery Workshop
Maturity and Business Value:
Cost Reduction: Lower the cost of data management for back-office and operational systems.
Modernization: Accelerate adoption of open source and cloud to take advantage of innovative data technologies.
Insight Driven: Use self-service to access data and apply advanced analytics to exploit opportunities in every part of the business.
Transformation: Become data-driven, build new applications, and launch entirely new business models.

16 Next step suggestion: 2-day Data Lake Track session
Day 1:
Governance Deep Dive (align Business/IT, how to enable rapid & independent exploration of secure trusted data)
Capability Demonstration (Glossary, Lineage, Authorization)
Data Lake Overview
Day 2:
Identify Business Priority (Use Case definition & prioritization)
Architecture deep dive (Understand IT/Data Landscape, gaps & priorities)
Participants: Business Architects & IT

17 [Figure: maturity roadmap towards the Data Lake; axes: Business Capability, Business Value]
Maturity levels: Foundation, Intermediate, Advanced, Full
Stages: Operational Governance, Information Supply Chain Maturity, Self Service & System of Insight, Data Lake
Figure labels, in slide order: Established Culture: Data as an Asset; Know where the data is; Compliance; Error Avoidance; Data Ownership; Data Quality Rules Agreed & Published; Data Quality Policy Agreed & Published; Common Standards & Definitions across the enterprise; Trusted Information; All Data Paradigm; Single & Common Metadata repository; On-Time & Flexible Business Analytics; Connecting Operational activity with corporate Business Performance; Architects; Static data (EDW); Business Agility (Self Service); Reporting; New Analytical Applications; Enterprise Architecture and Standard practices; Speed of Insight; Hadoop Integration; Data in Motion; Self-Service Analytics; Data Scientist Sandboxes; Master Data Management; Information Governance Catalog; Hadoop; Data warehouse appliance; Infrastructure; Exploring the Value of Data; Predictive Analytics; Real time Analytics; Information Integration Platform

18 We offer our expertise to avoid Flesh Wounds.
Our IBM Analytic capabilities are not focused on Middleware, databases and ETL frameworks, but on well thought-out architectures supported by end-user tools to make data easy.