Big Data Yesterday, Now and Tomorrow


1 Transforming Big Data into Big Value
Sep 18, 2013
Speaker: Thomas Kelly, Practice Director, Enterprise Information Management, Cognizant Technology Solutions, Inc.
© 2013, Cognizant

2 Big Data Yesterday, Now and Tomorrow
Data in research has grown considerably in the past few years.
- Biological laboratory: 25-50M terms in eLab notebook PDFs for a typical large pharma
- Social media: 400 million tweets per day [1]; 162,632 tweets on asthma in just the last 10 months
- Healthcare: use of health IT by doctors and hospitals has doubled since

3 Many of the Opportunities are not new but some are
- Traversing the data-to-knowledge continuum is not a new challenge in Life Sciences
- Dealing with complex, dynamic, large and rapidly growing data sets is not new either
- Dimensions: Volume, Velocity, Variety, Complexity
- Examples: genome sequence, number of base pairs, omics data, high-throughput screening, next-generation sequencing, chemical structures, gene expression data, microarray data, real-time data from social networks, external data from EHR, pipeline analysis, computational modeling, statistics
- Our primary focus has, however, been on managing and analyzing data individually

4 Also New is a Set of Tools to Tackle the Challenge
- Open source distributed processing frameworks
- Big Insights & Streams
- Big Data Appliance
- HANA
- Big data analytical applications
- Packaged big data platforms
- MPP data appliance platforms
- Data visualization, statistical & in-memory analytics
- Big data integration
- Translational research specific tools (Translational Research Center)

5 ... including Semantic Technology to enrich Big Data with Insights and Expertise

6 Example: Type 2 Diabetes Research using Semantic Technology
Mayo Clinic used Semantic Web technologies to develop a framework for high-throughput phenotyping using EHRs to analyze multifactorial phenotypes.
1. Mapped clinical database to ontology model
2. Find all FDA-approved T2D drugs; find all patients administered these drugs (RxNorm, DailyMed, Clinical DB)
3. Find which of these patients are having a side effect of Prandin (RxNorm, SIDER, Clinical DB)
4. Find genes or biomarkers associated with T2D, as published in the literature (Diseasome, DBpedia, ChEMBL)
5. Selected genes have strong correlation to T2D; find all patients administered drugs that target those genes (Diseasome, RxNorm, ChEMBL, DrugBank, Clinical DB)
6. Find all patients that are on sulfonylureas, metformin, meglitinides, and thiazolidinediones, or combinations of them (Diseasome, RxNorm, ChEMBL, DrugBank, Clinical DB)
Reprinted with permission from Jyotishman Pathak, Ph.D., Mayo Clinic
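The drug-and-patient lookup in this example can be illustrated with a toy in-memory triple store. This is only a minimal sketch: the predicates (`treats`, `fdaApproved`, `administered`), the patient identifiers and the compound `drugX` are hypothetical, and the Mayo Clinic framework itself relied on Semantic Web technologies over ontology-mapped sources rather than on code like this.

```python
# Toy triple store of (subject, predicate, object) facts.
# All identifiers below are illustrative assumptions, not study data.
TRIPLES = {
    ("metformin", "treats", "T2D"),
    ("metformin", "fdaApproved", "true"),
    ("repaglinide", "treats", "T2D"),       # repaglinide is sold as Prandin
    ("repaglinide", "fdaApproved", "true"),
    ("drugX", "treats", "T2D"),             # hypothetical unapproved compound
    ("patient1", "administered", "metformin"),
    ("patient2", "administered", "repaglinide"),
    ("patient3", "administered", "drugX"),
    ("patient4", "administered", "aspirin"),
}


def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def approved_drugs_for(disease):
    """Drugs that treat the disease and are flagged as FDA-approved."""
    treating = {s for s, _, _ in match(p="treats", o=disease)}
    approved = {s for s, _, _ in match(p="fdaApproved", o="true")}
    return treating & approved


def patients_on(drugs):
    """Patients administered at least one of the given drugs."""
    return {s for s, _, o in match(p="administered") if o in drugs}


t2d_drugs = approved_drugs_for("T2D")
t2d_patients = patients_on(t2d_drugs)
print(sorted(t2d_drugs))
print(sorted(t2d_patients))
```

In a real deployment each predicate would come from a different linked source (for example drug labels from one dataset and administrations from the clinical database), which is where the ontology mapping in step 1 earns its keep.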

7 Is the Juice worth the Squeeze?
Stakeholders: ACOs, Data Marketers, Payers, Providers, Device Manufacturers, Regulators
- Cost Containment: cost reduction through better trial design & execution; cost avoidance through better patient and study selection, retention & adherence
- Improved Patient Outcomes: personalized medicine is more attainable and affordable; new insights into disease & mode of action
- Improving Regulatory Compliance: reduced effort for compound screening and competitor intelligence; improved trust through data traceability
- Faster Time to Analysis Results: reduced the time required to conduct gene-environmental interactions analysis by 99 percent, from over 25 hours to under 12 minutes [3]
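The "99 percent" figure can be checked from the two durations quoted on the slide. A minimal sketch, taking the bounds (25 hours, 12 minutes) as exact values even though the slide says "over" and "under"; the function name is an illustrative choice.

```python
def percent_reduction(before_minutes, after_minutes):
    """Relative reduction in elapsed time, expressed as a percentage."""
    return 100.0 * (before_minutes - after_minutes) / before_minutes


# 25 hours = 1500 minutes; 12 minutes is the quoted upper bound afterwards.
reduction = percent_reduction(25 * 60, 12)
print(f"{reduction:.1f}%")  # 99.2%
```

Because the real "before" time is longer and the real "after" time is shorter than these bounds, 99.2% is a lower bound on the improvement, consistent with the rounded figure on the slide.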

8 Overcoming Barriers and Getting Started: People, Data, Technology & Delivery

9 Driving Big Value
- Create Communities of Interest
- Select Area of Focus
- Define Value Objectives
- Plan and Execute
- Measure and Publish Results

10 Build an Environment for Success
Champions, Advisors, Business Experts, Technology Experts, Benefits Owners
Interest, by stakeholder:
- Business Stakeholders: integrate new and existing data to rapidly stimulate new insights about customers, products, and markets
- Executive Leadership: create opportunities for products and services that transform the organization's role and position in the marketplace
- Information Technology: extend the footprint of existing technology assets; reduce the overall cost of operations; eliminate high cost, low value infrastructure
- Partners: create value that cannot be achieved alone

11 Big Data Opportunities in Pharma R&D (Focus)
Drug Discovery, Clinical Development, Drug Safety, Regulatory
- Genomic Technologies
- R&D Business Development
- New Market Identification
- Disease & Mechanism of Action
- Predictive Sciences
- Translational Medicine
- Imaging
- Regulatory Monitoring
- Competitor-Compound Profiling
- Drug Repositioning
- Investigator Selection & Profiling
- Patient Selection
- Safety Reporting from Social Media
- Healthcare Data Mining

12 Big Data Focus in Pharma R&D (Focus)
Chart axes: Business Value against Maturity (Low to High)
- Innovation Enablers (improved patient outcome): Predictive Sciences, Translational Medicine, Genomic Technologies
- Operational Excellence (cost containment): Drug Repositioning, Investigator Performance & Patient Selection, Mine Healthcare Data
- R&D Process Context (compliance): Safety Reporting from Social Media sources, Regulatory Monitoring, Compound Profiling

13 Define Your Value Objectives (Objectives)
Establish clear success criteria and SMART metrics (Specific, Measurable, Attainable, Realistic, and Traceable) to prove ROI:
- Revenue enhancement (increase revenues by $5M in the first y months)
- Cost reduction (source, commitment)
- Operational efficiency (reduce analytics cycle time by 90%)
- Increase market share
- Scale: globalize a local activity, capability, or product
- Collaboration between business and IT
- Prioritizing benefits realization

14 Big Data Strategy (Execute)
- Business Strategy: establish governance model; new business models & organisational impact
- Data Strategy: include data access, integration, quality and curation & analytics; identify service provision across data, analytics & technology
- Technology Strategy: include R&D platforms, Big Data strategy, analytics & visualization
- Delivery Model: robust services-based delivery model; include experimentation approach using Lab-on-hire

15 Create a Data-Driven Focus
Business Strategy / Data Strategy / Technology Strategy / Delivery Model
[Diagram labels: Key Opinion Leader, Investigators, Therapeutic Areas, Disease of Interest, Rare Diseases, Unmet Need, Research Focus, Clinical Trials, Identify Patient Population, Geography, Emerging Countries, BRICS, China, Current Collaboration, Academia/Pharma/Biotech?, Working with competitors?, Publication, Journal, Conferences, Peer Reviews, Social Media, Sentiment, Expert? (based on confidence), Performance Metrics, Warning Letters, Inspection, Patent, Collaboration]
Example: KOLs working on DPP IV inhibitors, based in emerging markets with positive performance metrics and publications in journals, conferences and social media
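The example selection on this slide (KOLs working on DPP IV inhibitors, based in emerging markets, with positive performance metrics and publications across journals, conferences and social media) amounts to a multi-criteria filter. A minimal sketch, assuming a hypothetical `Investigator` record, a numeric `performance_score` where any value above zero counts as positive, and the BRICS countries as the emerging-market list; none of these definitions come from the source.

```python
from dataclasses import dataclass, field

# Assumed for illustration: the slide mentions BRICS and China but does not
# define which countries count as emerging markets.
EMERGING_MARKETS = {"Brazil", "Russia", "India", "China", "South Africa"}


@dataclass
class Investigator:
    name: str
    country: str
    research_focus: set
    performance_score: float  # hypothetical metric; > 0 means positive
    publication_channels: set = field(default_factory=set)


def select_kols(investigators, focus, required_channels):
    """Keep investigators that satisfy every criterion in the example."""
    return [
        inv for inv in investigators
        if focus in inv.research_focus
        and inv.country in EMERGING_MARKETS
        and inv.performance_score > 0
        and required_channels <= inv.publication_channels
    ]


ALL_CHANNELS = {"journal", "conference", "social media"}
candidates = [
    Investigator("Dr. A", "India", {"DPP IV inhibitors"}, 0.8, set(ALL_CHANNELS)),
    Investigator("Dr. B", "Germany", {"DPP IV inhibitors"}, 0.9, set(ALL_CHANNELS)),
    Investigator("Dr. C", "China", {"DPP IV inhibitors"}, -0.2, set(ALL_CHANNELS)),
    Investigator("Dr. D", "Brazil", {"DPP IV inhibitors"}, 0.5, {"journal"}),
]

selected = select_kols(candidates, "DPP IV inhibitors", ALL_CHANNELS)
print([kol.name for kol in selected])
```

In practice each attribute would be populated from a different source (trial registries, inspection records, publication indexes), so the filter is only as good as the entity resolution behind it.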

16 Health Data Integration using Semantic Technology
Business Strategy / Data Strategy / Technology Strategy / Delivery Model
Intelligent Health Data Exchange Technology Stack; Health Data Integration Technology Stack on Semantic Technology
- CDISC standards: PRM, CDASH, ODM, SDTM, ADaM, SHARE, SEND
- HL7 standards: CDA, CCD, QRDA, GELLO, RIM, CCOW, ICSR, SPL
- Capabilities: Expert Knowledge, Entity Resolution, Data Virtualization, Data Federation, Linked Data, Patient Privacy, Provenance
- Data: Patient Behavior Data, Nutrition Data, Lifestyle Data, Epidemiology Data

17 Technology Reference Architecture
Business Strategy / Data Strategy / Technology Strategy / Delivery Model
- Semantic Technology Integration: Linked Data, Ontology Models, Data Virtualization and Federation, Inferencing and Embedded Expertise, Natural Language Processing
- Source Systems: Databases, ODS/Staging, EDW, Data Marts, Files, Web Services, External Data / RWE (e.g. Thomson Reuters, i3 InVision, Wolters Kluwer, GPRD)
- Data Acquisition Channels: Standard Interface for Database [JDBC], CDC Engine (optional), Standard Interface for Files [FTP/SFTP/CP/RCP], Standard Interface for Web Services [SOAP/WSDL]
- Data Integration and Quality Hub: Sqoop/Java programs, data extract jobs, MapReduce processing routines
- Data Storage and Repository
- Data Audit and Certification
- Data Security Hub
- Data Delivery Hub: ODBC, pull through web services, Sqoop, data control access
- Published Reports, Subject Area Specific Marts, Adhoc Reports
- Data Governance, Innovation Services, Technology Services, Automation Tools

18 Experimental Evaluation Model
Business Strategy / Data Strategy / Technology Strategy / Delivery Model
New opportunity sources: new technologies, new data sources, new stakeholders, new processes
- Review scale-up potential
- Generate idea
- Enumerate opportunity
- Technical assessment
- Refine opportunities as needed
- Review design concept
- Go/No Go decision
- Pilot created
- Users informed
- Production project formed
- Performance optimization
- Additional requirements
- Business process redesign, if needed
- Training and roll out

19 Leverage Insights and Expertise, Rapidly and Sustainably (Execute)
- Reuse Expertise: identify and leverage existing, relevant data assets and expertise
- Analyze: ingest new data sources (light integration and curation)
- Extend: create and extend data relationships, leveraging insights from previous study cycles
- Refine: capture insights from new study cycles, refining relationships to support new analyses
- Govern: elevate study-proven data, relationships and expertise to organization-wide definition
- Realize Benefits: monitor and measure use and benefits achieved; identify next set of priorities

20 Example #1: Epidemiology Analytics and Patient Cohort Analysis at Global Pharma (Results)
Data sources shown: MarketScan, i3 InVision, DataMart
Business Need
- De-identified patient data is provided by third-party data providers
- Datasets can range from 500 GB to 2-3 TB
- SAS analysis can take more than 10 hours due to the complexity of the processing
- Preparation of the control and analytic datasets can take up to several days
Solution
- Hadoop-based solution developed to leverage its parallel processing capabilities
- Pig used for converting the datasets from multiple providers into a common format
- Python used for applying the algorithms for the cohort analysis
- Analysis results stored in Hive for querying and analysis using SAS
- Use of HBase and Solr for fast search
Benefits
- Understanding of prevalence of secondary conditions
- Better understanding of disease market
- Improved trial design
- Real-time search of over million records in 2.5 seconds
- Reduced processing time of epidemiology analytics to 20 minutes
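The two processing steps in this solution (converting provider datasets to a common format, then applying a cohort algorithm) can be sketched in plain Python. This is an illustration only: the record layouts for the two providers and the function names are hypothetical, the real schemas are not shown in the source, and the actual solution ran the conversion in Pig on Hadoop rather than in a single process.

```python
from collections import Counter

# Hypothetical raw layouts for two data providers.
provider_a = [
    {"pid": "A1", "dx": "asthma"},
    {"pid": "A1", "dx": "gerd"},
    {"pid": "A2", "dx": "asthma"},
]
provider_b = [
    {"patient_id": "B7", "diagnosis": "ASTHMA"},
    {"patient_id": "B7", "diagnosis": "GERD"},
    {"patient_id": "B8", "diagnosis": "DIABETES"},
]


def to_common(record):
    """Map either provider layout to a (patient_id, diagnosis) pair."""
    pid = record.get("pid") or record.get("patient_id")
    dx = record.get("dx") or record.get("diagnosis")
    return pid, dx.lower()


def secondary_prevalence(records, index_condition):
    """Share of the index-condition cohort that has each other diagnosis."""
    by_patient = {}
    for pid, dx in map(to_common, records):
        by_patient.setdefault(pid, set()).add(dx)
    cohort = [dxs for dxs in by_patient.values() if index_condition in dxs]
    if not cohort:
        return {}, 0
    counts = Counter(
        dx for dxs in cohort for dx in dxs if dx != index_condition
    )
    return {dx: n / len(cohort) for dx, n in counts.items()}, len(cohort)


prevalence, cohort_size = secondary_prevalence(provider_a + provider_b, "asthma")
print(cohort_size, prevalence)
```

The grouping by patient is the step that benefits from parallel processing at terabyte scale, since each patient's records can be reduced independently.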

21 Example #2: Investigator Performance and Selection Analysis (Results)
Business Need
- Assess performance based on FDA inspections (10-20,000 unstructured documents)
- Identify and select investigators and sites across various geographies having experience in specific therapeutic areas
Solution
- Extracts information from FDA inspection reports
- Auto-categorizes results based on performance
- Provides summary for users to review
- Selection of potential investigators based on integration with ClinicalTrials.gov and existing investigator database
Benefits
- Identified high-performing existing investigators
- Plan additional site visits
- Quick start new campaigns
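The source does not describe how the inspection reports were categorized. One simple way to illustrate the idea is to look for the FDA inspection classification codes (NAI, VAI, OAI) in the report text and map them to performance tiers. The tier names, the "worst code wins" rule and the sample report texts below are all assumptions made for this sketch.

```python
import re

# FDA inspection classifications: No Action Indicated, Voluntary Action
# Indicated, Official Action Indicated. The tier mapping is an assumption.
TIERS = {"NAI": "high", "VAI": "medium", "OAI": "low"}
SEVERITY_ORDER = ["OAI", "VAI", "NAI"]  # most severe first
CODE_PATTERN = re.compile(r"\b(NAI|VAI|OAI)\b")


def categorize(report_text):
    """Return a performance tier based on the most severe code found."""
    codes = set(CODE_PATTERN.findall(report_text))
    for code in SEVERITY_ORDER:
        if code in codes:
            return TIERS[code]
    return "needs review"  # nothing recognizable: leave it to a human


# Hypothetical report snippets, keyed by a made-up site identifier.
reports = {
    "site-001": "Inspection closed. Classification: NAI.",
    "site-002": "Form 483 issued; classification VAI after response.",
    "site-003": "Initial VAI, later reclassified OAI.",
    "site-004": "Narrative only, no classification recorded.",
}

summary = {site: categorize(text) for site, text in reports.items()}
for site, tier in sorted(summary.items()):
    print(site, tier)
```

Routing unclassifiable documents to "needs review" mirrors the slide's point that users still review a summary rather than trusting the categorization blindly.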

22 Example #3: Building a KOL Network (Results)
Business Need
- Build a network of high-performing investigators and partners to improve trial performance and establish thought leadership
- Be on the cutting edge of science and identify new focus areas
- Early to market
Solution
- Semantic integration of data from external and internal sources
- Manual curation, delivered as actionable insights
- Monitor new trends and provide alerts and dashboards
- Assign a confidence level to each of the elements being tracked
- Data mart that will enable complex analytics and visualization
Benefits
- Planned new market entry
- Identified partners for rare diseases in new/existing markets
- Quick start clinical trials with a master list of investigators
- Tracked and profiled new/existing partners

23 Industrializing Your Big Data Project
Outcome: transforming an innovation project into a repeatable, sustainable, and value-producing participant in your business processes
- Build industrialization support with stakeholder community
- Present achieved benefits, manage expectations, and update goals
Analyze
- Verify end user expectations and SME requirements
- Elaborate/validate the business context
- Refine project goals and value objectives
- Evaluate technologies (performance, process automation)
- Data provisioning and organization, including new and additional data sets
- Reuse opportunities
- Extending the solution to a larger audience of users
- Sun-setting opportunities
- Additional cost take-out
Align
- Verify data set quality processes
- Catalog and share data
Achieve
- Build and verify repeatable process(es)
- Educate and support the users of the new process(es)
- Regularly measure and report achieved benefits

24 Achieving Big Value by Transforming the Customer Experience

25 Enhance the Customer Experience
- Ingestible chips will help manage heart failure, central nervous system conditions, and transplants
- 25% of all heart failure patients are re-admitted within 30 days due to complications and difficulty following challenging care regimens
- "Have You Taken Your Chip Today?": digital medicines will help heart failure patients stay in control and in better communication with their clinicians [4]
- Digital medicines will provide caregivers and pharma with more insight into how the patient is assimilating and responding to their medication

26 Take an Active Role in the Customer Experience
- Smart toothbrush and digital mirror
- Fifty-six percent of companies are making digital engagement of customers a top strategic priority, and linking this to high projected returns

27 We are at an Inflection Point at which Value is Created or Destroyed
Source: The Motley Fool

28 Meaning Makers are Emerging
Meaning Makers combine data and analytics to tell a story, and then apply that story to business decisions.
Of the 300 firms studied:
- 26% are Meaning Makers
- 50% are Data Explorers
- 24% are Data Collectors (lagging significantly)
Meaning Makers:
- Significant data integration
- Value attributed to analytics
- Self-report that they are ahead of industry peers
Image: Joan M Mas

29 Meaning Makers Get Economic Benefit
Over the past year: 11.3% boost in revenue and 10.7% reduction in cost. That's 9.9% more than Data Collectors.
Cognizant study done with Oxford Economics,

30 Analytics drives both Cost Containment and Revenue Uplift across Industries

31 Focus on the Process and How Your Product Engages the Customer
- Find the most squeaky wheels within your process anatomy
- Look for processes that shape >20% of cost or revenue
- Redefine moments of engagement (internal and customer-facing)
Process anatomy: Think, Design, Count, Build, Run, Sell
See Build A Modern Social Enterprise To Win In The 21st Century, 0the%2021st%20Century.pdf.

32 Thank you

33 References
1:
2:
3: Substantial data analysis improves gene-environmental correlation identification to help develop new treatment for multiple sclerosis, State University of New York (SUNY) Buffalo
4:
5:

34 Speaker
Thomas (Tom) Kelly, Practice Director, Enterprise Information Management, Cognizant
Thomas is a Practice Leader in Cognizant's Enterprise Information Management (EIM) Practice, with over 30 years of experience, focusing on leading Data Warehousing, Business Intelligence, and Big Data projects that deliver value to Life Sciences and related health industries clients.
Thomas.Kelly@cognizant.com