30 Minutes Overview of Data Science for Business


1 Breakdown of 90 Minutes: 30 Minutes Overview of Data Science for Business; 30 Minutes Team Discussion, Use Cases; 25 Minutes Sharing

2 IBM Analytics Academy. Data Science is Reinventing Business. Carlo Appugliese, Program Director, Machine Learning and AI, IBM Analytics, Data Science Elite Team

3 Data Science in Everyday Life. Personalization: customize the offers based on preferences. Dynamic Pricing: pricing changes based on market conditions. Fraud Detection: detect credit card misuse.

4 Understanding Data Science. Business Analytics, Machine Learning, Artificial Intelligence, Deep Learning. Same goal: extracting information from data.

5 AI and Machine Learning are transforming business

6 AI Adoption is growing. [Chart: responses to Q.A13, "What are your firm's plans to use the following analytics technologies? (Artificial intelligence)", for survey years 2016, 2017 and 2018. Response categories: Don't know; Not interested/no immediate plans; Planning to implement within the next 12 months; Implementing, implemented or expanding. Values as transcribed: 51%, 53%, 40%, 29%, 23%, 19%, 25%, 20%, 20%, 6%, 7%, 8%.] Base: 2094, 2106*, data and analytics decision-makers. Source: Forrester Data Global Business Technographics Data And Analytics Survey, 2016, 2017, 2018

7 Understanding AI and Machine Learning. Related fields: Data Mining, Big Data, Artificial Intelligence, Machine Learning, Deep Learning. AI is the simulation of intelligent human behavior in computers. Machine Learning is the science of getting computers to act without being explicitly programmed. Deep Learning is a subset of Machine Learning that leverages neural networks.

8 Top 10 Technology Trends in 2018. Trend No. 1: AI Foundation. "The ability to use AI to enhance decision making, reinvent business models and ecosystems, and remake the customer experience will drive the payoff for digital initiatives through [...]. Narrow AI, consisting of highly scoped machine-learning solutions that target a specific task with algorithms chosen that are optimized for that task, is where the action is today. Enterprises should focus on business results enabled by applications that exploit narrow AI technologies and leave general AI to the researchers and science fiction writers."

9 Machine Learning: Main Technology Driver for Business Innovation of the Future. [Timeline] Business innovation: Calculator, Spreadsheet, SQL, Machine Learning. Technology drivers (IBM): 1960s Digital; 1980s Desktop; 1990s RDBMS; 2010s Open Source; Machine Learning, next 2 to 5 years.

10 Machine Learning Fundamentals: 1) Identify patterns not recognizable by humans 2) Build models from those patterns 3) Make predictions with the deployed models

11 Machine Learning: Loan Approval Example. Historical loans: John X (credit_score=800, age=25, income=$900,000, works in Oil & Gas); target: Not Default. The historical loans train the algorithm, and the output is the loan approval model.

12 Machine Learning: Loan Approval Example. New applicant: James X (credit_score=800, age=25, income=$900,000, works in Oil & Gas). The loan approval model outputs Approve or Reject.
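
The train-then-predict flow of the loan approval example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not name an algorithm, so a k-nearest-neighbours vote is swapped in here, every historical row except the first is invented for the example, and the categorical "works in Oil & Gas" field is left out to keep the sketch short.

```python
from math import sqrt

# Hypothetical historical loans: (credit_score, age, income_usd) -> defaulted?
# Only the first row mirrors the slide; the others are made up for illustration.
HISTORY = [
    ((800, 25, 900_000), False),
    ((780, 40, 150_000), False),
    ((720, 35, 95_000), False),
    ((560, 23, 30_000), True),
    ((600, 52, 42_000), True),
    ((530, 31, 28_000), True),
]


def scale(row, bounds):
    """Map each feature to the 0..1 range so no single feature dominates."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, (lo, hi) in zip(row, bounds)]


def train(history):
    """'Training' for k-NN: learn the feature ranges and store scaled examples."""
    columns = list(zip(*(features for features, _ in history)))
    bounds = [(min(col), max(col)) for col in columns]
    examples = [(scale(features, bounds), defaulted)
                for features, defaulted in history]
    return {"bounds": bounds, "examples": examples}


def predict(model, applicant, k=3):
    """Approve unless most of the k most similar past loans defaulted."""
    x = scale(applicant, model["bounds"])

    def distance(example):
        return sqrt(sum((a - b) ** 2 for a, b in zip(x, example[0])))

    nearest = sorted(model["examples"], key=distance)[:k]
    defaults = sum(1 for _, defaulted in nearest if defaulted)
    return "Reject" if defaults > k / 2 else "Approve"


model = train(HISTORY)
print(predict(model, (800, 25, 900_000)))  # applicant with the slide's values
print(predict(model, (540, 30, 29_000)))   # hypothetical weaker applicant
```

The same two-step shape (fit on historical outcomes, then score a new applicant) applies whichever learning algorithm is actually chosen.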

13 Data Science Impact on Business: One use case at a time. "We cannot solve our problems with the same thinking we used when we created them." Albert Einstein. A single use case at a large insurance company: a simple ML model cost about 150K to build, but in production it saved the company 15 million annually.

14 Data Science Use Case Examples

15 Predict Fraudulent Online Banking Activity
OVERVIEW: The bank currently uses a decision-rule based system to flag suspicious transactions for review by fraud responders. High false positive rate, low false negative rate. Missed fraudulent activity is costly. The large volume of alerts places a burden on responders.
OBJECTIVE: Use supervised machine learning to predict fraudulent activity within the bank's mobile banking system.
CHALLENGE: Fraudulent activity is very rare relative to all online banking activity (~500M actions/month). Predictors need to be accepted by the fraud team.
FIRST MODELING APPROACH: Augment the existing system by predicting which alerts on individual activity are correct. False positives: current system 94%, with augmentation 48%. False negatives: current system 4%, with augmentation 7%.
SECOND MODELING APPROACH: Predict which user sessions are fraudulent within the first 10 seconds. False positives: current system 95%, ML model 85%. False negatives: current system 17%, ML model 6%.

16 Existing Rule-Based Fraudulent Alert System. [Flowchart] Online banking activity occurs, and the rule-based system either generates an alert (suspected fraudulent transaction) or does not. Alert generated: responders in Forensic Services investigate the alert; 94.4% of alerts are false positives and 5.6% are not. No alert generated: 4% of all fraudulent transactions fall in this branch. Outcome boxes: alert closed; action taken; fraud analysts investigate and may modify decision rules; decision rules updated; no further action.
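
The two percentages on the alert flowchart use different denominators: the false positive figure is a share of alerts, while the false negative figure is a share of all fraudulent transactions. A minimal Python sketch of that bookkeeping follows; the function name and the batch of 1,000 transactions are hypothetical, with counts chosen only so that the output lands near the 94.4% and 4% quoted above.

```python
def alert_metrics(is_fraud, alerted):
    """Return the two error shares used on the slides for a batch of transactions."""
    pairs = list(zip(is_fraud, alerted))
    true_alerts = sum(1 for fraud, alert in pairs if fraud and alert)
    false_alerts = sum(1 for fraud, alert in pairs if not fraud and alert)
    missed_frauds = sum(1 for fraud, alert in pairs if fraud and not alert)

    alerts = true_alerts + false_alerts    # everything the rules flagged
    frauds = true_alerts + missed_frauds   # everything that really was fraud
    return {
        # Share of alerts that turn out not to be fraud.
        "false_positive_share_of_alerts": false_alerts / alerts if alerts else 0.0,
        # Share of fraudulent transactions that never raised an alert.
        "false_negative_share_of_frauds": missed_frauds / frauds if frauds else 0.0,
    }


# Hypothetical batch: 25 frauds (24 flagged, 1 missed) and 975 legitimate
# transactions (404 flagged by mistake, 571 correctly left alone).
IS_FRAUD = [True] * 25 + [False] * 975
ALERTED = [True] * 24 + [False] * 1 + [True] * 404 + [False] * 571

metrics = alert_metrics(IS_FRAUD, ALERTED)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because fraud is rare, even a rule set that misses very little fraud can still produce alerts that are overwhelmingly false, which is the burden on responders described in the use case.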

17 CASE STUDY: Wealth Management firm improves straight through processing with machine learning
USE CASE: Wealth Management is looking for help applying ML to improve STP (straight through processing) by: identifying trade partners that have frequent failures; providing details to NT clients about their trade partners; predicting STP issues and time to resolution.
UNIQUE CHALLENGE: Large volume of work incidents; variety of trade methods.
EXPECTED BENEFIT: A model to predict which transactions will fail and the time to resolve them; insights into trade partners based on good vs bad transactions and the cost to NT and clients.

18 Machine Learning to uncover patterns in STP failures by Investment Manager. Provided a Grade: 0) Purple: low resolution time, exception rate and complexity 1) Blue: high in hard complexity 2) Grey: high exception rate 3) Red: high resolution time
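
One way such grades could be produced is by clustering managers on the three measures named above. The sketch below is an assumption, not the method used in the case study: the slides only say "ML Cluster", so plain k-means is swapped in, and the manager values (already scaled to 0..1) and starting centers are hypothetical.

```python
from math import dist  # Python 3.8+

# Hypothetical managers: (resolution_time, exception_rate, hard_complexity),
# each already scaled to 0..1.
MANAGERS = {
    "Manager 1": (0.10, 0.10, 0.10),
    "Manager 2": (0.15, 0.05, 0.12),
    "Manager 3": (0.20, 0.15, 0.90),
    "Manager 4": (0.10, 0.85, 0.20),
    "Manager 5": (0.95, 0.20, 0.15),
    "Manager 6": (0.90, 0.10, 0.25),
}

# Fixed starting centers keep the run deterministic: one "low on everything"
# corner plus one corner per measure.
INITIAL_CENTERS = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)]
GRADES = ["Purple", "Blue", "Grey", "Red"]


def nearest_center(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda i: dist(point, centers[i]))


def kmeans(points, centers, iterations=10):
    """Plain k-means: assign points to the nearest center, then re-average."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        centers = [
            tuple(sum(vals) / len(members) for vals in zip(*members))
            if members else center  # keep an empty cluster's center unchanged
            for members, center in zip(clusters, centers)
        ]
    labels = [nearest_center(p, centers) for p in points]
    return centers, labels


centers, labels = kmeans(list(MANAGERS.values()), INITIAL_CENTERS)
for name, label in zip(MANAGERS, labels):
    print(name, GRADES[label])
```

In practice the cluster-to-color mapping would be decided after inspecting the learned centers rather than fixed in advance as it is here.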

19 Dashboard: Provide data on investment manager grade. [Table: Manager 1 to Manager 5, each with an ML Cluster and rankings (such as Top 5%, Top 10%, Top 25%, Lower 25%) for Resolution time, Exception Rate and Hard complexity, shown under two views, Own Demographics and Global.] Own Demographics looks within the manager's own cluster: each manager is compared to the cluster center, then ranked. Global looks across all clusters: each manager is compared to the good cluster's 25th percentile, then ranked.

20 Dashboard: Investment Manager detail view. Manager 1, Own Demographics: Resolution time Top 5%, Exception Rate Top 10%, Hard complexity Top 25%. Manager 1, Global: Resolution time Top 5%, Exception Rate Top 5%, Hard complexity Top 25%. Detail fields: Nr of Work items, Nr of Trades, Exception Rate, Avg Resolution time, Hard complexity (values as transcribed: %, 1 day, 30%).

21 CASE STUDY: Improve Financial Institution Operational Efficiency with Machine Learning
USE CASE: Predict which credit files received from clients will fail automated scanning; build business hierarchy relationships.
UNIQUE CHALLENGE: The current file process is different across all clients; the current process is rules based and very inaccurate; the current process requires a lot of manual intervention.
EXPECTED BENEFIT: Expect the new solution to adapt as new failures are observed; automate the manual process with artificial intelligence.

22 CASE STUDY: Optimize Nedbank's ATM Experience with Machine Learning
"Even with a team of experienced data scientists on the ground, IBM was able to augment my team, provide strong technical leadership, and put in place a strong practice to set us up for quick delivery, but also enabling us for success in the future." Guy Taylor, Head of Data & Data-Driven Intelligence, Nedbank
USE CASE: Provide Nedbank's customers access to fully functional ATMs at all times; better predict cash-outs & machine failures; optimize service and cash replenishment schedules.
UNIQUE CHALLENGE: Difficult to predict machine fault category; lengthy planning cycle due to uncertain travel times, custodian skills & availability.
EXPECTED BENEFIT: Improve customer experience; reduce planning cycle; reduce replenishment & service costs.

23 CASE STUDY: Improve matching of different versions of the same corporation for credit scoring with Deep Learning
USE CASE: The agency maintains corporate hierarchies and has a significant backlog of unmatched entities that are manually reviewed. IBM seeks to: improve the current matching method using Deep Learning, both in terms of speed and error rate; reduce the manual labor of maintaining the current hierarchies.
UNIQUE CHALLENGE: Matching millions of entities with a low error rate.
EXPECTED BENEFIT: Improve matching speed and data quality of hierarchies; sub-set the matching problem into manageable blocks for matching and retraining.
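
The idea of sub-setting the matching problem into manageable blocks can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the company names, the suffix list, the three-character block key and the 0.8 threshold are hypothetical, and a simple string-similarity ratio from the standard library stands in for the Deep Learning matcher described in the case study.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical legal suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc", "co", "company"}

# Hypothetical entity names, including variants of the same corporation.
NAMES = [
    "Acme Corp",
    "ACME Corporation",
    "Acme Holdings Ltd",
    "Globex Inc.",
    "Globex, Inc",
    "Initech LLC",
]


def normalize(name):
    """Lowercase, drop punctuation and strip common legal suffixes."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in name)
    return " ".join(tok for tok in cleaned.split() if tok not in SUFFIXES)


def block_key(name):
    """Cheap blocking key: first three characters of the normalized name."""
    return normalize(name)[:3]


def candidate_pairs(names):
    """Only compare names that share a block, not every possible pair."""
    blocks = defaultdict(list)
    for name in names:
        blocks[block_key(name)].append(name)
    for members in blocks.values():
        yield from combinations(members, 2)


def match(names, threshold=0.8):
    """Return candidate pairs whose normalized names are similar enough."""
    return [
        (a, b)
        for a, b in candidate_pairs(names)
        if SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold
    ]


all_pairs = len(list(combinations(NAMES, 2)))
blocked_pairs = len(list(candidate_pairs(NAMES)))
print(f"pairs compared: {blocked_pairs} of {all_pairs}")
print(match(NAMES))
```

Blocking trades a small risk of missing matches that fall into different blocks for a large cut in comparisons, which matters when millions of entities are involved.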

24 Business Transformation with Data Science

25 Data Science Transformation: Build a Data Science Strategy. Articulate Use Case; Break Down Data Silos; Identify AI Technologies, Partner or Acquire tools to fill technology gaps; Integrate AI in your workflow, between business units; Adopt an open and collaborative culture.

26 3 Key Strategies to build a Data Science Team
1) Skills - Build a balanced Data Science team: programmatic and visual data scientists; enable business domain experts.
2) Organization - Operationalize use cases: experimentation to deployment to increase business value; cross-train and increase team velocity; create a Center of Excellence and empower lines of business.
3) Tools - Connect data, algorithms and applications: access, prep and analyze data seamlessly; all tools in one environment; open ecosystem.

27 Skills - Data Science Projects Require Multiple Skills. [Venn diagram] Domain Expertise (Domain Knowledge): Supply Chain, CRM, Financials, Networking, Engineering, Research. Computer Science: Scripting, SQL, Python, R, Scala, Data Pipelines, Big Data/Apache Spark. Math & Stats: Mathematics, Computational. Other labels: Machine Learning, Unicorn.

28 Organizations - design for success in data. Models: Centralized, Hub/Spoke, Decentralized. Corporate Chief Data Officer: data policy; compliance needs; corporate use cases; platform and aggregation. Data scientists: transform and manage large data sets; solve real-world problems; evaluate and tune model performance. Unit/Division CDOs: data steward; compliance execution; division use cases; drive maturity curve advances.

29 Tools - Data Science: Open Source Leading. SQL (42%); R (33%); Python (26%); Excel (25%); Java, Ruby, C++ (17%); SPSS, SAS (9%)

30 Data Science Tools: Open Ecosystems. Stages: BUILD, DEPLOY, RUN. Labels shown: Jupyter, R, Zeppelin, auto prep, model build GUI, Monitoring & Alerting, Model retraining, XGBoost, KPI Dashboards, Model Refactoring, Security. DATA: object storage, databases, Hadoop. Engaging to WIN: Assets, Services & Marketplace. Experts and Leaders: Sales, Tech Sales, Offering Management, Enablement & Partners.

31 Platform: Focus on your use cases, not technology
Explore at scale: scale out on-demand; minimal dev-ops/engineering setup.
Reproducibility: process of tracking; reproduce results easily.
Security: governed access; administration capabilities; data encryption, over wire.
Collaborate: understand what's been done; share and accelerate learning; governed collaboration.
Publish Efforts: models as APIs out of the box; avoid engineering re-work.
Discovery to Production: minimal dev ops; seamless scale to enable scoring; integration with product workflows.
Open: use desired tools of choice; interoperability across tools; community driven ecosystem.
Review Results: stakeholder review; via dashboards/static reports.
Monitoring: QA/QC on-demand; retraining workflow; alerts when thresholds drop.
Focus your team's strengths to solve your business and domain centric problems.

32 Forrester Research Ranks IBM as a Leader in Multimodal Predictive Analytics and Machine Learning (PAML). "IBM puts AI to work. IBM Watson is a vast umbrella of technologies and solutions, one of which is Watson Studio, a PAML solution. Watson Studio was designed from the ground up to aesthetically blend SPSS-inspired workflow capabilities with open source machine learning libraries and notebook-based interfaces. It is designed for all collaborators (business stakeholders, data engineers, data scientists, and app developers) who are key to making machine learning models surface into production applications. Watson Studio offers easy integrated access to IBM Cloud pretrained machine learning models such as Visual Recognition, Watson Natural Language Classifier, and many others. It is a perfectly balanced PAML solution for enterprise data science teams that want the productivity of visual tools and access to the latest open source via a notebook-based coding interface." Source: The Forrester Wave™: Multimodal Predictive Analytics And Machine Learning Solutions, Q3 2018, Forrester Research, September 2018

33 IBM's Data Science Elite team co-engineers prototypes with you to succeed and lead in AI
Who are we? A team of IBM Data Science experts, with skills in: descriptive, predictive & prescriptive analytics; industry-specific use cases; machine learning, deep learning, decision optimization, data engineering, data journalism.
What do we offer? Up to 3-month, no-cost engagement; identify use case(s) & Minimal Viable Products via discovery & design workshops; collaboratively build & evaluate up to 4 Sprints (using IBM's Data Science Experience); mentor & enable client teams hands-on.
What do we ask of clients? Dedicated team members to match our headcount on the engagement.
[Engagement cycle diagram: IBM Data Science Elite, dedicated to client success. Steps shown: Identify use case; Validate with LoB; Break into 4 Sprints; Validate approach; Build models / pipelines; Deploy via APIs; Monitor / retrain; Visualize before vs after dashboard.]

34 Getting Started - Discuss your AI Use Case. Your Use Case: Articulate Use Case; Break Down Data Silos; Identify AI Technologies, Partner or Acquire tools to fill technology gaps; Integrate AI in your workflow, between business units; Adopt an open and collaborative culture.

35 Data Science Use Cases. Next 30 Minutes: Team Discussion. 1) How will your organization be able to apply data science in your daily business? 2) What are the challenges with the existing data analytics effort you are engaged in today? 3) What is required for the organization to better take advantage of data science in the future? 4) Please share any other insights you got from the event.

36 What are your Data Science Use Cases? Next 25 Minutes: Sharing your use cases

37 Thank you. Carlo Appugliese, Program Director, Machine Learning and AI, IBM Analytics, Data Science Elite Team

38 2018 IBM Corporation