Data Cleansing - From Spreadsheets to Data Centres

1 Data Cleansing - From Spreadsheets to Data Centres

2 Intela Business Activity Summary
Professional Services
- Machine learning & AI solutions
- AI strategy advice
Products
- Intelligent Data Cleansing
- PhD & Masters Data Scientists
- Only dedicated M/L AI firm
- Research & IP development
- Co-founders of City.AI
- Machine vision analytics
- Unstructured data intelligence
AI Advisory
- Data science and AI education
- AI readiness analysis
- Business deep dive analysis
- Use case discovery
- Data strategy
- Business case analysis
Proof of Concept
- Collate & understand requirements
- Data analysis & acquisition
- Algorithmic design
- Produce API/Dashboards (MVP)
- Testing and refinement
- Training and improvement
Production Scalability
- Testing on large data sets
- Architecture performance
- Security and resilience
Algorithm license
- Annual license based on combination of volume, use, model iterations and maintenance requirements.
We exist because there is a global shortage of data science, the backbone of artificial intelligence.

4 aiforum.org.nz

6 AI = Electricity
We believe artificial intelligence will revolutionise business as electricity did for industry. Intela are AI power generators.

7 Data Science = framework + tools + context/value
Machine Learning = a tool of data science
Artificial Intelligence = an output of machine learning

8 Where should AI be considered?

12 AI & Data
Data Volume
- Historical records
- Multiple databases
- Analytics (web, app)
- IoT
Data Complexity
- Individual algorithms per user/customer
- Multiple data sources and topics
- Dynamic real-time
Data Velocity
- Transactions
- Devices & sensors
- # of users
Task Scale
- Hours of video
- TB of images
- Text processing/search

13 How and where to start?

14 How and where to start
Intela AI E4 Strategy Framework: your organisation's optimal AI adoption strategy
Stages: Aware, Initiated, Embraced, Enabled, Everywhere, Everything
You are here? Need to be here? Want to be here?

15 How and where to start
Recommended:
- Phase 1: AI Readiness Analysis
- Phase 2: Business Analysis
- Phase 3: Business Case Development
- Phase 4: Implementation Planning
Powerful insights & outputs for every level: Resources, Technology, Processes, Culture, People, Change

16 POV/C
Labels: DATA, OBJECTIVE, USER, SERVE, MODEL
- How to collect as much as possible?
- What is available?
- Data telemetry
- How to get useful output during the time horizon?
- How to update the model to reflect reality?
- How to scale the model to be production ready?

18 Data quality - the level of compliance of a data set with a contextual normality

20 Problem
1. Duplicate
2. Incomplete
3. Incorrect
4. Inaccurate
5. Irrelevant
Resolution
1. Removing
2. Correcting
3. Harmonising
4. Standardising
5. Enhancing
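As a rough illustration of the five problem types, the sketch below flags each one in a handful of made-up records. The field names, the sample data, and the rules are all assumptions for illustration, including where the line falls between "incorrect" (a malformed value) and "inaccurate" (a well-formed but implausible value); the slides do not define these.

```python
# Hypothetical records; fields and values are invented for this sketch.
RECORDS = [
    {"id": 1, "name": "Ann Lee", "email": "ann@example.com", "age": 34, "fax": ""},
    {"id": 2, "name": "Ann Lee", "email": "ann@example.com", "age": 34, "fax": ""},
    {"id": 3, "name": "Bob Roy", "email": None, "age": 41, "fax": ""},
    {"id": 4, "name": "Cy Dunn", "email": "cy-at-example.com", "age": 29, "fax": ""},
    {"id": 5, "name": "Di Moss", "email": "di@example.com", "age": 290, "fax": ""},
]


def is_empty(value):
    return value is None or value == ""


def find_issues(records, required=("name", "email"), key=("name", "email")):
    """Return ({record id: [issue, ...]}, [irrelevant field, ...])."""
    issues = {r["id"]: [] for r in records}
    seen = set()
    for r in records:
        # Duplicate: same identifying fields as an earlier record.
        k = tuple(r.get(f) for f in key)
        if k in seen:
            issues[r["id"]].append("duplicate")
        seen.add(k)
        # Incomplete: a required field is missing.
        if any(is_empty(r.get(f)) for f in required):
            issues[r["id"]].append("incomplete")
        # Incorrect (assumed rule): the value is malformed.
        email = r.get("email")
        if email and "@" not in email:
            issues[r["id"]].append("incorrect")
        # Inaccurate (assumed rule): well-formed but implausible.
        age = r.get("age")
        if age is not None and not 0 <= age <= 120:
            issues[r["id"]].append("inaccurate")
    # Irrelevant (assumed rule): a field that is empty in every record.
    irrelevant = [f for f in records[0] if all(is_empty(r.get(f)) for r in records)]
    return issues, irrelevant


issues, irrelevant_fields = find_issues(RECORDS)
for record_id, found in issues.items():
    print(record_id, found or "ok")
print("irrelevant fields:", irrelevant_fields)
```

Each flag maps naturally onto a resolution step: flagged duplicates are candidates for removal, the others for correction or enrichment.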

25 Impact of poor data quality
- 20-80% of business analyst (BA) time is spent cleaning data
- Reduced confidence in analytics, and more pain producing it
- Multiple versions of the same data, each cleaned by different users
- Delays to transformation projects
- Crushes ML/AI: bad data in, bad data out

26 2017
Machine learning algorithms:
1. Identify duplicates
2. Organise duplicates
3. Clean data
4. Identify data joins
5. Single view of customer
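The duplicate-to-single-view flow can be sketched in a few lines. This is a minimal illustration, not Intela's algorithm: it swaps in plain string similarity from Python's standard `difflib` in place of a trained model, and the records, the `0.85` threshold, and the merge rule (first non-empty value wins) are all assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical customer records, invented for this sketch.
customers = [
    {"name": "Jonathan Smith", "email": "jon@example.com", "phone": ""},
    {"name": "Jonathon Smith", "email": "", "phone": "021 555 0100"},
    {"name": "Mary Jones", "email": "mary@example.com", "phone": ""},
]


def normalise(text):
    """Lower-case, drop simple punctuation, collapse whitespace."""
    return " ".join(text.lower().replace(".", "").replace(",", "").split())


def similarity(a, b):
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def group_duplicates(records, threshold=0.85):
    """Identify and organise duplicates: greedy grouping by name similarity."""
    groups = []
    for rec in records:
        for group in groups:
            if similarity(rec["name"], group[0]["name"]) >= threshold:
                group.append(rec)
                break
        else:
            groups.append([rec])
    return groups


def single_view(group):
    """Merge a group into one record; the first non-empty value wins."""
    merged = {}
    for rec in group:
        for field, value in rec.items():
            if value and not merged.get(field):
                merged[field] = value
    return merged


for group in group_duplicates(customers):
    print(single_view(group))
```

Greedy grouping compares each record only against the first member of each group, which keeps the sketch short but would not scale or handle chained near-matches well.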

27 2018
Inputs: CSV, API, S3, stream
Deployment: Public Cloud, Private Cloud, On-Prem, On Device
1. Data ingest
2. Identify quality issues: Duplicate, Incomplete, Incorrect, Inaccurate, Irrelevant
3. Active learning, human-in-the-loop (domain training)
4. Clean data
5. Identify data joins
Outputs: CSV, API, S3, stream
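One common way to combine active learning with a human in the loop is uncertainty sampling: the system asks a person to label only the cases it is least sure about, then adjusts itself from those labels. The sketch below assumes the "model" is just a match score with a decision threshold; the scores, the `ask_human` callback, and the midpoint update rule are hypothetical and are not taken from the slides.

```python
def most_uncertain(scores, threshold, k):
    """Indices of the k scores closest to the decision threshold."""
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i] - threshold))
    return ranked[:k]


def refit_threshold(labelled, default):
    """Place the threshold midway between labelled non-matches and matches."""
    matches = [score for score, is_match in labelled if is_match]
    non_matches = [score for score, is_match in labelled if not is_match]
    if not matches or not non_matches:
        return default  # not enough evidence to move the threshold
    return (min(matches) + max(non_matches)) / 2


def active_learning_round(scores, ask_human, threshold=0.5, k=2):
    """Query a human on the k most uncertain items, then update the threshold."""
    queried = most_uncertain(scores, threshold, k)
    labelled = [(scores[i], ask_human(i)) for i in queried]
    return refit_threshold(labelled, threshold), queried


# Hypothetical duplicate-match scores for five candidate record pairs.
scores = [0.95, 0.62, 0.55, 0.10, 0.48]
# Stand-in for a domain expert's answers; in practice a person reviews each pair.
expert_answers = {1: True, 2: False, 4: False}

new_threshold, queried = active_learning_round(
    scores, ask_human=lambda i: expert_answers[i], threshold=0.5, k=3
)
print("asked about pairs:", queried)
print("updated threshold:", new_threshold)
```

Pairs scoring well above or below the threshold are never shown to the reviewer, which is what keeps the human effort small.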

28 AI Data Quality

30 Pre-Trained Domain Expertise
- Addresses
- CRM
- Transactions
- Sports Tickets
- Patient Data
- Pharmaceuticals

31 What we have learnt
Your Challenges
- Multiple Truths
- Multiple Systems
- Multiple Consumers
- No Standards
- Limited Resources
- Growing Expectations

32 What we have learnt
Our Challenges
- Data access
- Confidence in AI
- Explainability
- Scale of cost assumption
- Program of work integration
- Procurement

38 Opportunities with clean/smart data
- Analysts spend more time on impactful insights
- Easier reporting across stakeholders
- Reduced cost of digital transformation
- Opens up potential for ML/AI
- Real-time recommendations & actions

39 What value could you create if data quality was no longer an obstacle?

40 THANK YOU Data Cleansing - From Spreadsheets to Data Centres