Data Cleansing - From Spreadsheets to Data Centres
- Elmer Young
- 5 years ago
Transcription
1 Data Cleansing - From Spreadsheets to Data Centres
2 Intela Business Activity Summary
Professional Services: Machine learning & AI solutions; AI strategy advice
Products: Intelligent Data Cleansing
- PhD & Masters Data Scientists
- Only dedicated M/L AI firm
- Research & IP development
- Co-founders of City.AI
- Machine vision analytics
- Unstructured data intelligence
AI Advisory
- Data science and AI education
- AI readiness analysis
- Business deep dive analysis
- Use case discovery
- Data strategy
- Business case analysis
Proof of Concept
- Collate & understand requirements
- Data analysis & acquisition
- Algorithmic design
- Produce API/Dashboards (MVP)
- Testing and refinement
- Training and improvement
Production Scalability
- Testing on large data sets
- Architecture performance
- Security and resilience
Algorithm license
- Annual license based on a combination of volume, use, model iterations and maintenance requirements.
We exist because there is a global shortage of data science, the backbone of artificial intelligence.
3
4 aiforum.org.nz
5
6 AI = Electricity
We believe artificial intelligence will revolutionise business as electricity did for industry. Intela are AI power generators.
7 Data Science = framework + tools + context/value
Machine Learning = a tool of data science
Artificial Intelligence = an output of machine learning
8 Where should AI be considered?
12 AI & Data
Data Volume
- Historical records
- Multiple databases
- Analytics (web, app)
- IoT
Data Complexity
- Individual algorithms per user/customers
- Multiple data sources and topics
- Dynamic real-time
Data Velocity
- Transactions
- Devices & sensors
- # of users
Task Scale
- Hours of video
- TB of images
- Text processing/search
13 How and where to start?
14 How and where to start
Intela AI E4 Strategy Framework: your organisation's optimal AI adoption strategy
Stages: Aware, Initiated, Embraced, Enabled, Everywhere, Everything
- You are here?
- Need to be here?
- Want to be here?
15 How and where to start
Recommended
- Phase 1: AI Readiness Analysis
- Phase 2: Business Analysis
- Phase 3: Business Case Development
- Phase 4: Implementation Planning
Powerful insights & outputs for every level: Resources, Technology, Processes, Culture, People, Change
16 POV/C
Stages: Data, Objective, User, Serve, Model
- How to collect as much as possible?
- What is available?
- Data telemetry
- How to get useful output during the time horizon?
- How to update the model to reflect reality?
- How to scale the model to be production ready?
17
18 Data quality - the level of compliance of a data set with a contextual normality
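One way to make "level of compliance" concrete is to score a data set as the fraction of rows that satisfy a set of context-specific rules. The sketch below is illustrative only: the function name `compliance_level`, the order fields and the two rules are assumptions chosen for the example, not something taken from the slides.

```python
def compliance_level(rows, rules):
    """Fraction of rows that satisfy every rule (1.0 for an empty data set)."""
    if not rows:
        return 1.0
    compliant = sum(1 for row in rows if all(rule(row) for rule in rules))
    return compliant / len(rows)


# Hypothetical context: orders that should have a positive quantity, priced in NZD.
rules = [
    lambda r: r["quantity"] > 0,
    lambda r: r["currency"] == "NZD",
]
orders = [
    {"quantity": 2, "currency": "NZD"},
    {"quantity": 0, "currency": "NZD"},   # breaks the quantity rule
    {"quantity": 1, "currency": "USD"},   # breaks the currency rule
    {"quantity": 5, "currency": "NZD"},
]
print(compliance_level(orders, rules))  # 0.5
```

The same rows could score differently under another set of rules, which is the "contextual" part of the definition.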
20 Problem
1. Duplicate
2. Incomplete
3. Incorrect
4. Inaccurate
5. Irrelevant
Resolution
1. Removing
2. Correcting
3. Harmonising
4. Standardising
5. Enhancing
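As a rough illustration of the first three problem types, the sketch below flags duplicate, incomplete and incorrect rows with simple rules. The schema (`name`, `email`), the email pattern and the function names are assumptions made for the example. Inaccurate and irrelevant values usually need external context to detect, so they are left out.

```python
import re

REQUIRED_FIELDS = ("name", "email")  # hypothetical schema
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def normalise(record):
    """Harmonise casing and whitespace so equivalent values compare equal."""
    return {k: v.strip().lower() if isinstance(v, str) else v
            for k, v in record.items()}


def find_issues(records):
    """Return (row index, issue) pairs for duplicate, incomplete and incorrect rows."""
    issues = []
    seen = set()
    for i, raw in enumerate(records):
        rec = normalise(raw)
        key = tuple(sorted(rec.items()))
        if key in seen:
            issues.append((i, "duplicate"))
        seen.add(key)
        if any(not rec.get(field) for field in REQUIRED_FIELDS):
            issues.append((i, "incomplete"))
        email = rec.get("email")
        if email and not EMAIL_PATTERN.match(email):
            issues.append((i, "incorrect"))
    return issues


records = [
    {"name": "Ana Lee", "email": "ana@example.com"},
    {"name": " ana lee ", "email": "ANA@example.com"},  # same person after harmonising
    {"name": "Bo Chen", "email": ""},                    # missing email
    {"name": "Cy Diaz", "email": "cy-at-example"},       # malformed email
]
print(find_issues(records))
# [(1, 'duplicate'), (2, 'incomplete'), (3, 'incorrect')]
```

Note that harmonising (the `normalise` step) happens before duplicate detection; without it the second row would not be recognised as a copy of the first.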
25 Impact of poor data quality
- 20-80% BA time spent cleaning data
- Reduced confidence/pain for analytics
- Multiple versions cleaned by users
- Delays transformation projects
- Crushes ML/AI - bad data in, bad data out
26 2017
Machine learning algorithms
- Identify duplicates
- Organise duplicates
- Clean data
- Identify data joins
- Single view of customer
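The slide does not say which algorithms were used to identify and organise duplicates. As a stand-in, the sketch below uses plain string similarity from Python's standard `difflib` module with a greedy grouping step. The threshold value, the function names and the sample customer names are assumptions for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_duplicates(names, threshold=0.85):
    """Greedy grouping: a name joins the first group whose first member is similar enough."""
    groups = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


customers = ["Jon Smith", "John Smith", "Mary Jones", "jon smith", "Marie Curie"]
for group in group_duplicates(customers):
    print(group)
```

Each resulting group is a candidate "single view" of one customer. Comparing every name against every group is quadratic in the worst case, so larger data sets would typically need blocking or indexing before a comparison like this.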
27 2018
Inputs and outputs: CSV, API, S3, Stream
Deployment: Public Cloud, Private Cloud, On-Prem, On Device
Pipeline
- Data ingest
- Identify quality issues: Duplicate, Incomplete, Incorrect, Inaccurate, Irrelevant
- Active learning, human-in-the-loop, domain training
- Clean data
- Identify data joins
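A common way to combine active learning with a human in the loop is to apply high-confidence fixes automatically and send only the uncertain ones to a reviewer, most uncertain first. The sketch below shows that routing step under stated assumptions: the thresholds, the `route` function and the candidate labels are hypothetical, and the scores stand in for whatever confidence a model would produce.

```python
def route(candidates, low=0.2, high=0.9):
    """Split scored candidate fixes into auto-apply, human-review and discard queues."""
    auto_apply, review, discard = [], [], []
    for item, score in candidates:
        if score >= high:
            auto_apply.append(item)
        elif score <= low:
            discard.append(item)
        else:
            review.append((item, score))
    # Most uncertain first (score closest to 0.5), as in uncertainty sampling.
    review.sort(key=lambda pair: abs(pair[1] - 0.5))
    return auto_apply, [item for item, _ in review], discard


candidates = [
    ("merge A+B", 0.97),
    ("merge C+D", 0.55),
    ("merge E+F", 0.05),
    ("merge G+H", 0.75),
]
auto_apply, review, discard = route(candidates)
print(auto_apply)  # ['merge A+B']
print(review)      # ['merge C+D', 'merge G+H']
print(discard)     # ['merge E+F']
```

In an active learning loop, the reviewer's decisions on the `review` queue would be fed back as labelled examples to retrain the model for the domain.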
28 AI Data Quality
29 AI Data Quality
30 Pre-Trained Domain Expertise
- Addresses
- CRM
- Transactions
- Sports Tickets
- Patient Data
- Pharmaceuticals
31 What we have learnt
Your Challenges
- Multiple Truths
- Multiple Systems
- Multiple Consumers
- No Standards
- Limited Resources
- Growing Expectations
32 What we have learnt
Our Challenges
- Data access
- Confidence in AI
- Explainability
- Scale of cost assumption
- Program of work integration
- Procurement
33 Impact of poor data quality
- 20-80% BA time spent cleaning data
- Reduced confidence/pain for analytics
- Multiple versions cleaned by users
- Delays transformation projects
- Crushes ML/AI - bad data in, bad data out
38 Opportunities with clean/smart data
- Analysts spend more time on impactful insights
- Easier reporting across stakeholders
- Reduced cost of digital transformation
- Opens up potential for ML/AI
- Real-time recommendations & actions
39 What value could you create if data quality was no longer an obstacle?
40 THANK YOU Data Cleansing - From Spreadsheets to Data Centres