ANALYTIC SOLUTIONS WITH DISPARATE DATA

1 ANALYTIC SOLUTIONS WITH DISPARATE DATA
CQSDI 2018, Cape Canaveral, FL
John Schroeder and Chad Hall
Lockheed Martin Aeronautics Enterprise Integration Advanced Analytics

2 We see disparate data sources as a business process problem. The analytics process defines how the data needs to come together.

3 OVERVIEW
Managing Disparate Data
  Organizational Influences
  Early Identification
  Analytics Phases of Development
  Architecture Considerations
  Tool Considerations
Human Resources example
Quality Analytics Solution example

4 ORGANIZATIONAL INFLUENCES
Enterprise Integration Advanced Analytics Process Excellence
Managing disparate data is as much about the organization and its processes as anything else.

5 ADVANCED ANALYTICS CAPABILITIES
Capability: Description
RPA & Cognitive Automation (A): Computer-coded, rules-based software that automates manual activities by performing repetitive rules-based tasks; can be interspersed with human checkpoints at key milestones or for exception management
Natural Language Processing (NLP) (N): An application of machine learning grounded in statistical inference that enables computer interpretation of various forms of human language (text, images, or speech)
Machine Learning (ML): Advanced analytics characterized by their ability to continuously learn from training data rather than relying on a static ruleset; these algorithms detect patterns in data and adjust program actions accordingly, whether supervised or unsupervised
Advanced Visualization (V): The ability to synthesize layers of complex datasets to create multidimensional visual representations of decision-support tools, outputs, dashboards, KPIs, and reports; advances include the transition to multi-platform, mobile-enabled
Digitization (D): Engagement of sensors and other digital observation technology (e.g. RFID) to convert non-electrical inputs/events into digital information for analysis and decision making

6 EARLY UNDERSTANDING OF DISPARATE DATA
[Diagram: a product loop centered on the BUSINESS PROBLEM. Labels: Identification, Prioritization and Selection; Planning & Development; Test & Train; Enhance & Maintain.]
Early identification of disparate data is necessary for project prioritization and planning.

7 ANALYTICS PHASES OF DEVELOPMENT
Phase: Solution Completeness
Hypothesis: Confirm business value?
Prototype: Can we scale the solution?
Industrialize: Build & roll out application
O&M: Maintain & Enhance
Disparate Data Considerations (across the phases): Speed, Resources, Cadence, Architecture, Requirements, Testing, Scalability, Troubleshooting, Maintenance, Resources

8 ARCHITECTURE CONSIDERATIONS
[Diagram: data architecture in four stages. Labels, grouped in the order they appear:
Stages: Acquire, Organize, Analyze, Deliver
Data Sources: Streaming / In Motion, IoT, Staging / At Rest, Operational Systems, External Data
Integration: Transform, Aggregate, Connections
Logical Data Warehouse: Traditional Database, In-Memory Columnar, Services, Big Data Distributed Processing
Self-Service Data Preparation
Analytic Capabilities: Analyze, Optimize, Forecast, Report, Plan, Discover, Collaborate, Predict, Model
Business Objects: Mobile Display, Analytic Dashboard, Visuals, Advanced Analytics, Delivered Reporting, Self Service & Data Science
Other labels: Virtualization, Temporary Storage, Data Governance]
Opportunity to blend disparate data across the entire architecture.

9 TOOL CONSIDERATIONS
[Diagram: the same Acquire, Organize, Analyze, Deliver architecture and labels as slide 8, Architecture Considerations.]

10 DISPARATE DATA CONSIDERATIONS: A HUMAN RESOURCES EXAMPLE

11 HUMAN RESOURCES BUSINESS PROBLEM
Where is the supply of talent, and will it meet our needs?
What is the hiring lead time required to source and train talent?
Do forecasts adequately account for production floor volatility and risk?
Do people safety stocks have adequate buffer to account for time- and forecast-based volatility?
Are resources available to train new hires?

12 THE PROBLEM LANDSCAPE
The Process: Environmental Scan, Talent Pipeline & Lead Time, Talent Need, People Safety Stock, L&D
Teams: Talent Acquisition, Workforce Planning, Learning & Development
Data: 3rd-party data, Pipeline Ratios, Staffing Plans Database, Database, Advanced Model, Training Capacities
Not surprisingly, each team's data doesn't talk to the other teams' data.

13 CONSIDERATIONS
What is the right course when:
The data is in many different places, and in different formats
The objective is clear, but the solution isn't
We can't wait for a fully baked IT solution

14 EXAMPLE SOLUTION SET
Data Ingestion, Cleaning, Blending: ETL functions in the hands of the analytics professional; connects and combines disparate data sources
Modeling Analytics: Integration with RStudio, Python, and Statistical Analysis; advanced statistical analysis; forecasting
Visualize Insights: Visualize analytics; publish and share on Tableau server
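As a rough illustration of what "connects and combines disparate data sources" can mean in practice, the sketch below blends two extracts that arrive in different formats. Everything in it is an assumption made for illustration: the CSV and JSON contents, the field names (role, planned_hires, seats_per_quarter), and the blend function are hypothetical and are not taken from the presentation or from any specific tool.

```python
import csv
import io
import json

# Hypothetical extracts, invented for illustration only:
# staffing plans exported as CSV, training capacities exported as JSON.
STAFFING_CSV = """role,planned_hires
assembler,40
inspector,12
"""
TRAINING_JSON = (
    '[{"role": "Assembler", "seats_per_quarter": 25},'
    ' {"role": "Inspector", "seats_per_quarter": 15}]'
)


def normalize(role):
    """Clean the join key so 'Assembler' and 'assembler ' match."""
    return role.strip().lower()


def blend(staffing_csv, training_json):
    """Join the two sources on role and compute the training shortfall."""
    capacity = {
        normalize(rec["role"]): rec["seats_per_quarter"]
        for rec in json.loads(training_json)
    }
    rows = []
    for rec in csv.DictReader(io.StringIO(staffing_csv)):
        role = normalize(rec["role"])
        hires = int(rec["planned_hires"])
        seats = capacity.get(role)  # None when the other source has no match
        gap = None if seats is None else max(0, hires - seats)
        rows.append({
            "role": role,
            "planned_hires": hires,
            "seats_per_quarter": seats,
            "training_gap": gap,
        })
    return rows


for row in blend(STAFFING_CSV, TRAINING_JSON):
    print(row)
```

The point of the sketch is the cleaning step on the join key: most of the work in blending disparate sources is agreeing on how records from different teams line up.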

15 DISPARATE DATA CONSIDERATIONS: A QUALITY ANALYTICS SOLUTION EXAMPLE

16 QUANTUM (Quality Analytics Text Unstructured Mining)
QUANTUM uses natural language processing and machine learning to analyze a population of non-conformance text documents and connect each one, quickly and accurately, to other related non-conformances.
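One common way to relate a text document to similar ones is bag-of-words cosine similarity. The sketch below shows that general technique only; it is not a description of how QUANTUM is implemented, and the sample non-conformance texts and the function names (bag_of_words, cosine, most_related) are invented for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase word occurrences, ignoring punctuation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_related(query, documents):
    """Return the document whose wording overlaps most with the query."""
    q = bag_of_words(query)
    return max(documents, key=lambda d: cosine(q, bag_of_words(d)))


# Hypothetical non-conformance descriptions, invented for illustration.
DOCS = [
    "Fastener hole drilled oversize on wing panel",
    "Paint blistering on access door",
    "Sealant missing around fastener on wing panel",
]
QUERY = "Oversize fastener hole found on panel"

print(most_related(QUERY, DOCS))
```

A production system would typically add steps this sketch omits, such as stop-word removal, term weighting, and clustering of the resulting vectors.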

17 QUALITY ENGINEERING PROCESS
Process steps: Potential Issue Identification, Issue Investigation Launch, Root Cause Analysis, Issue Investigation Close Decision, Corrective Action Decision, Corrective Action Execution
Activities:
Review repetitive non-conformance categorization
Review repetitive part numbers
Pull documents
Read, Process, Assess
Identification of defect causality
Pursue leads via disparate data sources
Generate insights, think more broadly
New non-conformances continually adjust
Connect the dots

18 CONNECTED DISPARATE DATA
Columns: Business Objectives, Connected Disparate Information Sources, Data Products
Apply Artificial Intelligence to disparate information sources to align engineering support with the most significant business impact
Provide comprehensive analytics solution set, enabling engineering to go directly to the problem-solving process
Visualize results from engineering change activity with operations performance outcomes
Deliver Return-On-Investment guidance
Information sources:
Quality: QAR Logs
Engineering: ENGR Change Request Logs, ENGR Change DOCs
Sustainment: Field Logs
Corrective Action: Correction DOCs
Operations: Performance Metrics

19 NATURAL LANGUAGE PROCESSING
A computer cannot understand text, but it can simulate understanding. To do so, it needs to understand the rules of natural language.
Layers: Words, Part of Speech Tagging, Grammar, Meaning of Words in Context
Example sentences: John likes to watch movies. Mary likes movies too. John also likes to watch football games.
Parts of speech: Noun (NN), Pronoun, Proper Noun (NNP), Adjective, Verb (V), Adverb, Preposition, Interjection, Conjunction
Text: John likes to watch movies.
Syntactic Analysis: John / NNP likes / V to watch / V movies / NN.
Semantic Analysis: John / Person likes to watch movies / Thing.
Pragmatic Analysis: Social Conversation
Meaning of the whole is built from its parts.
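The syntactic analysis step on this slide can be sketched with a toy lookup tagger. This is a minimal illustration under stated assumptions: the LEXICON table is hand-written and hypothetical, the ADV and UNK tags are additions not shown on the slide, and real part-of-speech taggers learn their rules statistically from annotated text rather than from a fixed dictionary.

```python
import re

# Hypothetical hand-written lexicon covering the slide's example sentences.
# NNP, V, and NN follow the slide; ADV and UNK are added for this sketch.
LEXICON = {
    "john": "NNP", "mary": "NNP",
    "likes": "V", "watch": "V",
    "movies": "NN", "football": "NN", "games": "NN",
    "too": "ADV", "also": "ADV",
}


def tag(sentence):
    """Return (token, tag) pairs, folding 'to' + verb into one verb token."""
    words = re.findall(r"[A-Za-z]+", sentence)
    tagged = []
    i = 0
    while i < len(words):
        word = words[i]
        next_is_verb = (
            i + 1 < len(words) and LEXICON.get(words[i + 1].lower()) == "V"
        )
        if word.lower() == "to" and next_is_verb:
            # Treat the infinitive "to watch" as a single verb, as on the slide
            tagged.append((f"to {words[i + 1]}", "V"))
            i += 2
            continue
        tagged.append((word, LEXICON.get(word.lower(), "UNK")))
        i += 1
    return tagged


def render(tagged):
    """Format the pairs in the slide's 'word / TAG' notation."""
    return " ".join(f"{word} / {pos}" for word, pos in tagged)


print(render(tag("John likes to watch movies.")))
# John / NNP likes / V to watch / V movies / NN
```

Words missing from the lexicon come back tagged UNK, which is exactly the gap that statistical taggers are meant to close.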

20 MACHINE LEARNING
Example: determine whether a home is in San Francisco or New York.
Uses features (e.g. elevation) to categorize data.
Adding features for further distinction.
Find relationships between each pair of dimensions.
Machine learning methods use statistical learning to identify patterns.
Clustering: groups based on inherent features.
The computer learns as more data is provided.
[Plots: homes labeled S.F. or N.Y., compared on Elevation, Year Built, Bathrooms, Bedrooms, Price, Square Feet, and Price / sq. ft.]
Reference:
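The idea of using a single feature such as elevation to categorize homes can be sketched as a one-rule classifier (a decision stump) that learns its cutoff from labeled examples. The data below is synthetic and invented purely for illustration, not real housing data, and the names HOMES and best_threshold are hypothetical.

```python
# Synthetic (elevation in meters, city) pairs, invented for illustration.
HOMES = [
    (3, "NY"), (8, "NY"), (12, "NY"), (25, "NY"),
    (10, "SF"), (40, "SF"), (65, "SF"), (90, "SF"),
]


def best_threshold(samples):
    """Find the elevation cutoff t that best fits: elevation > t means SF.

    Returns (threshold, accuracy) measured on the training samples.
    """
    values = sorted({elevation for elevation, _ in samples})
    # Candidate cutoffs are the midpoints between neighboring elevations
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        correct = sum(
            ("SF" if elevation > t else "NY") == label
            for elevation, label in samples
        )
        accuracy = correct / len(samples)
        if best is None or accuracy > best[1]:
            best = (t, accuracy)
    return best


threshold, accuracy = best_threshold(HOMES)
print(f"elevation > {threshold} m => SF (training accuracy {accuracy:.3f})")
```

On this synthetic set the single rule misclassifies the low-elevation San Francisco home, which is the motivation for adding further features such as price per square foot.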

21 QUANTUM SOLUTION SET
Data Ingestion, Cleaning, Blending: Scalable ETL in production environment; connects and combines disparate data sources
Modeling Analytics: Integration with R; R: advanced clustering algorithms; HANA: text libraries
Visualize Insights: Visualize analytics; publish and share on Tableau server

22 BUSINESS RAMIFICATIONS
Do More Corrective Action with Less Time
Align Resources to Effect Greater Business Costs
Improved Customer Satisfaction
