Data Analytics for the T&E Enterprise


1 Data Analytics for the T&E Enterprise
Ryan Norman
Big Data and Knowledge Management Initiative Lead
Test Resource Management Center

2 Worldwide Exponential Growth of Data
More information over the next two years than in the entire history of mankind!
[Chart: worldwide data growth in exabytes (EB), with a Name (Symbol) / Value unit table]
Source: IDC's Digital Universe Study, sponsored by EMC, Dec 2012

3 Test & Evaluation Growth of Data
T&E Mission: Acquire data and discern into knowledge
Increased System Complexity
- Separation video (1-8 Mbps)
- Integrated avionics (2-10 Mbps)
- Integrated weapon system (1-20 Mbps ea.)
- Cockpit video (1-5 Mbps)
- Integrated self-defense system (0.5-2 Mbps)
- Integrated communications (0.5-2 Mbps)
- Flight test transducers ( Mbps)
- Multi-spectral sensors (1-12 Mbps)
- Total Throughput: 7.5 Mbps to 70 Mbps+
Larger Test Footprints
- 4-on-4 test flights (more systems per test)
- Much faster weapons systems
- Geographic separation not as effective as it used to be
Demand for Shorter Acquisition Cycles
- More concurrent testing
- More real-time analysis
Increased System-of-Systems Test Complexity
- Five Futures (EW, UAV, NCO/W, DE, Hypersonics)
- Integrated fleet (F-18E/F, EA-18G, F-35, SM VI, UAV)
- Swarming UAVs
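The throughput figures on this slide can be sanity-checked with a few lines of arithmetic. The sketch below sums the per-stream ranges that survive in the transcript and converts a sustained rate into storage per flight hour. It assumes a single weapon system (the slide quotes that rate "ea."), leaves out the flight test transducers because their range is missing from the transcript, and uses decimal gigabytes; the dictionary and function names are illustrative only.

```python
# Per-stream data rates in Mbps as (low, high), copied from the slide.
# Assumption: one weapon system. The flight test transducer range is
# missing from the transcript, so that stream is omitted rather than guessed.
STREAM_RATES_MBPS = {
    "separation video": (1.0, 8.0),
    "integrated avionics": (2.0, 10.0),
    "integrated weapon system": (1.0, 20.0),
    "cockpit video": (1.0, 5.0),
    "integrated self-defense system": (0.5, 2.0),
    "integrated communications": (0.5, 2.0),
    "multi-spectral sensors": (1.0, 12.0),
}


def total_range_mbps(rates):
    """Sum the low ends and the high ends of every listed stream."""
    low = sum(lo for lo, _ in rates.values())
    high = sum(hi for _, hi in rates.values())
    return low, high


def mbps_to_gb_per_hour(mbps):
    """Convert megabits per second into decimal gigabytes per hour."""
    bytes_per_second = mbps * 1e6 / 8
    return bytes_per_second * 3600 / 1e9


low, high = total_range_mbps(STREAM_RATES_MBPS)
print(f"Listed streams (transducers excluded): {low} to {high} Mbps")
print(f"70 Mbps sustained is about {mbps_to_gb_per_hour(70):.1f} GB per hour")
```

The listed streams sum to a range that sits below the slide's stated total at both ends, which is consistent with the transducer stream making up the difference.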

4 Need: An Evaluation Revolution
- Most T&E investments have been focused on the T rather than the E
- Our analysis & evaluation capabilities are not keeping up with the complexity and speed required by today's acquisition systems
- The next-generation of acquisition systems will be exponentially more complex than today
Impact: T&E quality is inadequate for our needs
- More data is being collected than can be properly analyzed
  - Only a tiny fraction of data is looked at
  - Analysis occurs on a small fraction of data
- Focus is on a single test, rather than data collected across the system lifecycle
- No systematic anomaly detection, trend analysis, regression analysis, causality analysis, pattern recognition, simulation/test comparisons, perceived truth/ground truth comparisons are being done
Impact: T&E timeliness is inadequate for our needs
- Analyst retrieval of test data in many cases takes days/weeks rather than seconds/minutes
- Sometimes it's easier (though not cheaper) to just re-run a test rather than find old data that may answer the question
- Long data ingest times prevent proper debriefing of test participants after a test is over, since their statements cannot be correlated with data in real time
Impact: T&E dollars are being spent unnecessarily
- More tests than necessary are being done, sometimes at enormous expense
- Cross-program lessons learned only occur anecdotally
A systematic approach to Big Data Analytics and Knowledge Management is required to address these three serious issues

5 Realizing Big Data and Improved DoD T&E Knowledge Management
1. Understand and Document T&E challenges & needs
   - (FY12) Completed Data Management for Distributed Testing (DM-DT) Study
     - Result: Developed functional requirements for T&E enterprise distributed Data Management
   - (FY13) Comprehensive Review of T&E Infrastructure report published
     - Key Recommendation: Use DoD cloud solution for T&E data
     - Key Recommendation: USD(AT&L) establish a DoD-wide KM capability for T&E to help achieve better acquisition outcomes and reduce costs
2. Execute proofs of concept that inform an enterprise approach to T&E Knowledge Management
   - (FY15-18) Joint Strike Fighter Knowledge Management (JSF-KM) project
     - Goal: Assess KM technologies and methodologies in support of an existing acquisition program
   - (FY15-17) Collected Operational Data Analytics for Continuous Test & Evaluation (CODAC-TE) project
     - Goal: Apply KM technologies and methodologies across the lifecycle
3. Develop investment plan that achieves strategic objectives:
   - Integrate T&E infrastructure into cohesive Knowledge Management enterprise
   - Modernize T&E practices & processes to leverage Big Data analytics techniques
   - Apply Big Data analytics tools & techniques to the T&E mission space
UNCLASSIFIED DISTRIBUTION D

6 What do we need?
1. Integrated Local Data
2. Cloud Analytics Capability
3. Big Data Tools (New, Existing)
4. Trained Data Science Workforce
Target qualities: Integrated, Scalable, Cost-Effective, State-of-the-Art
[Architecture diagram]
- Individual Range
  - Current Range Infrastructure: Existing Tools, Existing Storage, Existing Ingest Capabilities
  - Range Augmentation: Virtualized Big Data Tools, Some Processing, Some Tiered Storage, MLS Security, Enhance Ingest
- Regional Analytics Capability (shown three times): Virtualized Big Data Tools, Processing, Tiered Storage, MLS Security, Data Scientists
- Cloud-Based Big Data Analytics and Knowledge Management System
- Data items shown: Quick-Look, Working Files, Data, Reports, Application Repository, Schedule Info, Video, Audio, Imagery

7 Help Wanted: DoD Data Scientists
Unique DoD data challenges require an interdisciplinary approach, with skills & analytical techniques drawn from 3 broad areas:
- Subject Matter Expertise: Ability to assess which models are feasible, desirable, and practical in different settings; clear idea of the distinction between correlation and causality
- Math and Statistics: Knowledge of probability, distributions, hypothesis testing, and multivariate analysis, especially Bayesian statistics with multivariate analysis
- Computer Science: Databases, SQL, data structures, algorithms, parallel computing, distributed computing, etc.
[Venn diagram of the three areas; overlap labels: Traditional Software, Machine Learning, Traditional Research, Data Science, Big Data Analytics]
"Proliferation of sensors and large data sets are overwhelming analysts, as they lack the tools to efficiently process, store, analyze, and retrieve vast amounts of data" (ASD(R&E), Department of Defense Research & Engineering website)

8 JSF T&E KM & Data Needs Addressed by TRMC
1. Data Capture: DART Pod is too large, requires significant jet modifications, and is not certified to support F-35 full operational profile
2. Data Warehousing: Flight test data should be stored in a government facility to expedite data access & discovery
3. Data Ingest: Current DART Pod test data ingest is too slow to meet multi-ship quick-look and quick-turn requirements (examples: 2 on 2; 4 turn 2; 4 on 4 turn 4 on 4)
4. Data Access: Test data should be available for quick-look analysis during mission debrief to inform decision making
5. Video: DART Pod video should be available for quick-look analysis during mission debrief to inform decision making
6. Big Data Analytics: Analysis capabilities need to proactively identify unknown unknowns and other anomalies impossible for a human to discern
7. Remote Operations: Analysts need a rapid reaction capability to harvest data and conduct quick-look analyses in situations / locations where a network connection is not possible

9 TRMC Investments Support Joint Strike Fighter Needs
[Diagram labels] Miniaturized Data Capture On-Board Instrumentation Next-Generation Threats EW Cyber T&E On Board JSF Data Ingest & Validation QRIP MILS Network CRIIS Realistic Distributed Mission Environments EWIIP LVC Integration NCR Model Validation & Improvement RAPIDS JMETC TENA Big Data Analytics & Evaluation Cross Domain Solutions MLS-JCNE Interoperable With JSE Analysts, Evaluators, & Decision-Makers JSF-KM KM Data Archive

10 JSF-KM Improvements to Existing T&E Capabilities
[Timeline figure comparing DT Today, OT Today, and With JSF-KM; labels and values in transcript order]
DT Today OT Today Data Ingest 2 hours (per aircraft) Raw Data Available Data Ready for LM 1 day 1 week Govt. Analyst Data Request Analysis With JSF-KM Data Ingest Raw Data Available Data Ready for (Govt) 3 weeks of data 1-2 hours available online (per aircraft) 10 minutes 4-5 hours Parallel Data Ingest 30 minutes (multiple aircraft) 30 seconds Raw Data Available 30 seconds Video/Data at Post-Mission Debrief Data Ready for (Govt) >20 weeks of data available online 90 minutes Govt. Analyst Data Request Big Data Analytics Govt. Analyst Data Request Analysis Analysis
Note: Numbers reflect single 2 hour flight mission
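To see why parallel ingest matters for the multi-ship cases, the following sketch compares the two ingest figures that are legible on this slide: 2 hours per aircraft today and 30 minutes for multiple aircraft with parallel ingest. It assumes that today's ingest handles aircraft one after another and that the parallel figure stays flat as aircraft are added; neither assumption is stated on the slide, and the function names are illustrative.

```python
MINUTES_PER_AIRCRAFT_TODAY = 120  # "Data Ingest 2 hours (per aircraft)"
MINUTES_PARALLEL_BATCH = 30       # "Parallel Data Ingest 30 minutes (multiple aircraft)"


def sequential_ingest_minutes(n_aircraft, per_aircraft=MINUTES_PER_AIRCRAFT_TODAY):
    """Total ingest time if aircraft are ingested one after another (assumed)."""
    return n_aircraft * per_aircraft


def parallel_ingest_minutes(n_aircraft, batch=MINUTES_PARALLEL_BATCH):
    """Total ingest time if all aircraft ingest at once in a flat batch (assumed)."""
    return batch if n_aircraft > 0 else 0


# 2, 4, and 8 aircraft correspond to the 2-ship, 4-ship, and 4-on-4 cases.
for n in (1, 2, 4, 8):
    seq = sequential_ingest_minutes(n)
    par = parallel_ingest_minutes(n)
    print(f"{n} aircraft: {seq} min sequential vs {par} min parallel")
```

Under these assumptions the gap widens linearly with the number of aircraft, which is the multi-ship quick-turn case the deck calls out.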

11 JSF-KM Progress to Date
Measure | Current | Expected | Achieved
Online Data Capacity | 3-4 weeks of data (test missions online) | 20 weeks of data (test missions online) | TBD
Potential Online Data Capacity | ~105 flight hours* of OT data online | ~5670 flight hours* of OT data online | TBD
Local test data access times | 2 hours for 2 hour mission | < 1 hour for 2 hour mission | 30 minutes
Remote data access times | 4 hours from ingest complete | 1 minute from ingest complete | TBD
Unknown Unknowns Discovery | N/A | Engine Anomalies | Engine Anomalies; Time Correlation; Avionics Box Issue; Erratic Ground Sensor; Data Profiling Speed
Cost savings / avoidances | N/A | $3M in Savings; $4M in Avoidances | $10M+ in Savings / $4M+ in Avoidances
* Assumes 56GB per hour for OT flight
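The flight-hour capacities in this table can be translated into storage volume using the footnoted rate of 56 GB per OT flight hour. The sketch below is a back-of-the-envelope conversion only; it assumes decimal units (1 TB = 1000 GB), and the function name is illustrative.

```python
GB_PER_FLIGHT_HOUR = 56  # footnote: "Assumes 56GB per hour for OT flight"


def storage_tb(flight_hours, gb_per_hour=GB_PER_FLIGHT_HOUR):
    """Storage needed to keep the given flight hours online, in decimal TB."""
    return flight_hours * gb_per_hour / 1000


current_tb = storage_tb(105)    # "~105 flight hours" currently online
expected_tb = storage_tb(5670)  # "~5670 flight hours" expected online

print(f"Current:  about {current_tb:.2f} TB")
print(f"Expected: about {expected_tb:.2f} TB")
print(f"Growth factor: {5670 / 105:.0f}x")
```

Read this way, the expected capacity moves from single-digit terabytes to a few hundred terabytes of OT data online.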

12 JSF-KM FY15 Success Stories
- Resolved test data time correlation issues
  - Time stamps in data files found to be corrupted post-mission
  - JSF-KM analysis tools were able to correct the time correlation issue
  - Without JSF-KM, at least five missions would have been re-flown
- Video available during post-mission debrief due to JSF-KM data ingest improvements from DART Pod
  - Existing tools could not process video in time to support post-mission debrief
  - Without JSF-KM, there would be no flight video during post-mission debrief
- Discovery of avionics box issue during first night mission
  - Pilot and Analyst discovered problem from video data available 30 minutes after landing
  - Avionics Box was replaced before another mission was flown
  - Without JSF-KM, problem would not have been discovered for several days
Return on Investment has been realized before deploying any Big Data analytics capabilities

13 JSF-KM Phase 1+ (FY16-17)
- Big data analytics system tested in lab using unclassified Propulsion data from Patuxent River
  - KM team used data to stress test the infrastructure and fine tune the KM system
  - Initial success led to analysts sending more propulsion data to the lab for analysis
- After obtaining rapid analytical insight, analysts requested that TRMC deploy a local KM system at Patuxent River
  - Tentative deployment June 2017
  - Patuxent River leadership already identifying other airframes which could use the system
- Identified flights which experienced propulsion component failure
  - During a blind analysis of 1,392 flights of propulsion data, JSF-KM data scientist was able to identify 7 of the 10 flights with engine issues known to JSF analysts
  - Led to creation of a predictive model* for identifying future failures (*model validation pending)
  - Without JSF-KM this predictive model may not have been generated
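The blind-analysis result on this slide can be expressed as a detection rate. The sketch below computes recall from the figures given and the base rate of known-issue flights; it assumes the 10 known-issue flights are all contained in the 1,392 analyzed. The slide does not say how many flights the data scientist flagged in total, so precision cannot be computed from the deck; the `precision` helper is included only to show what extra number would be needed.

```python
def recall(true_positives, known_positives):
    """Fraction of the known-issue flights that the analysis found."""
    if known_positives <= 0:
        raise ValueError("known_positives must be positive")
    return true_positives / known_positives


def precision(true_positives, flagged):
    """Fraction of flagged flights that were real issues.

    Needs the total number of flagged flights, which the slide does not report.
    """
    if flagged <= 0:
        raise ValueError("flagged must be positive")
    return true_positives / flagged


TOTAL_FLIGHTS = 1392       # flights in the blind analysis
KNOWN_ISSUE_FLIGHTS = 10   # flights with analyst-known engine issues
IDENTIFIED = 7             # known-issue flights found by the data scientist

blind_recall = recall(IDENTIFIED, KNOWN_ISSUE_FLIGHTS)
base_rate = KNOWN_ISSUE_FLIGHTS / TOTAL_FLIGHTS

print(f"Recall: {blind_recall:.0%}")
print(f"Base rate of known-issue flights: {base_rate:.2%}")
```

With known-issue flights making up well under 1% of the set, the false-positive count would matter as much as recall when the predictive model is validated.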

14 JSF-KM FY16 Success Stories
- Reduced data profile time from 5+ hours to 47 seconds per query
  - Big Data tool enabled massive improvement to data profile generation
  - Without JSF-KM it would still take 5+ hours to perform data profile runs
- Identified 2 engines that consistently performed differently than the other 9 engines within the 92 data sets analyzed
  - Discovered a faulty/noisy ground sensor and an unknown pattern within a known sampling rate abnormality
  - Without JSF-KM these anomalies may have never been discovered
- Identified issue with ground sensor
  - Found anomalous points and pattern within inconsistent sensor data sampling rates
  - Without JSF-KM these anomalies may have never been discovered
- Identified flights which experienced propulsion component failure
  - During a blind analysis of 1,392 flights of propulsion data, JSF-KM data scientist was able to identify 7 of the 10 flights with engine issues known to JSF analysts
  - Led to creation of a predictive model* for identifying future failures (*model validation pending)
  - Without JSF-KM this predictive model may not have been generated
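The deck does not describe how the two unusual engines were found. As one hedged illustration of the general idea, the sketch below flags units whose summary value sits far from the group median, using a modified z-score based on the median absolute deviation. This is a common robust outlier test swapped in for illustration, not the JSF-KM method. The engine labels and values are synthetic placeholders, and the 3.5 threshold is a conventional default.

```python
from statistics import median


def mad_outliers(value_by_unit, threshold=3.5):
    """Return the units whose modified z-score exceeds the threshold.

    Modified z-score = 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation from the median.
    """
    values = list(value_by_unit.values())
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # no spread at all, so nothing can be ranked as unusual
    return sorted(
        unit
        for unit, v in value_by_unit.items()
        if 0.6745 * abs(v - center) / mad > threshold
    )


# Synthetic per-engine summary values (NOT real data): nine engines cluster
# together and two sit apart, mirroring the 9-versus-2 split on the slide.
synthetic_engine_metric = {
    "E01": 100.1, "E02": 99.8, "E03": 100.3, "E04": 99.9, "E05": 100.0,
    "E06": 100.2, "E07": 99.7, "E08": 100.1, "E09": 99.9,
    "E10": 104.0, "E11": 95.5,
}

print(mad_outliers(synthetic_engine_metric))
```

A median-based score is used here because a mean and standard deviation would themselves be pulled toward the unusual engines, which can hide them in a group this small.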

15 JSF-KM Phase 2 (FY17-18)
Transition & Operational Support
- Delayed fielding has resulted in transition activities moving to Phase 2
- Re-alignment of requirements and use cases to support enterprise vision
Scalability and Technology Enhancements
- Current file store system inadequate for extremely large-scale jobs
- Performance improvements needed for large queries
- Better visualization and mining tools needed to aid analysis
- Quick Look Reports needed to better aggregate data sources
Data Science support and Analytics Use Case Implementation
- Mission Systems and Propulsion Systems need continued support during transition
- Performance Health Monitoring interested in leveraging JSF-KM
- Pax interested in extending support for propulsion data to other airframes
Investigate MILS Support
- MILS required to reduce JSF costs and maximize analysis capabilities
- Future need to connect with JSF Partner programs

16 Collected Operational Data Analytics for Continuous Test & Evaluation (CODAC-TE) Proof of Concept
Background: US Army Aberdeen Test Center (ATC) has ~30TB of underutilized data including 20TB of in-theater operational data
Goal: Utilize Big Data analytics across multi-commodity DT, OT, and in-theater system data to discover unknown unknowns for current and future Army systems
Use Cases: Mine-Resistant Ambush Protected (MRAP) Theater Data; Camouflage Effectiveness
Leverages High-Performance Computing Major Shared Resource Center and ATC expertise
Challenges being addressed:
- Insufficient data science expertise
- Current analytical systems inadequate for today's data volume and velocity
- Lacking tools and techniques for discovering unknown unknowns and conducting complex trends analyses

17 Big Data Initiative Summary
TRMC is acting upon recommendations from the Comprehensive Review of T&E Infrastructure. Strategic Goals:
- Integrate T&E infrastructure into cohesive Knowledge Management enterprise
- Modernize T&E practices & processes to leverage Big Data analytics techniques
- Apply Big Data analytics tools & techniques to the T&E mission space
TRMC-funded proofs of concept are delivering proven capabilities
- Enabling Big Data analytics for JSF T&E
- Improving transfer of knowledge between fielded and next-gen systems
- Informing an investment roadmap that advises future infrastructure, process, and workforce decision-making
Big Data Reference Architecture will ensure interoperability and efficiencies in next-generation range knowledge management
- TRMC will continue to evolve reference architecture based on feedback
- TRMC will identify community standardization mechanisms for core big data / knowledge management components
- TRMC will consider additional pilots that continue to expand big data analytics in acquisition