The Applicability of HPC for Cyber Situational Awareness

Similar documents
Engineered Resilient Systems (ERS) Architecture NDIA Conference, November 2016

Machine Learning For Enterprise: Beyond Open Source. April Jean-François Puget

DLT AnalyticsStack. Powering big data, analytics and data science strategies for government agencies

Applying Automated Methods of Managing Test and Evaluation Processes

EXECUTIVE BRIEF. Successful Data Warehouse Approaches to Meet Today s Analytics Demands. In this Paper

IoT ANALYTICS IN THE ENTERPRISE WITH FUNL

HPCMP New Users Guide Who Are We? Distribution A: Approved for Public release; distribution is unlimited.

Transforming Analytics with Cloudera Data Science WorkBench

White paper A Reference Model for High Performance Data Analytics(HPDA) using an HPC infrastructure


Adobe and Hadoop Integration

Creating an Enterprise-class Hadoop Platform Joey Jablonski Practice Director, Analytic Services DataDirect Networks, Inc. (DDN)

KnowledgeENTERPRISE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK. Advanced Analytics on Spark BROCHURE

Leveraging Oracle Big Data Discovery to Master CERN s Data. Manuel Martín Márquez Oracle Business Analytics Innovation 12 October- Stockholm, Sweden

Analytics in Action transforming the way we use and consume information

Cloudera Data Science and Machine Learning. Robin Harrison, Account Executive David Kemp, Systems Engineer. Cloudera, Inc. All rights reserved.

Enterprise Analytics Accelerating Your Path to Value with an Open Analytics Platform

Redefine Big Data: EMC Data Lake in Action. Andrea Prosperi Systems Engineer

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

New Big Data Solutions and Opportunities for DB Workloads

Real-time IoT Big Data-in-Motion Analytics Case Study: Managing Millions of Devices at Country-Scale

20775A: Performing Data Engineering on Microsoft HD Insight

Adobe and Hadoop Integration

Analytics for the NFV World with PNDA.io

Achieving Agility and Flexibility in Big Data Analytics with the Urika -GX Agile Analytics Platform

20775 Performing Data Engineering on Microsoft HD Insight

Building a Data Lake on AWS EBOOK: BUILDING A DATA LAKE ON AWS 1

Building data-driven applications with SAP Data Hub and Amazon Web Services

GPU ACCELERATED BIG DATA ARCHITECTURE

Advancing Information Management and Analysis with Entity Resolution. Whitepaper ADVANCING INFORMATION MANAGEMENT AND ANALYSIS WITH ENTITY RESOLUTION

20775: Performing Data Engineering on Microsoft HD Insight

Accelerating Your Big Data Analytics. Jeff Healey, Director Product Marketing, HPE Vertica

Databricks Cloud. A Primer

Course Content. The main purpose of the course is to give students the ability plan and implement big data workflows on HDInsight.

Spark and Hadoop Perfect Together

Data Analytics for the T&E Enterprise

Datametica. The Modern Data Platform Enterprise Data Hub Implementations. Why is workload moving to Cloud

TDWI Analytics Fundamentals. Course Outline. Module One: Concepts of Analytics

Hortonworks Connected Data Platforms

20775A: Performing Data Engineering on Microsoft HD Insight

Connected Vehicles Accelerator

Test Resource Management Center Big Data Analytics / Knowledge Management Architecture Framework Overview

Modern Analytics Architecture

SOLUTION SHEET Hortonworks DataFlow (HDF ) End-to-end data flow management and streaming analytics platform

Who is Databricks? Today, hundreds of organizations around the world use Databricks to build and power their production Spark applications.

<Insert Picture Here> Oracle Exalogic Elastic Cloud: Revolutionizing the Datacenter

Architecture Overview for Data Analytics Deployments

IBM Analytics Unleash the power of data with Apache Spark

Future City Challenge Coaching & more

Microsoft Azure Essentials

Engineered Resilient Systems (ERS) Architecture

Lesson 3 Cloud Platform as a Service usages for accelerated Design and Deployment of IoTs

Applicazioni Cloud native

The Malicious Insider: Identifying Anomalies in High Dimensional Data to Combat Insider Threat

Predictive Analytics Reimagined for the Digital Enterprise

Why an Open Architecture Is Vital to Security Operations

Oracle Enterprise Data Quality Product Roadmap and Statement of Direction. October 2016

Microsoft reinvents sales processing and financial reporting with Azure

Bluemix Overview. Last Updated: October 10th, 2017

Pre-Requisites A good understanding of Azure data services A basic knowledge of the Microsoft Windows operating system and its core functionality

IBM Db2 Warehouse. Hybrid data warehousing using a software-defined environment in a private cloud. The evolution of the data warehouse

Deep Learning Acceleration with

REDEFINE BIG DATA. Zvi Brunner CTO. Copyright 2015 EMC Corporation. All rights reserved.

High Performance Computing Portal Lowering Barriers / Modern HPC Ecosystems

Cloud Practice Overview August

Our Emerging Offerings Differentiators In-focus

C3 Products + Services Overview

Automated Service Builder

Real-time Streaming Insight & Time Series Data Analytic For Smart Retail

SOLUTION SHEET End to End Data Flow Management and Streaming Analytics Platform

Analysis and machine learning on logs of the monitoring infrastructure

How to develop Data Scientist Super Powers! Using Azure from R to scale and persist analytic workloads.. Simon Field

Guide to Modernize Your Enterprise Data Warehouse How to Migrate to a Hadoop-based Big Data Lake

Hadoop and Analytics at CERN IT CERN IT-DB

Enterprise-Scale MATLAB Applications

: 20776A: Performing Big Data Engineering on Microsoft Cloud Services

Large US Bank Boosts Insider Threat Detection by 5X with StreamAnalytix

SAP Cloud Platform Big Data Services EXTERNAL. SAP Cloud Platform Big Data Services From Data to Insight

Your Top 5 Reasons Why You Should Choose SAP Data Hub INTERNAL

DataAdapt Active Insight

Enterprise APM version 4.2 FAQ

Uncovering the Hidden Truth In Log Data with vcenter Insight

Architecting an Open Data Lake for the Enterprise

Infosys Real Time Streams

Establishing Self-Driving Infrastructure Operations

Modernizing Cyber Defense: Embracing CDM. Okta Inc. 301 Brannan Street, Suite 300 San Francisco, CA

Managing Data in Motion with the Connected Data Architecture

Statistics & Optimization with Big Data

In-Memory Analytics: Get Faster, Better Insights from Big Data

Why Machine Learning for Enterprise IT Operations

CHALLENGES OF EXASCALE COMPUTING, REDUX

Data Analytics and CERN IT Hadoop Service. CERN openlab Technical Workshop CERN, December 2016 Luca Canali, IT-DB

From Isolation to Insights

Ed Turkel HPC Strategist

COMP9321 Web Application Engineering

Service Virtualization

Stateful Services on DC/OS. Santa Clara, California April 23th 25th, 2018

RAPIDS GPU POWERED MACHINE LEARNING

WHITE PAPER SPLUNK SOFTWARE AS A SIEM

Transcription:

The Applicability of HPC for Cyber Situational Awareness Leslie C. Leonard, PhD August 17, 2017

Outline HPCMP Overview Cyber Situational Awareness (SA) Initiative Cyber SA Research Challenges Advanced Framework HACSAW Data repository Call for Proposals FY18 Next Steps Questions Page-2

HPCMP Ecosystem Users Results Acquisition Engineering A technology-led, innovation-focused program committed to extending HPC to address the DoD s most significant challenges DoD Supercomputing Resource Centers (DSRCs) Networking and Security Software Applications Test and Evaluation U.S. Air Force Research Laboratory DSRC Defense Research & Engineering Network (DREN) Core Software U.S. Army Research Laboratory DSRC Computational Environments U.S. Army Engineer Research and Development Center DSRC Computer Network Defense, Security R&D, and Security Integration Science and Technology Maui High Performance Computing Center DSRC Education and Training US Navy DSRC HPC User Support Page-3

HPCMP Supercomputing Resources Delivering HPC to the User Navy NPS Air Force AFRL ARL AFRL Air Force A9 MHPCC Navy NSWCCD Army ARL-MD AFRL/SD (2) ERDC Navy NRL Navy NRL Navy Penn State NAVY Air Force AEDC DTRA (3) Army ATEC WSMR(2) DoD Supercomputing Resource Centers (DSRC) Affiliated Resource Centers (ARC) Dedicated HPC Project Investments (DHPI) FY13/14/15/16 US Air Force Research Lab (AFRL) US Army Research Lab (ARL) Maui HPC Center (MHPCC) NAVY (Stennis Space Center) US Army Engineer Research and Development Center (ERDC) US Air Force Research Lab, Information Directorate, AFRL-RI Naval Research Lab (NRL) US Army, Space and Missile Defense Command SMDC US Navy, SSC-SD DTRA (2016, 2015, 2013) US Air Force, A9 (2016) US Navy, NRL (2014) US Air Force, AEDC (2013) US Army, ARL (2015) US Navy, NSWCCD (2013) US Air Force, AFRL (2014) US ARMY, ATEC (2017, 2014, 2013) US Navy, Penn Sate (2013, 2017) US Air Force, AFRL SD (2013, 2013) US Navy, NPS (2015) 11 of 17 DHPIs (FY13-17) operate at Above Secret Page-4

Cyber Situational Awareness Initiative ESG-tasked initiative to, examine the applicability of HPC to cyber situational awareness (SA) HPCMP is well-positioned to leverage HPC systems to address complex cybersecurity problems World-class computational resources leveraged by the RDT&E community Leading-edge software applications for computational analysis capabilities National research and engineering network DREN/SDREN Multi-disciplinary, multi-year project leveraging expertise from HPCMP (e.g., Security, Networking, Centers, Software Applications) and external collaborators Page-5

Research Challenges Volume and complexity of cyber threats Require real-time, automated responses Multi-dimensional threats Needle in a haystack dilemma Hadoop and Spark vs. HPC Data Underutilized data sources Data quality Security analytics deployment System requires extensive configuration and/or tuning before it is usable Page-6

Stages of Analytics Value What happened? Descriptive Analytics Why did it happen? Diagnostic Analytics What will happen? Predictive Analytics What response is required? Prescriptive Analytics Difficulty Page-7

Cyber SA Goals and Objectives Explore current HPC and cyber SA intersections Understand how data analysis software performs on HPC Conduct a formal analysis of cyber SA data sources Assemble raw cyber data streams and associated ontology Perform an Open Source Software (OSS) compliance review Review all OSS components in order to validate compliance with requirements Establish a data repository and HPC processing pipeline Ingest, parse, and store DREN data feeds Establish & maintain a workflow model for HPC analytics Support cyber SA and HPC analytic collaborators Collaborators perform data analytics studies against static datasets Benchmarks are performed to compare HPC solutions with traditional non-hpc Solutions Collaborators report and document findings Page-8

Advanced Framework for Cyber SA Purpose: To serve as the HPCMP cyber SA framework and facilitate the development and transition of data driven analytics Project Name: HPC Architecture for Cyber Situational Awareness (HACSAW) Proving ground for novel ideas, algorithms, and approaches suitable for large scale execution in a dedicated HPC environment Reduce barriers to real world enterprise cyber data and computational resources HACSAW API v1.2.0 for quick data access Container environment with common pre-installed and configured data science tools HPC system dedicated to cyber SA data analytics Documented workflow environment for collaborators Page-9

Data Repository Most comprehensive cybersecurity data set available to DoD R&D community Collection of data sources from Internet Access Points (IAPs) to regional Service Delivery Points (SDPs) to the host-level Non-anonymized data Processing pipeline with redundant data storage and controlled access Rapid acceleration and exponential growth in size and complexity Proven open source, big data technologies for data enrichment throughout processing pipeline Page-10

Total Data Feed Collection ~520 TBs 600000 500000 400000 GBs 300000 200000 Data Volume 100000 0 Date *Processing ~8 billion events/day, ~92k events/second, ~3.5TBs/day as of 25 July 2017 Page-11

UNCLASSIFIED HACSAW The Big Picture Data Sources Ingest Index Assess Query Analyze Cyber Operational Attributes Host Based Security System (HBSS) Multi Router Traffic Grapher (MRTG) / Network Flow Statistics Cybersecurity Environment for Detection, Analysis, & Reporting (CEDAR) Assured Compliance Assessment Solution (ACAS) Rapid Audit of Unix (RADIX) Apache Kafka Data ingest & message passing Apache Spark Parallel processing of events ElasticSearch Index, retrieval, & analysis of events Kibana Analytical Visualization Sharing & Collaboration Cyber Risk & Event Analysis Cyber Threat Analysis Alert Detect Monitor Results Cyber SA Mission Essential Tasks (MET) Page-12

HPC Development Environment Hardware (virtualized) 36 cores 3.2 GHz 117 GB user accessible memory TB s of user accessible storage Computational equivalent of one Topaz node Software Use a standard analytical environment by using a Docker container Supports repeatable experimentation Includes common software, Spark, Dask, Pandas, Jupyter APIs to access data via Python module All software in container is compatible with the HPC environment Additional software can be added if it meets requirements Page-13

HPC Development Workflow 1. DATA EXPLORATION Identify relevant data sources and its underlying structure, purpose, and usefulness. In this stage, collaborators will exercise APIs and ontology to be used in the initial analytic development. 02 3. DEPLOY & COLLECT Once initial development has been completed, collaborators will deploy their analytic against real world data. Results will be collected and provided to collaborators in near-real time. 03 04 01 2. INITIAL DEVELOPMENT Using the data analytics environment with HACSAW, collaborators will begin initial development of their proposed analytics by working with real HPC data. 4. REFINE & MEASURE Collaborators will refine their analytic and benchmark it s accuracy and overall performance. At this point, collaborators will have a stable and HPC-ready analytic. Page-14

HPC Workflow 5. INITAL EVALUATION Verify the the results obtained from the prototype system warrant further evaluation on an HPC system. Verify that better or faster results could be obtained with more resources and the algorithm will work at scale. 06 7. REFINE & MEASURE Once application porting has been completed, collaborators will deploy their analytic against large scale real data. This will take place in a batch environment, allowing larger scale tests but with a slower response. 07 08 05 6. HPC DEVELOPMENT Collaborators will port the application code to run effectively on the HPC machine. This includes an analysis of needed data and working with data owners to ensure data availability on the HPC. 8. FINAL EVALUATION At this point the code has been fully developed and vetted on an HPC and is ready to move into production. Page-15

Workflow Walk-through Example: Develop machine learning classifier (random decision forest) to detect anomalous browser User- Agents Data: HTTP logs, logs from existing pattern alerters Methodology: Create training sets based on existing alerter rules over various periods of time, and with varying training methods Expected result: Classifier will perform as well as existing anomalous user-agent detection alerters (95%>) Page-16

Workflow Walk-through 1. Initial Development Docker container with provided software Development on interactive system with real data Data API accessed via Python module 2. Deploy & Collect Train random forest using largest possible time periods on development system Compare results with existing pattern alerters 3. Refine Algorithm Evaluate methods and training data size 4. Initial HPC Evaluation Ensure the developed application will transfer to the HPC 5. HPC Development Improve algorithm scalability Define data movement to HPC 6. Refine Algorithm Evaluate methods and training data size 7. Final Evaluation Perform training on HPC with large data sets Deploy these classifiers on real time and historical data Compare to existing pattern alerters 8. Production Collaborate with HACSAW team to deploy classifier against real-time operational data Create policies for periodically retraining data Page-17

Call for Proposals Purpose: Solicit proposal and work effort that yields results during a one-year effort that demonstrates potential for integration into the HPC Cyber SA operational environment and aligns with Mission Essential Tasks (METs) Strongly encourage teams that span organizations Teams must include data science, cybersecurity and HPC expertise Secret Security Clearance required for access No file transfers Must sign document acknowledging data access restrictions Important dates March 24: Call for Proposals May 22: Proposals Due July 25: Final Evaluation August 25: Anticipated Award Announcement Page-18

FY18 Next Steps Establish Board of Directors Provide oversight for HACSAW activities Develop strategic direction and provide guidance Refine HPC Cyber SA architecture Update HACSAW API, continuous HPC processing pipeline improvement, expand/refine data repository and software stack Discovery, exploration and enlightenment Collaborators perform data analytics studies against static datasets that align with cyber SA METs Benchmarks are performed comparing HPC solutions with traditional non-hpc solutions Determine the future of the project Collaborators document and report findings Major decision point regarding continuation or modification of project objectives Page-19

Conclusion Multi-tiered approach to ensuring a productive, secure, and trustworthy environment for the RDT&E and acquisition engineering communities Cybersecurity investment to increase the HPCMP s current and predictive understanding on the DREN Continued engagement and collaboration with Services and Agencies to leverage HPC, cyber and data science expertise for shared situational awareness Page-20

Questions? Leslie C. Leonard, PhD Cybersecurity Research Lead Leslie.Leonard@hpc.mil Page-21

Abbreviations and Acronyms TERM API DoD DREN ERDC ESG HACSAW HPC HPCMP HTTP IAP MET SA SDP DEFINITION Application Program Interface Department of Defense Defense Research and Engineering Network Engineer Research and Development Center Executive Steering Group HPC Architecture for Cyber Situational Awareness High Performance Computing High Performance Computing Modernization Program Hyper Text Transfer Protocol Internet Access Point Mission Essential Task Situational Awareness Service Delivery Point Page-22