Edge Computing: The Next Frontier for the IoT Business Landscape Edge/Persisted Storage, Governance, and Market Drivers Considerations and Limitations of Centralized Data Lakes ThingsExpo New York June 2016 1
Agenda High level IoT direction The role of the First Receiver Trends towards Data Lakes for IoT data Data Lake limitations and augmentation 2
What do we know? 3
Tons of Data
Products Smart Products Smart Connected Products Product Systems Systems of Systems
IoT is being driven by the Capital Equipment Providers
What outcomes do Enterprises want from the Internet of Things?
Enterprises will demand leverage from IoT data 8
They will do this by driving deployment architectures capable of best accommodating this for all constituencies 9
IoT from an Enterprise point of view: McDonald s as a Hypothetical
THE ROLE OF THE FIRST RECEIVER
McDonald s three years ago: No IoT Subsystems
First Level IoT: Most IoT Deployments are centered around capital equipment and are closed loop silos Alerts, Triggers, Actions Alerts, Triggers, Actions Closed Loop Message-Response System Rules/ Workflow Cloud Based Central Repository Data is mostly owned and controlled by the vendors 13
Now vendors are recognizing the need to better leverage data Alerts, Triggers, Actions Alerts, Triggers, Actions Rules/ Workflow Closed Loop Message-Response System Cloud Based Central Repository Analytic Workbench: Operational, Investigative, Predictive Analytics and Machine Learning Enterprise Apps: ERP, CRM, and other enterprise apps Possible Specialized Store 14
IoT Enabled Lighting Lighting Vendor IoT Enabled Kitchen Kitchen Vendor Beacon System Beacon Vendor IoT Enabled HVAC HVAC Vendor IoT Enabled Vehicle Tracking McDonald s today: IoT data goes to capital equipment providers Vehicle Tracking Vendor
They will (often and soon) need to accommodate Edge Processing As basic edge processing, followed by more fully developed First Receiver Alerts, Triggers, Actions Closed Loop Message-Response System Rules/ Workflow (& Filtering) Edge / 1st Receiver Rules/ Workflow Cloud Based Central Repository This will begin to call into question who owns the data Edge / Apply rules and workflow against that data Edge/ Take action as needed Edge/ Filter and cleanse the data exhaust (increasing payload) FR / Store local data for local use FR / Enhance security FR /Provide governance admin controls 16
Edge/ First Receiver Local processing, filtering, enriching, and propagation to various constituents IoT Enabled Lighting IoT Enabled Kitchen New Apps Enterprise Apps Lighting Vendor HVAC Vendor Vehical Tracking Vendor Beacon System Regional and Corporate Offices Regional Corporate McDonald s Corporate IoT Enabled HVAC IoT Enabled Vehical Tracking The right architecture provides local use and accommodates constituents FDA Supply Chain Partners
Publish Subscribe Ultimately, Architectural Accommodation will be Critical (Ownership, Stewardship, Governance, Privacy, Liability ) Net New IoT Solutions (resulting from data leverage) Enhanced Versions of Initial IoT Solutions Various Sensor Devices Rules/ Workflow (& Filtering) Analytic Workbench: Operational, Investigative, Predictive Edge Processor Persisted Store External Data ERP, CRM, etc. User Remote Site ( McDonald s San Jose ) Edge / First Receiver 18 Rules/ Workflow Rules/ Workflow Rules/ Workflow Analytic Workbench Enterprise Apps Cloud-Based Central Cloud-Based Central Repository Repository Central Repository SATO (One of multiple vendor silos) Analytic Workbench Cloud-Based Central Repository User Corporate (McDonald s) Analytic Workbench Enterprise Apps Cloud-Based Central Repository Government (FDA) Enterprise Apps
Ingest Sensor Messages Accommodate Multiple Protocols Basic Edge (First Receiver) Considerations Local Consuming Applications Local Analytic Workbench Apply Basic Rules and Workflow Enhance Security Alerting Triggering Actions Filtering Aggregation & Computation Governance Controls Publish To Subscribing Constituents Equipment Driver Management External Constituents and Consuming Applications Persist Data Locally 19
Ingest Sensor Messages Accommodate Multiple Protocols Basic Edge (First Receiver) Storage Considerations Local Consuming Applications Apply Basic Rules and Workflow Enhance Security Alerting Triggering Actions Filtering Aggregation & Computation Data Storage Requirements Local Analytic Workbench Minimal administration (deployed at remote sites) Governance Controls Small footprint (high compression) Publish To Subscribing Constituents Equipment Driver Management Fast ingestions speeds External Constituents and Consuming Applications Persist Data Locally Data maneuverability (preplanned and ad hoc queries) 20
Edge (First Receiver) Processing Assumptions Limited or no human resources for maintaining the database or other system capabilities at the edge must be a hands off operation, with remote monitoring or control only Hardware footprint will be limited Not all use cases apply but many do Factories Retail/Restaurants Homes (but with far less data) Buildings Many aspects of smart cities Hospitals (but only marginally for personal health) Cars (Edge on board) Other transportation modalities (especially planes, trains, and ships) Oil and Gas 21
MIGRATION TO CENTRALIZED DATA LAKES
Most Analysts and Vendors Believe IoT is driving a massive increase in data volume Cloud computing is the new norm for enterprise applications NoSQL based data lake architectures are best suited for underpinning Data Lakes 24
But Most Also Believe Data Lakes and NoSQL technology have limitations when it comes to execution of queries at scale (PB+) Joining over multiple tables Complex queries Material consumption of time and resources matter Consider: queries are typically being joined in a query chain, where the result of query 8 forms the basis for query 9 equivalent insight can be gained from a high value aproximation 25
Doing the Math If Data is getting exponentially larger... And the complexity is growing Then Role of First Receiver becomes more important in architecture There are still accommodations needed at main central repository 26
So what needs to happen to make the Data Lakes effective? 27
It has to be PRACTICAL to gain insight from the DATA
A New Approach Not every query requires an exact answer Metadata or sampling used for reducing time and resource utilization Take what is TECHNOLOGICALLY POSSIBLE and enable in a manner that is PRACTICAL 29
Approximation Techniques to Augment Statistical metadata Models for Approximation Rules/ Workflow (& Filtering) Data Lake (Hadoop, Cassandra, etc.) Analytic Workbench Operational Investigative Predictive Machine Learning Enterprise Apps ERP CRM Capacity Planning Etc. Specialized Data Specialized Stores Data Stores Central Processing (Data Lake) 30
IoT: Exploding Data for Many Use Cases Operational Surveillance Logistics Demand Forecasting Cybersecurity Warranty Analytics Agricultural Yield Management Exponentially more data will be available More data sets will be combined for richer signatures Some IoT use cases will not require exact answers immediately Most will never need exact answers 31
Resulting Architecture 32
Publish Subscribe Leveraging the Utility Value of Data in IoT Various Sensor Devices ERP, CRM, etc. Rules/ Workflow (& Filtering) Purpose Built Persisted Store External Data First Receiver (Edge) Analytic Workbench: Operational, Investigative, Predictive Rules/ Workflow (& Filtering) Data Lake (Hadoop, Cassandra, etc.) Metadata for Augmentation Specialized Data Specialized Stores Data Stores Central Processing (Data Lake) Analytic Workbench Operational Investigative Predictive Machine Learning Enterprise Apps ERP CRM Capacity Planning Etc. 33
Thank You 34
Supplemental Slides 35
Infobright for IoT Edge (First Receiver) Use of patented metadata approach to establishing and maintaining the environment No admin Tiny data footprint (huge compression) Exceptional performance, especially ad hoc query capabilities Extremely well-suited as an embedded OEM database for solutions that store and analyze machine data This is installed in thousands of production sites in Telco and Security edge processing today
Infobright Augmenting Data Lakes Utilize approximate query (rapid insight results engine) metadata is 1%-2% of total storage requirements of actual data Interesting considerations: Smaller intermediate space requirements Dramatically fewer I/Os often in-memory based Dramatically lower redistribution costs Sizing: Storage based - 1/50 th number of nodes or less Performance based 1/100 th to 1/1000 th number of nodes This is installed in Beta and will GA in early October
Compelling Performance Engine vs. Traditional Analytic Databases 1000 900 800 700 600 500 400 300 200 100 0 1/1000th Query Time Operate on representation of 64k rows vs. individual rows Works with traditional relational operators (e.g. project, aggregate, join, sort, etc.) Operations are individually more complex 64k fewer operations performed Analytic query performance >Up to 1,000X better compared to operations on actual rows 38
Creating Metadata in the Background IAQ agents read data upon ingestion or from foreign sources Agents are high speed, horizontally scalable Metadata is continuously built onthe-fly by ongoing highly scalable background process 39
Publish Subscribe Infobright: In the Edge and Over the Lake Various Sensor Devices ERP, CRM, etc. Rules/ Workflow (& Filtering) Infobright Enterprise Edition Persisted Store External Data First Receiver (Edge) Analytic Workbench: Operational, Investigative, Predictive Rules/ Workflow (& Filtering) Data Lake (Hadoop, Cassandra, etc.) Infobright Approximate Query Specialized Data Specialized Stores Data Stores Central Processing (Data Lake) Analytic Workbench Operational Investigative Predictive Machine Learning Enterprise Apps ERP CRM Capacity Planning Etc. 40