ADVANCED ANALYTICS & IOT ARCHITECTURES Presented by: Orion Gebremedhin, Director of Technology, Data & Analytics; Marc Lobree, National Architect, Advanced Analytics
EDW THE RIGHT TOOL FOR THE RIGHT WORKLOAD RDBMS Data Stores SSIS Local Data Sources Unstructured data Flat File Upload Azure SQL DW HDInsight Storage blob Azure SQL database On-Premises Reporting & Analytics SSIS Excel (Direct Access)
ESA EDW REFERENCE ARCHITECTURE HYBRID BIG DATA PROCESSING On-Demand-Compute Direct Access/ Report Model Level Integration Cloud Storage Cube PowerShell AZCopy SSIS SSIS Data Layer Level Integration Tabular
ON PREMISES BIG DATA IMPLEMENTATIONS
USE CASE: ETL OFFLOADING Have you outgrown your data delivery SLAs? Is your business frustrated with data delays? Get the right data at the right time.
Neudesic partnered with one of the nation's largest utility companies, which had recently deployed Smart Utility Meters for its power customers: nearly a million meters sending usage data every 15 minutes. The result: an Azure hybrid big data processing solution that lets the customer perform gap analytics, a process for identifying gaps in the power usage readings, over 7x faster than with their previous solution. Billions of Smart Meter reads are processed to identify the nature and duration of the gaps and so mitigate revenue losses.
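To make the idea of gap analytics concrete, here is a minimal single-machine sketch in Python. It is not the customer's implementation, which ran as a hybrid big data job; the function name find_gaps and the sample timestamps are hypothetical. The only detail taken from the case study is the 15-minute read interval.

```python
from datetime import datetime, timedelta

# Read cadence from the case study: one read every 15 minutes.
INTERVAL = timedelta(minutes=15)


def find_gaps(read_times, interval=INTERVAL):
    """Return (first_missing, last_missing, missing_count) per gap for one meter.

    Hypothetical helper: a gap is any pair of consecutive reads that are
    far enough apart for at least one expected read to be missing.
    """
    ordered = sorted(read_times)
    gaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        # Number of expected reads that never arrived between prev and curr.
        missing = (curr - prev) // interval - 1
        if missing > 0:
            gaps.append((prev + interval, prev + missing * interval, missing))
    return gaps


# Illustrative data only: the reads at 00:30 and 00:45 are missing.
sample_reads = [
    datetime(2016, 1, 1, 0, 0),
    datetime(2016, 1, 1, 0, 15),
    datetime(2016, 1, 1, 1, 0),
    datetime(2016, 1, 1, 1, 15),
]
sample_gaps = find_gaps(sample_reads)
```

Each tuple gives the first and last missing read time plus the count of missing reads, which corresponds to the "nature and duration" of a gap. At the scale described, the same per-meter logic would be grouped by meter ID and run in a distributed engine.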
USE CASE: REAL-TIME ANALYSIS Got end users that need data now? Provide business units the data they need at the time they need it.
REAL TIME TRAFFIC MANAGEMENT Toll Data EventsHub StreamAnalytics Toll Way Event Generator Toll Violations Reference Data Vehicle Registration Toll Violation Tickets
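The diagram joins a stream of toll events against vehicle registration reference data to produce violation tickets. A rough Python sketch of that join follows. The field names (plate, toll_id, expired) and the violation rules are assumptions for illustration, not taken from the deck. In the architecture shown, this logic would be a Stream Analytics query rather than Python.

```python
def find_violations(toll_events, registrations):
    """Join toll events to registration reference data and emit tickets.

    toll_events:   iterable of dicts with hypothetical keys 'plate', 'toll_id'
    registrations: dict keyed by plate, each value a dict with key 'expired'
    """
    tickets = []
    for event in toll_events:
        reg = registrations.get(event["plate"])
        if reg is None:
            reason = "unregistered vehicle"   # assumed rule
        elif reg["expired"]:
            reason = "expired registration"   # assumed rule
        else:
            continue  # valid registration, no ticket
        tickets.append(
            {"plate": event["plate"], "toll_id": event["toll_id"], "reason": reason}
        )
    return tickets


# Illustrative data only.
sample_registrations = {
    "ABC123": {"expired": False},
    "XYZ789": {"expired": True},
}
sample_events = [
    {"plate": "ABC123", "toll_id": 1},
    {"plate": "XYZ789", "toll_id": 1},
    {"plate": "NOPLATE", "toll_id": 2},
]
sample_tickets = find_violations(sample_events, sample_registrations)
```

The reference data is held as an in-memory lookup because it changes slowly compared with the event stream. That is the usual reason for treating it as reference data rather than as a second stream.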
Real-Time Analysis On-premises Using Data Lake to capture all data for everyone. OLTP Kafka Spark MLlib Kafka Logs HDFS PM B DM MDM Machine learning
USE CASE: INTERNET OF THINGS What action does your IoT device drive? Help guide end-users to the action they are looking to take.
VENDING MACHINE MANAGEMENT Vending Machine EventsHub StreamAnalytics Vending Machine Vending Transactions EventsHub Batch Predictions Real-time Notifications Machine learning EventsHub Vehicle Location Info
IOT WEARABLE MANAGEMENT Processing device data in real time. HD Insight Spark SQL Analyze Device API Azure Event Hub or IOT Hub Azure Stream Analytics Power BI Dataset Temporal Power BI Dashboards
USE CASE: ITERATIVE EXPLORATION What can we do with all of this data? Mine for answers, one question at a time.
ITERATIVE EXPLORATION Build expert systems, move to supervised learning, and evolve to reinforcement learning. Web Service used for Orchestration HD Insight Azure Machine Learning API End Point Azure Data Warehouse Power BI
ITERATIVE EXPLORATION Monitor and remove noise from textual data. Web Service used for Orchestration Azure SQL DB Keyword Analytics Power BI Dataset Statistical Media Services Power BI Dashboards Machine Learning API End Point Event Hubs Stream Analytics Power BI Dataset Temporal
USE CASE: SELF SERVICE Are your reports only telling half the story? Quickly deliver large datasets for ad hoc analysis.
SELF SERVICE Allowing business to fulfill their analytics needs. Semi-structured Files Apache Hadoop Spark SQL Analyze Service Bus SQL Server
HYBRID SELF SERVICE
USE CASE: DATA AS A SERVICE Got savvy end users that need more data? Provide data scientists with what they need while making it easy for the business user.
Data-as-a-Service USING AZURE Using Data Lake to capture all data for everyone. Data Sources Loading Data Lake Raw Data Lake Building Data Streams Self-Service Catalog SQL Data Factory Click Stream Logs Data Factory Azure ML Azure Data Catalog Data Historian (PI Server) App Service Azure Data Lake Store Data Factory Azure Data Lake Store HDInsight Hive or Spark Power BI Dashboards Device API Azure Event Hub or IOT Hub Azure Stream Analytics Azure Blob Storage
Advanced Analytics Methodology
Solution Development Process Business Objective Understanding Data Understanding Data Model Creation + Testing Integration in Data Strategy Model Creation + Testing Data Acquisition Visual Analysis Model(s) Selection Model Comparison Integration in Data Strategy Build Model + Web Service Location for SQL query Consumption Layer
Model Selection: Supervised (we know the response). Parametric Regression Linear Polynomial Stepwise Binomial Splines Partial Least Squares Generalized Linear Models Classification Logistic Linear / Quadratic Discriminant Analysis Non Parametric K Nearest Neighbors Decision Trees Random Forests Boosting Neural Network Support Vector Machines Generalized Additive Models Forecasting Moving Averages Exponential Smoothing ARIMA Regressions *Some models can change (parametric/nonparametric) and (regression/classification)
Model Selection: MAPE, RMSE & R^2 MAPE: Mean Absolute Percentage Error RMSE: Root Mean Square Error R^2: share of the variation in the response explained by the predictors We want to choose the model that minimizes the test error and has a high percentage of the response's variation explained by the predictors.
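For reference, the three selection metrics can be computed as below. This is a plain-Python sketch using the standard textbook definitions. The function names and sample numbers are illustrative, and in practice the metrics would be computed on a held-out test set.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no actual value is 0."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the units of the response."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Share of the response's variation explained by the model."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Illustrative numbers only.
sample_actual = [100.0, 200.0, 400.0]
sample_predicted = [110.0, 190.0, 400.0]
sample_mape = mape(sample_actual, sample_predicted)
sample_rmse = rmse(sample_actual, sample_predicted)
sample_r2 = r_squared(sample_actual, sample_predicted)
```

MAPE and RMSE are error measures, so lower is better; R^2 is a goodness-of-fit measure, so higher is better. Comparing candidate models on all three guards against a model that looks good on one scale only.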
Examining Weather and Active Meters in the System by Time Temperature by time Active Meters by time Seasonality of temperature Constant increase of active meters
Usage & Temp Usage by Day of Week & Versus Temperature Day of Week Trend Hourly Usage Trends Day of Month Temp = Red Usage = Blue
Auto-Regressive Integrated Moving Average ARIMA(p,d,q)x(P,D,Q)[m] AR(p): p = number of non-seasonal autoregressive terms I(d): d = number of non-seasonal differences MA(q): q = number of non-seasonal moving average terms P, D, Q = seasonal counterparts of p, d, q m = number of periods per season Stationary Mean & Variance Avg. Temperature Time Series
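The differencing orders are what make a series stationary before the AR and MA terms are fitted. The sketch below shows only that step, on a synthetic series with a linear trend and a repeating pattern of period m = 4. The helper name difference and the data are illustrative. It does not fit an ARIMA model.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both values exist."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Synthetic series: linear trend plus a repeating seasonal pattern, m = 4.
M = 4
seasonal_pattern = [0, 5, 2, -3]
series = [t + seasonal_pattern[t % M] for t in range(12)]

# One seasonal difference (D = 1) removes the repeating pattern and turns
# the linear trend into a constant level.
seasonally_differenced = difference(series, lag=M)

# One further regular difference (d = 1) removes that constant level.
fully_differenced = difference(seasonally_differenced, lag=1)
```

On real data such as the average temperature series, the differenced values would not be exactly constant. The aim is a series whose mean and variance no longer drift over time.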
NEXT STEP BECOME THE BI SUPERHERO Information Management Big Data Storage Apache Hadoop Real-time intelligence Machine learning IoT Dashboards and Visualizations and more! Ideate, chart your quick wins, ask questions and get answers to your real Big Data challenges. It's insightful, it's easy and can be done from the comfort of your conference room. www.neudesic.com/meetneat
BIG DATA & Advanced Analytics Roadshow Questions? Orion Gebremedhin Orion.Gebremedhin@Neudesic.com Twitter: @oriongm Marc Lobree Marc.Lobree@Neudesic.com