Big Data at PennDOT (ISTO DW-BI Team)

DW/BI at PennDOT

The DW/BI team provides a robust Data Warehouse and Business Intelligence platform (PDIF) and delivers DW/BI solutions and services to the Department. The team's technology and knowledge services include:

1. BI: create dashboards and reports, and provide assistance to other teams.
2. Custom app development for the PDIF BI Portal and other BI-related custom needs.
3. Data Warehousing, Data Integration, Data Migration
   - Maintain and enhance the enterprise data warehouse (PDIF DW).
   - Perform data migrations in support of technology modernization projects.
   - Perform ETL: move and transform data.
   - Build interfaces between applications.
4. Data Modeling
   - Create data models for new enterprise systems.
   - Provide data modeling assistance as needed for smaller efforts across ISTO.
5. Database development support
   - Help with complex SQL queries and performance tuning.
   - Support for stored procedure development (e.g., Oracle PL/SQL).

Big Data

What is Big Data?

Big Data is a massive volume of both structured and unstructured data, so large that it is difficult to process using traditional database and software techniques.

Characteristics of Big Data

Big Data Opportunities at PennDOT

Opportunities
- Big data is relevant to PennDOT in traffic control, planning and modeling, route planning, congestion management, optimizing material usage, and more.
- Big data at PennDOT will lead to improved traffic and mobility management.
- It provides new insights into traffic patterns and supplies real-time traffic data to information service providers.

Data Sources
- INRIX: vehicle speed data, available for every minute of every day for each road segment.
- WAZE: social media traffic app that provides real-time, user-reported incidents of various types.
- ATSPM: automated traffic signal data collected from smart traffic signals.
- AVL: Automated Vehicle Location (truck sensor data).
- Weather
- And more

Big Data Technology Strategy

The Big Data ecosystem is a vast, fast-changing landscape of tools, products, and methodologies, which informs our strategy:

- Cloud-first approach
- PaaS and SaaS over IaaS (Platform-as-a-Service and Software-as-a-Service over Infrastructure-as-a-Service)
- Microsoft Azure: well-packaged Big Data ecosystem, with heavy focus and investment from Microsoft
- Agile, flexible processes and architecture
- Evolving architecture: more bottom-up than top-down
- Not a one-size-fits-all architecture; requires a hybrid of traditional and modern big data approaches, such as:
  - Traditional RDBMSs, e.g., SQL Server, Oracle
  - Big Data products like HDFS/Hadoop and NoSQL DBs

Big Data Efforts

WAZE

The Waze application is a crowdsourcing platform where the public can report incidents and various traffic disruptions in order to gain points. As part of the data exchange, PennDOT receives a real-time data feed of all alerts Waze is reporting in Pennsylvania. These alerts include road closures, slowdowns, accidents, potholes, and disabled vehicles.

Overview
- A WebJob runs every minute and uses a Waze-provided URL to retrieve the file with a standard HTTP request.
- The Waze response file is JSON, includes only Pennsylvania data, and is stored in Azure Blob Storage.
- The process creates a folder for each day within the Azure Blob container to store the files.
- The WebJob is published to a Web App in App Service.

Azure PaaS offerings used
- WebJobs
- App Service Web Apps
- Azure Blob Storage
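The per-minute fetch-and-store step described on this slide can be sketched roughly as follows. This is a minimal illustration, not the team's WebJob. The feed URL, the `yyyy-mm-dd/waze_HHMMSS.json` naming scheme, and the function names are all assumptions, and a plain mapping stands in for the blob container. A real job would write through the Azure Storage SDK instead.

```python
import json
from datetime import datetime, timezone
from typing import Callable, MutableMapping, Optional
from urllib.request import urlopen


def blob_name_for(fetched_at: datetime) -> str:
    """Build a blob name whose first path segment acts as the per-day folder.

    The yyyy-mm-dd/waze_HHMMSS.json layout is an assumption for illustration.
    """
    return fetched_at.strftime("%Y-%m-%d/waze_%H%M%S.json")


def http_get(url: str) -> bytes:
    """Retrieve the feed with a standard HTTP GET request."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def run_once(
    feed_url: str,
    store: MutableMapping[str, bytes],
    fetch: Callable[[str], bytes] = http_get,
    now: Optional[datetime] = None,
) -> str:
    """Fetch one snapshot of the feed and store it under the current day.

    `store` is a stand-in for the blob container (hypothetical); it maps
    blob names to raw file contents.
    """
    fetched_at = now or datetime.now(timezone.utc)
    payload = fetch(feed_url)
    json.loads(payload)  # raise early if the response is not valid JSON
    name = blob_name_for(fetched_at)
    store[name] = payload
    return name
```

A scheduler (the WebJob trigger in the deck's setup) would call `run_once` every minute. Keeping the day in the first path segment means one day's files can be listed or processed by prefix.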

Waze Architecture

Waze Analytics (Pothole Report)

Create a pothole report based on analysis and data discovery of the Waze incident data. The pothole concept was expanded to include analytics on all Waze incident types.

Overview
- A Waze analytic DB is created in Azure SQL.
- ADF loads and transforms the Waze data from Blob Storage into the Azure SQL DB.
- ADF pipelines are scheduled to run once daily to process the current day's data.
- An on-premises nightly process calls a GIS web service and updates the location details (such as street, city, and SR SEG) of all the Waze data.
- A Power BI report connects to the Waze analytic DB to provide the pothole report.

Azure PaaS/SaaS offerings used
- Azure Blob Storage
- Azure SQL
- Azure Data Factory (ADF)
- Power BI (SaaS offering)
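To make the "load and transform" step concrete, the sketch below flattens one stored JSON file into table-like rows and picks out pothole reports. It is written in plain Python for illustration; the deck's pipeline does this work in ADF. The field names (`alerts`, `uuid`, `type`, `subtype`, `location.x`, `location.y`), the pothole subtype value, and the sample records are assumptions, not taken from the slides.

```python
import json
from collections import Counter
from typing import Dict, List, Optional

# Assumed subtype value used to mark pothole reports (hypothetical).
POTHOLE_SUBTYPE = "HAZARD_ON_ROAD_POT_HOLE"

# Made-up sample file contents, for illustration only.
SAMPLE_FEED = json.dumps({
    "alerts": [
        {"uuid": "a1", "type": "HAZARD", "subtype": POTHOLE_SUBTYPE,
         "street": "Main St", "city": "Harrisburg",
         "location": {"x": -76.88, "y": 40.27}},
        {"uuid": "a2", "type": "ACCIDENT", "subtype": "",
         "location": {"x": -75.16, "y": 39.95}},
        {"uuid": "a3", "type": "HAZARD", "subtype": POTHOLE_SUBTYPE,
         "location": {"x": -79.99, "y": 40.44}},
    ]
})


def flatten_alerts(feed_json: str) -> List[Dict[str, Optional[object]]]:
    """Turn the nested alert objects into flat rows suitable for a SQL table."""
    feed = json.loads(feed_json)
    rows = []
    for alert in feed.get("alerts", []):
        location = alert.get("location", {})
        rows.append({
            "uuid": alert.get("uuid"),
            "type": alert.get("type"),
            "subtype": alert.get("subtype"),
            "street": alert.get("street"),  # may be filled in later by the GIS step
            "city": alert.get("city"),
            "longitude": location.get("x"),
            "latitude": location.get("y"),
        })
    return rows


def pothole_rows(rows: List[Dict[str, Optional[object]]]) -> List[Dict[str, Optional[object]]]:
    """Keep only the rows reported as potholes."""
    return [row for row in rows if row["subtype"] == POTHOLE_SUBTYPE]


def count_by_type(rows: List[Dict[str, Optional[object]]]) -> Counter:
    """Count alerts per incident type, the basis for an all-types report."""
    return Counter(row["type"] for row in rows)
```

Rows with a missing street or city are kept as `None` here, which mirrors the deck's approach of filling in location details in a separate nightly step.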

Waze Analytics - Pothole Report

Waze Email Alerts - Purpose

PennDOT's State Farm Safety Patrol is a roving patrol offering free motorist assistance on select expressways in the Lehigh Valley, Harrisburg, Philadelphia and Pittsburgh regions. The patrol assists motorists with towing, jump starts, flat tire repair and more on all or portions of heavily traveled roads during the business week. In 2013, the patrols assisted a total of 17,612 motorists.

- Harrisburg-area Patrol: service in Cumberland, Dauphin and York counties on Interstates 81 and 83, and Route 581, comprising the Capital Beltway
- Lehigh Valley Patrol: service in Lehigh and Northampton counties on I-78, U.S. 22, Route 33 and Route 309
- Pittsburgh Patrol: service in Allegheny County on Interstates 79, 279 and 376
- Philadelphia-area Patrol: service in areas of Bucks, Chester, Delaware, Montgomery and Philadelphia counties

Waze Email Alerts Architecture

Components: Waze RSS Feed, Windows Service, Web Application, SQL DB

Waze Email Alerts

Create emails to notify TMCs of Waze alerts within a specified Service Patrol area.

Overview
- A polygon is created that encompasses the defined Service Patrol roads for each of the four districts.
- A Windows Service fetches data from the Waze RSS feed every minute for each of the four districts.
- In the API call, a polygon is passed to the Waze API along with our Waze API CPP Key.
- A JSON response is returned for all alert types within the defined polygons.
- User-defined Waze alert configurations and recipients are stored in a SQL DB via the Waze UI.
- The Windows Service holds these rules in memory but checks for modifications every 10 minutes.
- Based on these rules, the Windows Service filters the JSON and sends out only the alerts that match the specified criteria.
- The JSON alert data is transformed and stored in a structured SQL DB for future analytics.
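The "rules in memory, re-checked every 10 minutes" behaviour and the rule-based filtering can be sketched as below. This is an assumption-laden illustration in Python rather than the Windows Service itself. The `AlertRule` shape, the `RuleCache` class, and the idea that a rule matches on district and alert type are hypothetical, since the slides do not describe the rule schema.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional, Tuple

RULE_REFRESH_SECONDS = 10 * 60  # the deck states rules are re-checked every 10 minutes


@dataclass(frozen=True)
class AlertRule:
    """One user-defined configuration (hypothetical shape)."""
    district: str
    alert_types: FrozenSet[str]
    recipients: Tuple[str, ...]


class RuleCache:
    """Holds rules in memory and reloads them once the refresh interval has passed."""

    def __init__(
        self,
        load_rules: Callable[[], Iterable[AlertRule]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load_rules = load_rules  # e.g. a query against the SQL DB
        self._clock = clock
        self._rules: List[AlertRule] = []
        self._loaded_at: Optional[float] = None

    def rules(self) -> List[AlertRule]:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= RULE_REFRESH_SECONDS:
            self._rules = list(self._load_rules())
            self._loaded_at = now
        return self._rules


def recipients_for(alert: Dict[str, object], district: str,
                   rules: Iterable[AlertRule]) -> List[str]:
    """Return the de-duplicated recipients whose rules match this alert."""
    matched: List[str] = []
    for rule in rules:
        if rule.district == district and alert.get("type") in rule.alert_types:
            for address in rule.recipients:
                if address not in matched:
                    matched.append(address)
    return matched
```

Each minute, the service would call `cache.rules()` once per district and send an email only when `recipients_for` returns a non-empty list. Passing the clock in as a parameter keeps the refresh logic testable without waiting ten minutes.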

Waze Email Alerts Portal

Waze Alerts Configuration

Waze Alerts Polygon Configuration

Waze Alerts Generated Email

Waze Alerts Future Enhancements

Crash Use Case for Azure ML

- Machine learning / predictive learning (ML/PL) was explored in partnership with Microsoft using Azure ML (part of the Cortana Analytics suite).
- ML/PL models are trained against historical data and improve as more data is analyzed.
- The pilot effort involved analyzing the crash narrative comments section of police reports. The crash narrative is a freeform section where officers can type notes.
- The purpose was to find harmful events (damaged property) that were not coded correctly and therefore would not be invoiced for reimbursement.
- The machine learning model was trained to find patterns in the narrative data that indicated the likelihood of a harmful event that was not indicated on the harmful events report checkboxes.

Azure PaaS offerings used
- Azure ML
- Azure Blob Storage
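The slides do not say which algorithm the Azure ML pilot used. As a stand-in, the sketch below uses a small bag-of-words Naive Bayes classifier in plain Python to show the general idea: learn word patterns from labelled narratives, then flag reports whose narrative suggests a harmful event although the checkbox was left unset. The class name, the report field names, and any training sentences passed in are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case the narrative and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NarrativeClassifier:
    """Multinomial Naive Bayes with add-one smoothing over two labels.

    Assumes training data contains at least one example of each label.
    """

    def __init__(self) -> None:
        self.word_counts: Dict[bool, Counter] = {True: Counter(), False: Counter()}
        self.doc_counts: Dict[bool, int] = {True: 0, False: 0}
        self.vocabulary: Set[str] = set()

    def train(self, examples: Iterable[Tuple[str, bool]]) -> None:
        for narrative, harmful in examples:
            tokens = tokenize(narrative)
            self.word_counts[harmful].update(tokens)
            self.doc_counts[harmful] += 1
            self.vocabulary.update(tokens)

    def log_score(self, narrative: str, label: bool) -> float:
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)  # class prior
        total_words = sum(self.word_counts[label].values())
        denominator = total_words + len(self.vocabulary)
        for token in tokenize(narrative):
            if token in self.vocabulary:  # words never seen in training are skipped
                score += math.log((self.word_counts[label][token] + 1) / denominator)
        return score

    def predicts_harmful(self, narrative: str) -> bool:
        return self.log_score(narrative, True) > self.log_score(narrative, False)


def find_uncoded(model: NarrativeClassifier, reports: Iterable[Dict[str, object]]) -> List[object]:
    """Return IDs of reports the model flags as harmful but whose checkbox is unset."""
    return [
        report["id"]
        for report in reports
        if not report["checkbox_harmful_event"]
        and model.predicts_harmful(str(report["narrative"]))
    ]
```

In practice the flagged IDs would go to a reviewer rather than straight to invoicing, since a model of this kind only indicates likelihood.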

Azure HDInsight / Hadoop PoC with INRIX speed data

The DW/BI team performed a PoC in the summer of 2015 comparing the query performance of HDInsight (Hortonworks Hadoop distribution) at various levels of scaling against SQL Server.

Query Type                               SQL Server (DEV)   HDInsight (4 nodes)   HDInsight (12 nodes)   HDInsight (48 nodes)
Operational: COUNT(*)                    1m                 54m                   7m                     3m
Analytical: Avg. Historical Congestion   6h 4m              3h 38m                36m                    13m
Analytical: Free-Flow Speed              7h 9m              1h 22m                17m                    5m

- Based on 2.6 billion rows (INRIX by-the-minute traffic data, District 6, 18 months)
- Each HDInsight (A3) node contains 4 cores and 7 GB of RAM
- MS SQL Server 2008 (PennDOT Dev): 8 cores, 16 GB RAM
- Inconclusive results for scaling up (increasing cores/RAM of nodes)
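To read the analytical timings above as speed-ups over the SQL Server baseline, a small calculation helps. The sketch below uses only the two analytical rows from the slide; the function names and the dictionary layout are illustrative assumptions.

```python
import re
from typing import Dict

BASELINE = "SQL Server (DEV)"

# Analytical query timings as reported on the slide.
ANALYTICAL_RUNS: Dict[str, Dict[str, str]] = {
    "Avg. Historical Congestion": {
        BASELINE: "6h 4m",
        "HDInsight (4 nodes)": "3h 38m",
        "HDInsight (12 nodes)": "36m",
        "HDInsight (48 nodes)": "13m",
    },
    "Free-Flow Speed": {
        BASELINE: "7h 9m",
        "HDInsight (4 nodes)": "1h 22m",
        "HDInsight (12 nodes)": "17m",
        "HDInsight (48 nodes)": "5m",
    },
}


def to_minutes(duration: str) -> int:
    """Convert a duration such as '6h 4m' or '36m' into whole minutes."""
    match = re.fullmatch(r"(?:(\d+)h)?\s*(?:(\d+)m)?", duration.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {duration!r}")
    hours, minutes = (int(group) if group else 0 for group in match.groups())
    return hours * 60 + minutes


def speedups(timings: Dict[str, str], baseline: str = BASELINE) -> Dict[str, float]:
    """Return baseline time divided by each configuration's time, rounded to 0.1."""
    base = to_minutes(timings[baseline])
    return {
        name: round(base / to_minutes(value), 1)
        for name, value in timings.items()
        if name != baseline
    }
```

For the historical congestion query, `speedups(ANALYTICAL_RUNS["Avg. Historical Congestion"])` gives 28.0 for the 48-node cluster (364 minutes against 13). For the free-flow speed query the 48-node figure is 85.8 (429 minutes against 5).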

Wrap Up

Contact: Walt Cook, wacook@pa.gov