Big Data at PennDOT (ISTO DW-BI Team)
DW/BI at PennDOT
The DW/BI team provides a robust Data Warehouse and Business Intelligence platform (PDIF) and delivers DW-BI solutions and services to the Department. The DW-BI team's technology and knowledge services include:
1. BI: create dashboards and reports, and provide assistance to other teams
2. Custom app development for the PDIF BI Portal and other BI-related custom needs
3. Data Warehousing, Data Integration, Data Migration
   - Maintain and enhance the enterprise data warehouse (PDIF DW)
   - Perform data migrations in support of technology modernization projects
   - Perform ETL: move and transform data
   - Build interfaces between applications
4. Data Modeling
   - Create data models for new enterprise systems
   - Provide data modeling assistance as needed for smaller efforts across ISTO
5. Database development support
   - Help with complex SQL queries and performance tuning
   - Support for stored procedure development (e.g., Oracle PL/SQL)
Big Data
What is Big Data
Big Data is a massive volume of both structured and unstructured data, so large that it is difficult to process using traditional database and software techniques.
[Figure: Characteristics of Big Data]
Big Data Opportunities at PennDOT
Opportunities
The use of big data is relevant for PennDOT in traffic control, planning and modeling, route planning, congestion management, optimizing material usage, and more. Big data at PennDOT will lead to improved traffic and mobility management. It provides new insights into traffic patterns and supplies real-time traffic data to information service providers.
Data Sources
- INRIX: vehicle speed data, available for every minute of every day for each road segment
- WAZE: social media traffic app that provides real-time, user-reported incidents of various types
- ATS PM: Automated Traffic Signal data collected from smart traffic signals
- AVL: Automated Vehicle Location (truck sensor data)
- Weather
- And more
Big Data Technology Strategy
The Big Data ecosystem is a vast, fast-changing landscape of tools, products, and methodologies, which informs our strategy:
- Cloud-first approach
  - PaaS and SaaS over IaaS (Platform-as-a-Service and Software-as-a-Service over Infrastructure-as-a-Service)
- Microsoft Azure
  - Well-packaged Big Data ecosystem, with heavy focus and investment from Microsoft
- Agile, flexible processes and architecture
- Evolving architecture: more bottom-up than top-down
  - Not a one-size-fits-all architecture; it requires a hybrid of traditional and modern big data approaches, such as:
    - Traditional RDBMSs, e.g., SQL Server, Oracle
    - Big Data products like HDFS/Hadoop and NoSQL DBs
Big Data Efforts
WAZE
The Waze application is a crowdsourcing platform where the public can report incidents and various traffic disruptions in order to gain points. As part of the data exchange, PennDOT receives a real-time data feed of all alerts Waze is reporting in Pennsylvania. These alerts include road closures, slowdowns, accidents, potholes, and disabled vehicles.
Overview
- A WebJob runs every minute and uses a Waze-provided URL to retrieve the file with a standard HTTP request.
- The Waze response file is JSON, includes only PA data, and is stored in Azure Blob Storage.
- The process creates a folder for each day within the Azure Blob container to store the files.
- The WebJob is published to a Web App in App Service.
Azure PaaS offerings used
- WebJobs
- App Service Web Apps
- Azure Blob Storage
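The slides do not show the WebJob's code. As a rough illustration only, one polling cycle could be sketched in Python as below. The `fetch` and `upload` callables are hypothetical stand-ins for the HTTP request to the Waze-provided URL and the Azure Blob Storage upload, and the blob naming pattern (a date prefix acting as the per-day folder) is an assumption, not PennDOT's actual convention.

```python
import json
from datetime import datetime
from typing import Callable


def blob_name_for(ts: datetime) -> str:
    """Build a blob name whose date prefix acts as the per-day folder.

    The naming pattern is an assumption made for this sketch.
    """
    return f"{ts:%Y-%m-%d}/waze_{ts:%H%M%S}.json"


def run_once(fetch: Callable[[], bytes],
             upload: Callable[[str, bytes], None],
             now: datetime) -> str:
    """One polling cycle: download the feed, check it parses, store it."""
    payload = fetch()            # hypothetical: HTTP GET of the Waze URL
    json.loads(payload)          # raise early if the response is not JSON
    name = blob_name_for(now)
    upload(name, payload)        # hypothetical: write to the Blob container
    return name


# In-memory stand-ins so the sketch runs without network or Azure access.
store = {}
stored_as = run_once(
    fetch=lambda: b'{"alerts": []}',
    upload=store.__setitem__,
    now=datetime(2016, 3, 5, 14, 7, 0),
)
```

Passing the I/O steps in as callables keeps the scheduling concern (the every-minute trigger) separate from the fetch-and-store logic, which makes the cycle easy to exercise without cloud resources.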
Waze Architecture
Waze Analytics (Pothole Report)
Create a pothole report based on analysis and data discovery of the Waze incident data. The pothole concept was expanded to include analytics on all Waze incident types.
Overview
- A Waze analytic DB is created in Azure SQL.
- ADF loads and transforms the Waze data from Blob Storage into the Azure SQL DB.
- ADF pipelines are scheduled to run once daily to process the current day's data.
- An on-premises nightly process calls a GIS web service and updates the location details (such as street, city, and SR SEG) for all the Waze data.
- A Power BI report connects to the Waze analytic DB to provide the pothole report.
Azure PaaS/SaaS offerings used
- Azure Blob Storage
- Azure SQL
- Azure Data Factory (ADF)
- Power BI (SaaS offering)
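The daily transform is performed by ADF pipelines, whose definitions are not part of the slides. To give a feel for the kind of aggregation that feeds such a report, here is a small Python sketch, not the ADF implementation. It assumes each stored feed file is a JSON object with an `alerts` list whose items carry `uuid`, `type`, `subtype`, and `pubMillis` fields; treat those field names as assumptions about the feed layout.

```python
from collections import Counter
from datetime import datetime, timezone


def daily_counts(feed_files):
    """Count distinct alerts per (UTC day, alert kind).

    The feed is polled every minute, so the same alert appears in many
    files; alerts are de-duplicated on their assumed `uuid` field.
    """
    seen = set()
    counts = Counter()
    for feed in feed_files:
        for alert in feed.get("alerts", []):
            if alert["uuid"] in seen:
                continue
            seen.add(alert["uuid"])
            published = datetime.fromtimestamp(
                alert["pubMillis"] / 1000, tz=timezone.utc)
            kind = alert.get("subtype") or alert.get("type", "UNKNOWN")
            counts[(published.date().isoformat(), kind)] += 1
    return counts


# Two consecutive polls that both contain the same (made-up) alert.
poll = {"alerts": [{"uuid": "a1", "type": "HAZARD",
                    "subtype": "POT_HOLE", "pubMillis": 1457139600000}]}
counts = daily_counts([poll, poll])
```

De-duplicating before counting matters here because per-minute snapshots would otherwise inflate every incident by the number of minutes it stayed active.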
Waze Analytics - Pothole Report
Waze Email Alerts - Purpose
PennDOT's State Farm Safety Patrol is a roving patrol offering free motorist assistance on select expressways in the Lehigh Valley, Harrisburg, Philadelphia, and Pittsburgh regions. The State Farm Safety Patrol assists motorists with towing, jump starts, flat tire repair, and more on all or portions of heavily traveled roads during the business week. In 2013, the patrols assisted a total of 17,612 motorists.
- Harrisburg-area Patrol: service in Cumberland, Dauphin, and York counties on Interstates 81 and 83, and Route 581, comprising the Capital Beltway
- Lehigh Valley Patrol: service in Lehigh and Northampton counties on I-78, U.S. 22, Route 33, and Route 309
- Pittsburgh Patrol: service in Allegheny County on Interstates 79, 279, and 376
- Philadelphia-area Patrol: service in areas of Bucks, Chester, Delaware, Montgomery, and Philadelphia counties
Waze Email Alerts Architecture
[Diagram components: Waze RSS Feed, Windows Service, Web Application, SQL DB]
Waze Email Alerts
Create emails to notify TMCs of Waze alerts within a specified Service Patrol area.
Overview
- A polygon is created that encompasses the defined Service Patrol roads for each of the four districts.
- A Windows Service fetches data from the Waze RSS feed every minute for each of the four districts.
- In the API call, a polygon is passed to the Waze API along with our Waze API CPP Key.
- A JSON response is returned for all alert types within the defined polygons.
- User-defined Waze alert configurations and recipients are stored in a SQL DB via the Waze UI.
- The Windows Service stores these rules in memory but checks for modifications every 10 minutes.
- Based on these rules, the Windows Service will filter the JSON and send out only the alerts that meet the specified criteria.
- The JSON alert data is transformed and stored in a structured SQL DB for future analytics.
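The rule handling described above (rules cached in memory, re-checked every 10 minutes, then used to filter the JSON alerts) can be sketched as follows. This is an illustrative Python sketch rather than the Windows Service's actual code: the rule shape (`alert_types`, `recipients`), the `load_rules` callable standing in for the SQL DB query, and the alert `type` field are all assumptions.

```python
import time

REFRESH_SECONDS = 600  # the slides state a 10-minute modification check


class RuleCache:
    """Keeps alert rules in memory and reloads them once they are stale."""

    def __init__(self, load_rules, clock=time.monotonic):
        self._load_rules = load_rules   # hypothetical: reads rules from SQL
        self._clock = clock
        self._rules = load_rules()
        self._loaded_at = clock()

    def rules(self):
        if self._clock() - self._loaded_at >= REFRESH_SECONDS:
            self._rules = self._load_rules()
            self._loaded_at = self._clock()
        return self._rules


def matching_emails(alerts, rules):
    """Return (recipient, alert) pairs for every alert a rule selects."""
    pairs = []
    for alert in alerts:
        for rule in rules:
            if alert.get("type") in rule["alert_types"]:
                pairs.extend((to, alert) for to in rule["recipients"])
    return pairs


# Made-up rule and alerts for illustration.
cache = RuleCache(lambda: [{"alert_types": {"ACCIDENT"},
                            "recipients": ["tmc@example.org"]}])
to_send = matching_emails([{"type": "ACCIDENT"}, {"type": "JAM"}],
                          cache.rules())
```

Injecting the clock makes the 10-minute refresh testable without waiting; a production service would also need to handle a failed reload, which this sketch leaves out.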
Waze Email Alerts Portal
Waze Alerts Configuration
Waze Alerts Polygon Configuration
Waze Alerts Generated Email
Waze Alerts Future Enhancements
Crash Use Case for Azure ML
- Machine learning / predictive learning (ML/PL) was explored in partnership with Microsoft using Azure ML (part of the Cortana Analytics Suite).
- ML/PL models are trained against historical data and get smarter as more data is analyzed.
- The pilot effort involved analyzing the crash narrative comments section of police reports. The crash narrative is a freeform section where officers can type notes.
- The purpose was to find harmful events (damaged property) that were not coded correctly and therefore would not be invoiced for reimbursement.
- The machine learning model was trained to find patterns in the narrative data that indicated the likelihood of a harmful event that was not marked in the report's harmful-event checkboxes.
Azure PaaS offerings used
- Azure ML
- Azure Blob Storage
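The slides do not say which algorithm the Azure ML pilot used. Purely to illustrate the idea of learning from narrative text, the sketch below swaps in a tiny multinomial Naive Bayes classifier written in plain Python. The narratives and labels are invented for the example and are not PennDOT data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.doc_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Invented training narratives; True means property damage was involved.
narratives = [
    ("vehicle left roadway and struck guardrail", True),
    ("unit 1 hit a traffic sign and light pole", True),
    ("truck struck guardrail and damaged sign", True),
    ("unit 1 rear ended unit 2 at stop light", False),
    ("vehicle changed lanes and sideswiped unit 2", False),
    ("unit 2 backed into unit 1 in parking lot", False),
]
model = NaiveBayes().fit([t for t, _ in narratives],
                         [y for _, y in narratives])

# A report is worth reviewing when the narrative suggests damage
# but the harmful-event checkbox was left unchecked.
report = {"narrative": "car struck guardrail on ramp", "checkbox": False}
needs_review = model.predict(report["narrative"]) and not report["checkbox"]
```

The last two lines mirror the pilot's stated purpose: the model's prediction is compared against the coded checkbox, and only disagreements are surfaced for a person to check.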
Azure HDInsight / Hadoop PoC with INRIX speed data
The DW-BI team performed a PoC in the summer of 2015 comparing query performance of HDInsight (Hortonworks Hadoop distribution) at various levels of scaling, as well as SQL Server.

Query Type                              SQL Server (DEV)  HDInsight (4 nodes)  HDInsight (12 nodes)  HDInsight (48 nodes)
Operational: COUNT(*)                   1m                54m                  7m                    3m
Analytical: Avg. Historical Congestion  6h 4m             3h 38m               36m                   13m
Analytical: Free-Flow Speed             7h 9m             1h 22m               17m                   5m

- Based on 2.6 billion rows (INRIX by-the-minute traffic data, District 6, 18 months)
- Each HDInsight (A3) node contains 4 cores and 7 GB of RAM
- MS SQL Server 2008: PennDOT Dev, 8 cores, 16 GB RAM
- Inconclusive results for scaling up (increasing cores/RAM of nodes)
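To make the scale-out effect in the two analytical rows easier to read, the timings can be converted to minutes and expressed as speedups relative to SQL Server. The short Python sketch below does that arithmetic. It assumes the four timings in each row correspond, in order, to SQL Server (DEV) and HDInsight with 4, 12, and 48 nodes.

```python
import re


def to_minutes(duration: str) -> int:
    """Convert a string such as '6h 4m' or '36m' to whole minutes."""
    hours = re.search(r"(\d+)h", duration)
    minutes = re.search(r"(\d+)m", duration)
    return ((int(hours.group(1)) * 60 if hours else 0)
            + (int(minutes.group(1)) if minutes else 0))


def speedups(row):
    """Speedup of each HDInsight timing relative to the first (SQL Server)."""
    base = to_minutes(row[0])
    return [round(base / to_minutes(cell), 1) for cell in row[1:]]


# Timings as reported: SQL Server, then HDInsight at 4, 12, and 48 nodes.
timings = {
    "Avg. Historical Congestion": ["6h 4m", "3h 38m", "36m", "13m"],
    "Free-Flow Speed": ["7h 9m", "1h 22m", "17m", "5m"],
}
results = {query: speedups(row) for query, row in timings.items()}
```

For example, the historical congestion query drops from 364 minutes on SQL Server to 13 minutes on 48 nodes, a 28x speedup.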
Wrap Up Contact: Walt Cook wacook@pa.gov