How to Utilize Your Data to Provide Information to Decision Makers in Traffic Safety ATSIP August 2nd

How to Utilize Your Data to Provide Information to Decision Makers in Traffic Safety 2011 ATSIP August 2nd

Presentations
- Building Data Warehouses and Using Cubes and Dimensions to Aggregate Data for Fast Web Delivery (Cory Hutchinson and Chas. Cavalier)
- Using Microsoft Analysis Services, Power Pivot and Sharepoint 2010 to Create Ad Hoc Analysis of Crash Data (Max Kelly and Dr Helmut Schneider)
- Building Intelligence Tools to Better Understand and Display Crash Data to Assist Decision Makers (Mark Verret and Cory Hutchinson)
- Using Safety Performance Functions to Provide a Black Spot Analysis (Dr Helmut Schneider and Christian Raschke)

What is the Highway Safety Research Group (HSRG)?
- Grant funded by the LA DOTD
- Responsible for collecting, maintaining, storing, and analyzing crash data captured from law enforcement agencies throughout the state of Louisiana
- Analyzing crash data for LA since 1994
- A division of the Information Systems and Decision Sciences Department (ISDS) within the E. J. Ourso College of Business at Louisiana State University
- Website: http://hsrg.lsu.edu

Building Data Warehouses and Using Cubes and Dimensions to Aggregate Data for Fast Web Delivery

What is Business Intelligence (BI)?
- Refers to applications and technologies used to gather, provide access to, and analyze data and information.
- A broad category of applications and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions.
- A process of transforming data into information and making it available to users in time to make effective decisions.

BI Projects within HSRG
- Data quality
- Performance Measures
- Data Analysis
- Statistics
- Data Reporting

Pre BI Project Infrastructure

Challenge
Goals of BI Project:
- Shift focus to data delivery and analytics
- Provide information to decision makers
- Separate transactional and reporting operations
- Provide a single version of the truth
- Leverage new technology and provide platform standardization in line with our current competencies

BI Using Microsoft SQL Server 2008 R2

Post BI Project Infrastructure

On-Line Analytical Processing (OLAP)
- Methodology for optimizing data for analysis and reporting
- Pre-aggregates some data
- Enables faster, more flexible analysis

Cubes
- OLAP databases are called cubes
- The Multidimensional Expressions (MDX) language accesses cube data
- Diagram labels: Analyst, MDX, OLAP Cube Database, Aggregated Data

Cube Structure
- Measures are aggregated by dimensions
- Example cell: fatal crashes in the Acadiana region in January
- Dimensions shown in the figure:
  - Region: Acadiana, Baton Rouge, New Orleans
  - Month: January, February, March
  - Severity: Fatal, Severe, Moderate
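The idea that a measure is aggregated by dimensions can be sketched in a few lines of Python. This is only an illustration, not how Analysis Services stores a cube: the crash records are made-up sample rows, and `build_cube` and `slice_cube` are hypothetical helper names. The measure is a crash count, and the dimensions are region, month, and severity.

```python
from collections import defaultdict

# Hypothetical sample records (region, month, severity); not real crash data.
CRASHES = [
    ("Acadiana", "January", "Fatal"),
    ("Acadiana", "January", "Fatal"),
    ("Acadiana", "February", "Severe"),
    ("Baton Rouge", "January", "Fatal"),
    ("New Orleans", "March", "Moderate"),
]

def build_cube(records):
    """Pre-aggregate the crash-count measure for every (region, month, severity) cell."""
    cube = defaultdict(int)
    for region, month, severity in records:
        cube[(region, month, severity)] += 1
    return dict(cube)

def slice_cube(cube, region=None, month=None, severity=None):
    """Sum the measure over all cells that match the requested members.

    A dimension left as None is aggregated over (all members included).
    """
    total = 0
    for (r, m, s), count in cube.items():
        if region in (None, r) and month in (None, m) and severity in (None, s):
            total += count
    return total

cube = build_cube(CRASHES)
# Fatal crashes in the Acadiana region in January (one cell of the cube).
fatal_acadiana_jan = slice_cube(cube, region="Acadiana", month="January", severity="Fatal")
# Fatal crashes across all regions and months (aggregated over two dimensions).
fatal_all = slice_cube(cube, severity="Fatal")
```

Because the counts are stored per cell up front, a question such as "fatal crashes in Acadiana in January" becomes a lookup and sum rather than a scan of the raw crash records.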

Developing Cubes
- SQL Server Business Intelligence Development Studio (BIDS) is used for developing cubes

Cubes in Development
- Louisiana Strategic Highway Safety Plan (SHSP)
- Central source for analysis and reporting
- Emphasis areas:
  - Alcohol Impairment
  - Young Drivers
  - Seatbelt Use
  - Roadway Departure

Interacting with the Cubes
- Client programs:
  - BIDS Cube Browser
  - SharePoint dashboards
  - Websites
- Users need not know MDX

SHSP Cube Structure

BIDS Cube Browser

Microsoft Excel
- Uses the PivotTable interface
- Not demanding on less technical users
- Dashboards can be created using Excel
- Easy to integrate Excel-based solutions
- Let's answer a few SHSP questions

Excel as Cube Browser

Excel Dashboard

Using Microsoft Analysis Services, Power Pivot and Sharepoint 2010 to Create Ad Hoc Analysis of Crash Data

Excel 2010 Advantages
- New features in Excel 2010
  - Overcomes the row limit of earlier Excel versions
  - Most people are used to using Excel
  - User-friendly and intuitive
- New PowerPivot tool
  - Import from any data source
  - Access to Excel's PivotTable tool
- SharePoint 2010
  - Easy sharing of Excel and PowerPivot workbooks
  - Secure and refreshable

What is PowerPivot?
- PowerPivot is a free Excel data analysis add-in (http://www.powerpivot.com/)
- Allows importing large data sets of more than 1 million rows
- Creates PivotTables and graphs for cross tabulation

PowerPivot
- PowerPivot tab
- PowerPivot window

Dimensions

PowerPivot

PowerPivot - PivotTables
- Rows
- Columns
- Filter
- Horizontal and vertical slicers

PowerPivot - PivotTables

PowerPivot - PivotCharts

Example PowerPivot.xlsx

How do you share your tables and graphs?
- SharePoint is like an intranet with controlled, password-protected access
- SharePoint 2010 allows sharing information from PowerPivot
- Easy to save a spreadsheet to SharePoint

Saving to Sharepoint

Sharepoint


Example of SharePoint

Building Intelligence Tools to Better Understand and Display Crash Data to Assist Decision Makers

Effective Decision Making
Key ingredients necessary for making effective decisions:
- There must be a set of goals to work towards
- There must be a way to measure whether a chosen course is moving towards or away from those goals
- Information based on those goals must be provided to the decision maker in a timely manner

Business Intelligence
Facts and figures are not BI until:
- They can be put in a format that can be easily understood by the decision makers who use them.
- They can be delivered in time to meaningfully affect daily decision making.

Effective Decision Making
- It would be nice to have a warning signal to help identify potential problems.

Reporting Services
- TRCC Performance Measures: Location Reporting Report
- TRCC Performance Measures: Timeliness Report
- Agency Crash Data Report: allows agencies to see their crash data

Next Steps
- Data portals
- GIS integration

BI Website
- LA Strategic Highway Safety Plan Data: http://lashspdata.lsu.edu

OLD SHSP Website

SHSP Data Website


Using Safety Performance Functions to Provide a Black Spot Analysis
Figure: number of crashes per year (y-axis) versus AADT (x-axis).

Black Spots or Abnormal Locations
- What are black spots or abnormal locations?
- How are they defined?
- How are they selected?
- How does the Empirical Bayes model relate to black spot analysis?
- How does section length complicate the issue?

Factors Affecting Crash Counts
- Road segment length: longer road segments are expected to have more crashes
- Average Daily Traffic (ADT): roads with larger ADT are expected to have higher crash counts
- Road segment width: narrower roads are expected to have more crashes
- Shoulder width: roads with narrower shoulders are expected to have more crashes
- Hazard rating
- And others

Example: HSM SPF for Rural Two-way Roadway Segments
Roadway Segment Base Condition

$N_{SPF} = AADT \times L \times 365 \times 10^{-6} \times e^{-0.312}$

Symbol: Description
- $N_{SPF}$: Predicted total number of crashes per year
- $L$: Length of roadway segment in miles
- $AADT$: Average annual daily traffic volume
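As a quick check of the base-condition SPF, N_SPF = AADT x L x 365 x 10^-6 x e^(-0.312), the following minimal Python sketch evaluates it for one segment. The function name `spf_rural_two_way` and the example inputs (10,000 vehicles per day, 1 mile) are illustrative choices, not values from the presentation.

```python
import math

def spf_rural_two_way(aadt, length_miles):
    """Predicted crashes per year under base conditions.

    Implements N_SPF = AADT * L * 365 * 1e-6 * exp(-0.312) as given on the slide.
    """
    return aadt * length_miles * 365 * 1e-6 * math.exp(-0.312)

# Illustrative inputs only: a 1-mile segment carrying 10,000 vehicles per day.
n_spf = spf_rural_two_way(aadt=10_000, length_miles=1.0)
print(round(n_spf, 2))
```

Because the formula is linear in both AADT and L, doubling either the traffic volume or the segment length doubles the predicted count.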

SPF and Average Number of Crashes per Mile per Year
Figure: SPF, number of crashes per year per mile (y-axis) versus AADT (x-axis).
- Dividing by miles standardizes the mean but not the variance
- Beware: different segment lengths will result in different variation
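The point that dividing by miles standardizes the mean but not the variance can be illustrated numerically. The sketch below assumes, purely for illustration, that the raw count on a segment is Poisson with mean equal to rate times length; the presentation itself works with a negative binomial model, so the Poisson assumption and the function name `rate_mean_and_variance` are not from the source.

```python
def rate_mean_and_variance(crashes_per_mile_year, length_miles):
    """Mean and variance of observed crashes per mile for one segment-year.

    Assumption: the raw count Y is Poisson with mean rate * length,
    so Var(Y) equals that mean.
    """
    expected_count = crashes_per_mile_year * length_miles
    mean_rate = expected_count / length_miles        # equals the underlying rate
    var_rate = expected_count / length_miles ** 2    # Var(Y / L) = Var(Y) / L^2
    return mean_rate, var_rate

# Same underlying rate (2 crashes per mile per year), different segment lengths.
short = rate_mean_and_variance(2.0, 0.1)   # 0.1-mile segment
long_ = rate_mean_and_variance(2.0, 2.0)   # 2-mile segment
```

Under this assumption both segments have the same mean rate, but the short segment's per-mile rate is far noisier, which is why ranking raw crashes per mile tends to favor short segments.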

Predicted Number of Crashes is Loglinear

$N_{predicted} = N_{SPF\,x} \times (CMF_{1x} \times CMF_{2x} \times \dots \times CMF_{yx}) \times C_x$

Symbol: Description
- $N_{predicted}$: Predicted average crash frequency for a specific year for site type x
- $N_{SPF\,x}$: Predicted average crash frequency determined for base conditions of the SPF developed for site type x
- $CMF_{yx}$: Crash Modification Factors specific to the SPF for site type x
- $C_x$: Calibration factor to adjust the SPF for local conditions for site type x
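A minimal sketch of the multiplicative form, N_predicted = N_SPF x (CMF_1 x ... x CMF_y) x C, and of why it is called loglinear: taking logarithms turns the product into a sum. The function name and the CMF and calibration values below are made-up illustrations, not figures from the HSM or the presentation.

```python
import math

def predicted_crashes(n_spf, cmfs, calibration=1.0):
    """N_predicted = N_SPF * (product of CMFs) * calibration factor."""
    return n_spf * math.prod(cmfs) * calibration

# Illustrative values only.
n_spf = 2.0
cmfs = [1.5, 0.5]      # hypothetical crash modification factors
calibration = 1.0      # hypothetical local calibration factor

n_pred = predicted_crashes(n_spf, cmfs, calibration)

# Loglinear check: log(N_predicted) is the sum of the logs of its factors.
log_sum = math.log(n_spf) + sum(math.log(c) for c in cmfs) + math.log(calibration)
```

Here `math.log(n_pred)` and `log_sum` agree up to floating-point rounding.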

Modeling Issues
- Interactions are only modeled with AADT.
- What about other interactions, such as lane width and shoulder width?

Average Number of Crashes for SPF & CMF for Lane Width, Shoulder Width
Figure: crashes per mile and year (y-axis) versus AADT (x-axis); series: SPF, LNCR, MEDCR, UNCR.

Predicted Crash Counts for Non-Base Condition Including CMFs
Figure: Predicted Crash Counts for Two-Way Rural Roads (N predict); crash count per mile and year (y-axis) versus AADT (x-axis).
- Observation: not a straight line because of the other factors (namely shoulder and pavement width, etc.)
- Beware: need to look at more than just AADT!

Plotting Crash Data: Don't Forget the Variation
Figure: number of crashes per year per mile (y-axis) versus AADT (x-axis); series: N Observed, N predict with CMFs, SPF.

Issues Relating to Computing the Top p%
1. Compare all locations or compare within a class?
   1. Negative Binomial Regression Model with covariates
   2. Use class
2. How to rank locations
   1. Use crashes per mile
   2. Use Empirical Bayes
3. How to account for length of road segment

Distribution for the Slice at ADT=10,000
Figure: SPF, number of crashes per year per mile (y-axis) versus AADT (x-axis).

The Distribution of Mean Crash Counts for Fixed Covariates and Specified Length of Road Segment
Figure: distribution curve with N Predict (overall mean) marked.

Is a Road Segment in the Top p% of the Distribution?
Figure: distribution curve with N Predicted and the value X marked; is X in the top p%?

The Empirical Bayes
"A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule." (Author Unknown)

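Empirical Bayes for Negative Binomial

$x_{Bi} = \omega_i \mu_i + (1 - \omega_i) y_i$

Labels on the slide: $x_{Bi}$ is the mule, $\mu_i$ is the horse, $y_i$ is the donkey.

Empirical Bayes weight:

$\omega_i = \frac{\sigma_i^2}{\sigma_i^2 + \zeta_i^2}$

Within-sample variance:

$\sigma_i^2 = \lambda_i L_i \Upsilon$

Between-sample variance:

$\zeta_i^2 = \frac{(\lambda_i L_i \Upsilon)^2}{\theta L_i \Upsilon}$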
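The Empirical Bayes estimate is a weighted average of the model expectation and the observed count, x_B = w * mu + (1 - w) * y, with the weight w given by the within-sample variance divided by the sum of the within-sample and between-sample variances. A minimal sketch of that weighting, taking the two variances as inputs rather than deriving them from a fitted model; the function name and the numbers are illustrative only.

```python
def empirical_bayes_estimate(mu, y, within_var, between_var):
    """Shrink the observed count y toward the model expectation mu.

    weight = within_var / (within_var + between_var)
    estimate = weight * mu + (1 - weight) * y
    """
    weight = within_var / (within_var + between_var)
    estimate = weight * mu + (1 - weight) * y
    return estimate, weight

# Illustrative values: the model expects 2 crashes (horse), 6 were observed (donkey).
estimate, weight = empirical_bayes_estimate(mu=2.0, y=6.0, within_var=2.0, between_var=2.0)
# With equal variances the weight is 0.5 and the estimate (mule) lies halfway between.
```

When the within-sample variance dominates, the weight approaches 1 and the estimate stays near the model expectation; when the between-sample variance dominates, the estimate moves toward the observed count.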

Is a Road Segment in the Top p% of the Distribution?
Figure: distribution curve with N Predicted and the value X-EB marked; is X-EB in the top p% of the posterior distribution?

There is a Need for a Simple Robust Method for Selecting Top p%

Parametric NegBin Regression Model
1. Negative Binomial Regression Model with covariates (SPF & CMF)
2. Empirical Bayes
3. Length of road segment affects variation
- Assumes accurate information about the covariates is known.
- Requires computation of the posterior distribution.

Non-Parametric Model using Classes
1. Create categories for all covariates
2. Use N Observed instead of Empirical Bayes for ranking
3. Use smallest (e.g. 0.1 miles) standard length of road segment
- No model assumptions.
- Requires equal length of segments.
- No EB necessary.
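The non-parametric, class-based approach can be sketched as follows: group equal-length segments by their covariate categories, rank within each class by observed crash count, and flag the top p%. This is a sketch under stated assumptions, not the presenters' implementation: the segment records are made up, `top_p_by_class` is a hypothetical name, and "top p%" is taken here to mean the ceil(p * n) highest-count segments in a class of n.

```python
import math
from collections import defaultdict

def top_p_by_class(segments, p):
    """Flag the top fraction p of segments within each covariate class.

    segments: iterable of (segment_id, class_key, n_observed), where class_key
    is a tuple of covariate categories and all segments share one length.
    Returns {class_key: [segment_id, ...]} ranked by observed crashes.
    """
    by_class = defaultdict(list)
    for seg_id, class_key, n_obs in segments:
        by_class[class_key].append((n_obs, seg_id))

    flagged = {}
    for class_key, members in by_class.items():
        # Highest observed count first; ties broken by segment id for stability.
        members.sort(key=lambda item: (-item[0], item[1]))
        k = math.ceil(p * len(members))
        flagged[class_key] = [seg_id for _, seg_id in members[:k]]
    return flagged

# Made-up example: classes are (AADT category, curve or no curve).
SEGMENTS = [
    ("s1", ("low", "curve"), 0),
    ("s2", ("low", "curve"), 1),
    ("s3", ("low", "curve"), 5),
    ("s4", ("low", "curve"), 2),
    ("s5", ("high", "no curve"), 3),
    ("s6", ("high", "no curve"), 1),
]

flagged = top_p_by_class(SEGMENTS, p=0.25)
```

Ranking only within a class means a segment is compared against others with similar covariates, which is what lets this approach avoid a regression model.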

Example Local Road Project
- Create road segments of 500 feet between intersections (intersections not included)
- AADT (categorized)
- Road width
- Shoulder width
- Driveway density (categorized)
- Curve or no curve
- Passing lane
- Number of lanes
- Shoulder type
- Two-way left-turn lanes
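Cutting the stretch of road between two intersections into 500-foot segments might look like the sketch below. The function name, the use of positions in feet along the road, and the handling of a shorter leftover piece at the end are all assumptions; the slide does not say how remainders were treated, and an analysis that needs equal-length segments may prefer to drop them.

```python
def split_into_segments(start_ft, end_ft, segment_ft=500):
    """Split the road between two intersections into consecutive segments.

    Positions are distances in feet along the road, measured just past the
    first intersection and just before the second. The final segment may be
    shorter than segment_ft (an assumption; it could instead be discarded).
    """
    segments = []
    position = start_ft
    while position < end_ft:
        segment_end = min(position + segment_ft, end_ft)
        segments.append((position, segment_end))
        position = segment_end
    return segments

# Illustrative example: 1,200 feet of road between two intersections.
pieces = split_into_segments(0, 1200)
# Keep only full-length segments when equal lengths are required.
full_length = [(a, b) for a, b in pieces if b - a == 500]
```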

References
- C. N. Morris, "Parametric Empirical Bayes Inference," Journal of the American Statistical Association, Vol. 78, No. 381 (Mar. 1983), pp. 47-55.
- G. Casella, "An Introduction to Empirical Bayes Data Analysis," The American Statistician, Vol. 39, No. 2 (May 1985), pp. 83-87.
- K. El-Basyouny and T. Sayed, "Comparison of Two Negative Binomial Regression Techniques in Developing Accident Prediction Models," Transportation Research Record: Journal of the Transportation Research Board, No. 1950, Transportation Research Board of the National Academies, Washington, D.C., 2006, pp. 9-16.

Contact Information
- Dr. Helmut Schneider, Director, hschnei@lsu.edu, (225) 578-2516
- Mark Verret, Network/Server Administrator, mark@lsu.edu, (225) 578-0283
- Cory Hutchinson, Associate Director, cory@lsu.edu, (225) 578-1433
- Chas. Cavalier, Graduate Student, ccaval9@lsu.edu
- Max Kelly, Graduate Student, mkell21@lsu.edu