Describing DSTs and Analytics Techniques

This document presents more detailed notes on the DST process and on analytics techniques.

23/03/2015
Copyright

The contents of this document are subject to copyright and all rights are reserved. No part of this document may be reproduced, stored in a retrieval system or transmitted, in any form or by any means, electronic, mechanical, recording or otherwise, without the prior written consent of the copyright owner. This document has been produced by SEAMS Ltd. © SEAMS Ltd 2015.
Describing DSTs - Analytics

The diagram in Figure 1 describes how Decision Support Tools (DSTs)/analytics are generally described, how they relate to each other and the levels of sophistication they represent.

Figure 1: Levels of sophistication of DSTs

Descriptive analytics provides a view of what has happened, or what is currently happening, by looking at the data being produced. Descriptive analytics is used to understand how many incidents have happened, or to describe whether an asset is in a particular state.

Diagnostic analytics enables drill-down into data to identify the root cause of incidents or failures, or to show why, for example, web sales may be down when there is plenty of stock available. Descriptive and diagnostic analytics are often treated as a single unit, as together they provide hindsight into what has happened and why.

Predictive analytics uses statistical techniques to predict what a future outcome may be by analysing historical trends in the data. It provides insight into the potential set of outcomes, often with a measure of uncertainty. It enables, for example, prediction of the number of incidents tomorrow or next week due to the weather or the condition of a road.

Prescriptive analytics provides foresight to decision makers as to what action to take to change an outcome, together with a forecast of what that outcome may be. Prescriptive analytics is also known as modelling. Using business rules and solver heuristics/optimisation, prescriptive analytics helps answer the hard questions, such as where to do maintenance to reduce incidents.
DST process

This section describes a typical process used to populate and generate decisions from DSTs.

Process

In order to generate decision support evidence using a DST, a number of steps are required to process the data and information into insight.

Figure 2: DST process map (source data such as assets/ERP, behaviours, historic failures, customer contacts and GIS data feeds predictive analytics; proactive and reactive rules, scenarios and costs feed prescriptive analytics, which models and optimises; results flow through to compare, monitor, act)

The diagram above (Figure 2) shows the generic process performed by DSTs to transform data into insight. This example reflects the most common DST use in an organisation, where asset deterioration and investment or maintenance is the main driver.

Source data is key to an effective outcome prediction in a DST. Most DSTs require information from a number of sources in order to balance different criteria for an optimal output. Any data request made to an organisation will be specific to the questions to be answered by the DST and the techniques the DST employs in its analysis.

The first dataset will be the assets: typically the descriptive attributes that are statistically significant, such as location, age, material, size, and load or usage characteristics. This information is typically static for the asset and found in the Asset Register.

Historic information follows in the analysis and is often held in different systems that record failures, incidents, service performance or condition monitoring. In many cases these data need to be cleaned and normalised into categories specific to the industry the asset is in. If the DST is only performing predictive analytics, to predict future condition, then this asset dataset will suffice.
For asset investment planning or operational forecasting, additional information is needed:

o Reactive and proactive works rules: what happens to operational expenditure when capital investment is not available, at what condition a capital treatment or an operational treatment will be used, and what condition the asset is returned to following a treatment. This information is not normally held in a system and needs to be elicited from expert staff and engineers.
o Unit costs for reactive and proactive works: how much an operational treatment and a capital investment cost, what the penalties are for missing targets, and how much each type of resource costs per job. This information is typically found by analysing previous works (in the next step) or from detailed financial information.

Often it is necessary to process the data into a usable format, to clean it, and to back-fill missing information that can be inferred. Any cleaning or filling should be carried back to the source systems so that the next data extract is better and operational data integrity is improved.

Predictive analytics (together with descriptive and/or diagnostic analytics): the actions in this step are highly dependent on the outputs required. The primary activity will be analysing the historic asset data with different algorithms, split multiple ways using the asset attribute information. This provides sets of calibrated deterioration curves, or a similar statistical model, which can be used to predict future condition. Depending on the nature of the historic information, the analytic algorithms can provide calibrated curves for risk, cost, performance, service, sales, etc.

To ensure that the predictions are accurate it is normal to segment the analysis data into a training set and one or more validation sets. The accuracy of the calibrated curves is then tested by applying them to the assets in the validation set, using their condition at an earlier point in time. If the analysis is accurate, the calculation should then predict the measured condition values. If it is not, a variance measure can be derived and the analysis repeated, or the curves carried forward with a known margin for uncertainty. Other techniques for uncertainty and validation are covered below.

Once the analysis is completed, the next step will be either prescriptive analytics or a recombination of the asset data with the calibrated curves.
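The train-and-validate approach described above can be sketched as follows. This is a minimal illustration using synthetic asset history and a simple age-based linear deterioration curve; the data, the 70/30 split and the use of RMSE as the variance measure are all illustrative assumptions, not a prescribed method.

```python
import random

random.seed(7)  # deterministic synthetic data for the example

# Synthetic history: (age in years, observed condition grade, 1 = new .. 5 = failed)
history = [(age, 1.0 + 0.04 * age + random.uniform(-0.3, 0.3))
           for age in range(0, 80)]

# Segment the analysis data into a training set and a validation set
random.shuffle(history)
split = int(0.7 * len(history))
train, validate = history[:split], history[split:]

# Calibrate a simple age-based linear deterioration curve on the training set
n = len(train)
mean_age = sum(a for a, _ in train) / n
mean_cond = sum(c for _, c in train) / n
slope = (sum((a - mean_age) * (c - mean_cond) for a, c in train)
         / sum((a - mean_age) ** 2 for a, _ in train))
intercept = mean_cond - slope * mean_age

def predict(age):
    """Predicted condition grade from the calibrated curve."""
    return intercept + slope * age

# Test the curve on the held-out assets; the variance measure (here RMSE)
# becomes the known margin for uncertainty carried forward with the curve.
rmse = (sum((predict(a) - c) ** 2 for a, c in validate) / len(validate)) ** 0.5
print(f"calibrated slope = {slope:.3f} grades/yr, validation RMSE = {rmse:.2f}")
```

In practice the validation set would hold real assets with condition measured at an earlier point in time, and the calibrated curve family would be richer than a single straight line.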
This latter option will provide a forecast for the future condition (or other parameter) if no other actions were to take place. If there are known activities, these can often be applied manually by changing the asset values (typically resetting an asset to as-new condition in the next period) to gauge their impact on the forecast. More sophisticated scenario analysis requires prescriptive analytics: modelling and optimisation.

Prescriptive analytics is a combination of building a model and applying some optimisation approach. There is a wide range of applications that provide these modelling features. Until recently this was a programming task; it is now more typical that modelling applications provide simplified modelling tools with which an expert modeller can define the model.

Modelling, or building a model, entails importing the asset data, describing the calibrated curves and defining the investment rules and costs within some modelling application. The asset data is entered with only the attributes required for the analysis, plus links back to GIS or other systems. This will comprise static attributes and the current values for condition, service, performance, risk, etc. This current/historic information is the starting value, per asset, for the predictive models.

The calibrated curves are generally specified as a set of lookup tables, which provide the input values for the predictive models, expressed as mathematical equations. These equations also reference the asset attribute information and the current/historic values, which will be populated for each future time step being reported.
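The lookup-table idea can be sketched as follows: calibrated curves stored as tables, interpolated per asset and per future time step to forecast condition if no action is taken. All asset records, materials and curve values below are hypothetical.

```python
# Hypothetical calibrated deterioration curves as lookup tables, keyed by
# asset material: age (years) -> expected condition grade (1 best, 5 worst).
curves = {
    "cast_iron": {0: 1.0, 10: 1.8, 20: 2.7, 30: 3.6, 40: 4.4},
    "pvc":       {0: 1.0, 10: 1.3, 20: 1.7, 30: 2.2, 40: 2.8},
}

def lookup(curve, age):
    """Linear interpolation between the lookup-table points."""
    bands = sorted(curve)
    if age <= bands[0]:
        return curve[bands[0]]
    if age >= bands[-1]:
        return curve[bands[-1]]
    for lo, hi in zip(bands, bands[1:]):
        if lo <= age <= hi:
            frac = (age - lo) / (hi - lo)
            return curve[lo] + frac * (curve[hi] - curve[lo])

# Assets carry only the attributes the analysis needs, plus current values;
# real data would also carry links back to GIS or other systems.
assets = [
    {"id": "P001", "material": "cast_iron", "age": 25},
    {"id": "P002", "material": "pvc",       "age": 12},
]

# Forecast condition for each future time step if no action is taken.
for asset in assets:
    curve = curves[asset["material"]]
    forecast = {year: round(lookup(curve, asset["age"] + year), 2)
                for year in range(0, 16, 5)}
    print(asset["id"], forecast)
```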
Treatments, with unit costs and rules about their impact on assets, are defined either by selecting built-in functions or with mathematical expressions. Different types of treatment may be entered for different asset categories: capex versus opex, or day work versus night work, for example. These treatment options become the decisions that can be made against the assets and are used by the different optimisation algorithms.

An optimisation algorithm (see below) enables different scenarios to be explored with the model. It will provide the set of treatments, for each of the model's time steps, that together constitute the plan. By providing targets, such as budgets, and other constraints, such as roads only being closable at weekends, the optimisation algorithm can use the data and rules in the model to calculate the costs and benefits of the myriad options and identify the optimal set of treatments, assets and time frames.

Once all data and rules are encapsulated in a model, it can be used to run multiple what-if scenarios to provide insight into outcomes under different constraints.

Results visualisation - compare, monitor, act

Prescriptive and predictive analysis generate a large volume of results information that typically needs to be interpreted in order to present it to the organisation in an easily understood form.
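To illustrate the idea of an optimisation algorithm choosing treatments under a budget constraint, the toy sketch below uses a greedy benefit-per-cost heuristic. Real DSTs use far more sophisticated solver heuristics/optimisation over many time steps; all candidate treatments, costs and benefit scores here are invented for illustration.

```python
# Hypothetical candidate decisions: (asset, treatment, cost, risk-reduction benefit)
candidates = [
    ("P001", "replace", 120_000, 9.0),
    ("P001", "reline",   40_000, 5.0),
    ("P002", "reline",   35_000, 2.5),
    ("P003", "patch",     8_000, 1.2),
]

def plan(candidates, budget):
    """Greedy benefit-per-cost heuristic: at most one treatment per asset,
    total spend within budget. A real optimiser would search far more of
    the option space, but the cost/benefit trade-off is the same."""
    chosen, treated, spent = [], set(), 0
    for asset, name, cost, benefit in sorted(
            candidates, key=lambda c: c[3] / c[2], reverse=True):
        if asset not in treated and spent + cost <= budget:
            chosen.append((asset, name))
            treated.add(asset)
            spent += cost
    return chosen, spent

chosen, spent = plan(candidates, budget=60_000)
print(chosen, spent)  # the selected treatments constitute the plan
```

Re-running `plan` with different budgets or candidate sets is the code-level analogue of exploring what-if scenarios with the model.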
Analytic Techniques

DSTs use a range of techniques to analyse data to provide predictions and supporting evidence. They are often highly specialised to specific types of data and decisions being made. The nature of their mathematical definition requires expert knowledge to apply them. This section provides an overview of the commonly used techniques, split into descriptive, predictive and prescriptive analytics categories.

Spatial Analysis (Descriptive): The use of GIS techniques to gain understanding of the links between geocoded datasets.

Regression (Predictive): A measure of the relationship between two variables.

Multivariate analysis (Predictive): The simultaneous observation and analysis of multiple variables to understand the different relationships and their relevance.

Weibull Analysis (Predictive): Lifetime distribution used to represent reliability of assets.

CUSUM Analysis (Predictive): Technique used to monitor change detection in one or more datasets.

Economic analysis (Prescriptive): The representation of asset behaviour by performance/incurred cost and what interventions can occur to meet set targets.

Optimisation (Prescriptive): The use of solver techniques to derive an intervention strategy to meet set constraints. This topic is covered in the Prescriptive Analytics, Modelling & Optimisation section.

Figure 3: Common analytic techniques

This list is not exhaustive. Each technique is expanded upon below.
Spatial Analysis

Spatial analysis is the use of geographical (GIS) techniques to gain understanding of the links between geocoded datasets. This can involve a multitude of different techniques depending on the problem to be solved; this section contains two common examples.

A spider graph is a technique where each point is measured against numerous other points within a certain distance. This can then be used to obtain associations with surrounding data, for example to associate incidents with specific assets. This technique is often required to overcome data collection issues.

Cluster analysis (or hot spot analysis) is where like events are grouped in order to identify spatial trends. This could be used to identify trends in the reporting of enquiries/complaints in relation to a customer survey score. Similar techniques to cluster analysis are used to correlate data generated by automatic rail or road scanning machines. These machines suffer from real-world difficulties that affect the alignment of successive data collection runs. Before any new run is used in an analysis it must be aligned with previous runs, to ensure that defects, changes or improvements are not missed, or counted multiple times, due to misalignment of successive runs.

Figure 4: Spatial analysis example of a spider graph
Figure 5: Cluster analysis example

Regression and Multivariate Analysis

Analysing and understanding the rate at which events occur is an essential component of the probability or asset deterioration element of a predictive model. When developing these models, a branching technique is used. The analysis has to be deep enough to understand the key parameters that influence the probability of the event occurring, but not so deep that the scale of the values becomes meaningless and unusable when making decisions. Figure 6 gives an example of the branching technique.
In this pipe-based example, a company-wide failure rate is first calculated, giving a number of failures per kilometre per year (nr/km/yr). A second level of analysis is then carried out to give material-specific rates of failure. A third branch of analysis could then look at other explanatory variables, such as diameter and age.
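The first two levels of this branching calculation can be sketched as follows, with hypothetical failure records, network lengths and observation window:

```python
from collections import Counter

# Hypothetical failure records over a 5-year observation window
failures = [
    {"material": "cast_iron", "diameter": 100},
    {"material": "cast_iron", "diameter": 150},
    {"material": "cast_iron", "diameter": 100},
    {"material": "pvc",       "diameter": 100},
]
length_km = {"cast_iron": 30.0, "pvc": 70.0}  # network length by material
years = 5

# Level 1: company-wide failure rate (nr/km/yr)
company_rate = len(failures) / sum(length_km.values()) / years

# Level 2: material-specific failure rates (nr/km/yr)
by_material = Counter(f["material"] for f in failures)
material_rate = {m: n / length_km[m] / years for m, n in by_material.items()}

print(f"company: {company_rate:.4f} nr/km/yr")
print({m: round(r, 4) for m, r in material_rate.items()})
```

A third branch (e.g. by diameter band within material) would repeat the same grouping one level deeper, with correspondingly fewer records in each group.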
It is important to note that, deeper down the branches, less data is available and there is therefore less chance of finding validated relationships. There is consequently a trade-off between model detail and the accuracy and validity of the analysis. Where there is insufficient data at a branch to deliver a statistically robust model, the model from the level above should be used.

Figure 6: Example branching of probability analysis

Regression analysis takes actual data points and fits a curve to the data. The curve can take different shapes (linear, polynomial, exponential, etc.) and how well it fits the observed data can be calculated by means of R-squared. Figure 7 gives an example output of an age-based linear regression deterioration analysis.

Multivariate analysis is the simultaneous observation and analysis of multiple variables to understand the different relationships and their relevance to the problem being studied. Figure 8 presents an example scatterplot of the results of multivariate analysis of four variables. Starting in the top corner and moving down, horsepower has a strong upward trend with weight, but a downward trend with acceleration, and so on. This information is then compiled to produce an overall relationship with a confidence of fit.

Figure 7: Example of linear regression
Figure 8: Example scatterplot of a four-variable multivariate analysis of common vehicle attributes
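The least-squares fit and R-squared measure behind an age-based linear regression can be sketched as follows (the observations are hypothetical):

```python
# Hypothetical (age, condition score) observations
ages = [2, 5, 9, 14, 20, 27, 33, 41]
cond = [1.1, 1.4, 1.9, 2.2, 2.9, 3.4, 3.9, 4.6]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(cond) / n

# Least-squares fit: condition = intercept + slope * age
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, cond))
         / sum((x - mean_x) ** 2 for x in ages))
intercept = mean_y - slope * mean_x

# R-squared: the proportion of variance explained by the fitted line
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(ages, cond))
ss_tot = sum((y - mean_y) ** 2 for y in cond)
r_squared = 1 - ss_res / ss_tot

print(f"condition = {intercept:.2f} + {slope:.3f} * age, R-squared = {r_squared:.3f}")
```

The same machinery extends to polynomial or exponential shapes by transforming the inputs before fitting; R-squared remains the goodness-of-fit measure in each case.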
Weibull Analysis

Weibull modelling is a commonly used approach within reliability engineering and failure analysis. The deterioration curve takes an S-shape: a slow deterioration to start with, which then accelerates as the asset approaches end of life. A Weibull curve can be derived from actual data and/or expert judgement. An example is shown in Figure 9, whereby the expected ages by which 10%, 20% and 35% of assets have failed have been given, and a Weibull curve has been fitted using these three data points. Weibull functions can be used to predict the probability of failure of an asset, or of some service it provides.

Figure 9: Example of the Weibull CDF

CUSUM Analysis

CUSUM (cumulative sum control chart) is a statistical technique typically used to monitor change detection in one or more datasets over a given time period. It can be used to model how a cause dataset, e.g. activities such as repairs, links to a consequence event, such as customer complaints. The aim of this analysis is to determine a base load and a severity factor for each cause event type by comparing the actual consequence events with the modelled number of complaints. This can be achieved with a statistical package optimiser that derives severity factors from consequence and cause records.

Figure 10: Mobilisation events analysis example (graph showing actual vs modelled weekly complaints, the severity of cause events (K factors), and values measuring model fitness: sum, variance, standard deviation and RMSE)

In this example of cause event severity analysis, the optimiser determines the severity factors that best fit the model (red line) to the actual records (dotted line) within the model fitness constraints (RMSE, sum, variance and standard deviation comparison).
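Fitting a Weibull curve through elicited points, as in the Figure 9 example, can be sketched by linearising the Weibull CDF and fitting its two parameters by least squares. The three (age, fraction failed) points below are hypothetical, and real reliability work would normally use a statistics package rather than this hand-rolled fit.

```python
import math

# Hypothetical elicited points: age by which a given fraction of assets has failed
points = [(25.0, 0.10), (35.0, 0.20), (48.0, 0.35)]

# Linearise the Weibull CDF F(t) = 1 - exp(-(t/eta)**beta):
#   ln(-ln(1 - F)) = beta*ln(t) - beta*ln(eta)
xs = [math.log(t) for t, _ in points]
ys = [math.log(-math.log(1 - f)) for _, f in points]

n = len(points)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
beta = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs))     # shape parameter
eta = math.exp(mean_x - mean_y / beta)            # scale (characteristic life)

def weibull_cdf(t):
    """Probability that an asset has failed by age t."""
    return 1 - math.exp(-(t / eta) ** beta)

print(f"beta = {beta:.2f}, eta = {eta:.1f}")
print(f"P(failed by age 60) = {weibull_cdf(60):.2f}")
```

A shape parameter above 1, as here, gives the accelerating (wear-out) deterioration described above; the fitted CDF can then be queried at any age to drive the predictive model.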
Economic Analysis

An economic analysis is the appraisal of different investment strategies over a defined time period, based on the range of costs incurred and benefits received. These appraisals should consider the full range of costs and benefits of the investment options in order to provide the most efficient solution. For example, if an economic analysis were to be performed on assets, the following would be considered:

Investment strategy: full or partial replacement and the associated CAPEX cost.
Time period: the (regulatory) period required for investment, e.g. 25 years.
Costs incurred: OPEX costs from failures or other network activities.
Benefits: change in risk of customer interruption based on investment.

Following the definition of the economic problem, an investment strategy can be created against one of two objectives:

- Least whole-life cost: decisions are made based on the lowest cost over the defined time period, with an objective target set for:
o the level of benefit that must be achieved; or
o what benefit can be achieved for a defined cost.
- Cost benefit analysis: if the benefits are valued in £s, decisions are made based on the point at which they become cost effective.

Therefore, in order to carry out economic analyses, the following should be considered:

- Ability to handle various time periods.
- Capture of various costs, including direct, indirect, social and environmental.
- Capture of a range of benefits, with the option to monetise or not.
- Evaluation of the costs and benefits based upon the outputs of the techniques described above.
- Net present value calculations across the entire time period, using a range of discount rates.
- Cost/benefit analyses considering:
o different asset lives;
o different times to benefit realisation.
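A least whole-life cost comparison using net present value can be sketched as follows; the two strategies, the cash flows and the 3.5% discount rate are illustrative assumptions only.

```python
def npv(cashflows, rate):
    """Net present value of yearly cash flows (year 0 first) at a discount rate."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cashflows))

years = 25     # e.g. a regulatory investment period
rate = 0.035   # illustrative discount rate; appraisals should test a range

# Strategy A: replace now (high CAPEX, low OPEX from failures thereafter)
replace_now = [120_000] + [2_000] * (years - 1)
# Strategy B: run to failure (no CAPEX, rising reactive OPEX)
run_to_failure = [0] + [4_000 + 600 * y for y in range(1, years)]

cost_a = npv(replace_now, rate)
cost_b = npv(run_to_failure, rate)
print(f"NPV, replace now:    {cost_a:,.0f}")
print(f"NPV, run to failure: {cost_b:,.0f}")
print("least whole-life cost:", "replace now" if cost_a < cost_b else "run to failure")
```

Monetised benefits would appear as negative cash flows in the same calculation, turning the least whole-life cost comparison into a cost benefit analysis.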