Predictive Analytics Mani Janakiram, PhD Director, Supply Chain Intelligence & Analytics, Intel Corp. Adjunct Professor of Supply Chain, ASU October 2017 "Prediction is very difficult, especially if it's about the future." --Nils Bohr, Nobel laureate in Physics
Training Objective Data Analytics is now becoming the game changer in the industry and it is much more relevant for supply chain functions. Predictive Analytics is an advanced Data Analytics that leverages historical data and combines it with forecasting models to predict future outcomes. It can be used across the entire gamut of supply chain such as Plan, Source, Make, Deliver and Return. In this session, we will focus on the what, where and how predictive analytics can be used. Some of the techniques covered in this session include: Regression, Time Series Forecasting, Basic Machine Learning Algorithms (Clustering, Cart) and Decision Trees. Through use cases, we will demonstrate application of predictive analytics in supply chain functions. 2
What if you were able to. Predict the outcome of your new product launch? Evaluate the impact of your marketing campaigns hourly and make inventory adjustments in real-time? Predict potential failure of your critical equipment and plan contingencies? Predict the market sentiments and buying behavior of your consumers in real-time? Predict the outcome of the US Presidential elections? 3
Data to Insights: Driving Industry 4.0 Cognitive Analytics 4 4
Supply Chain Analytics 5 5
Predictive Modeling Data History Correlated Variables (x1, x2, x3,, y) Build Model Y = f(x1,x2,x3 ) Predict Future Y Given Xs, calc Y Learn the model based on historical data Correlation is used to understand strength of linear relationship. Beware causation versus correlation Predictive techniques (y=f(x)) such as Regression, Time Series Forecasting, Machine learning, Decision Tree are used to build models Terms such as data mining and machine learning are often used 6
Predictive Model Algorithms (y=f(x)) Supervised Learning Unsupervised Learning Y = Numeric Y = Categorical No Y Regression or Regression Classification Clustering Classification Linear Regression Linear Discriminant Analysis Decision Trees Neural Networks Hierarchical Best Subsets Lasso Stepwise Ridge Logistic Regression Time Series Forecasting Random Forests Bayesian Networks Gradient Boosting Support Vector Machines K Nearest Neighbors K Means Principal Components Analysis = we will look at these today 7
Better Prediction Better Model! y Three models: Under fit: bias Over fit: variance Just right! x Balance accuracy of fit on training data and generalization of future or test data to get a better model! 8
Predictive Modeling Regression Analysis Time Series Forecasting Machine Learning & Decision Trees 9
Regression and Correlation Regression analysis is a tool for building statistical models that characterize relationships between a dependent variable and one or more independent variables, all of which are numerical. A regression model that involves a single independent variable is called simple regression. A regression model that involves several independent variables is called multiple regression. Correlation is a measure of a linear relationship between two variables, X and Y, and is measured by the (population) correlation coefficient. Correlation coefficients will range from 1 to +1. Scatter plot evaluates the relationship between two variables. 10
Linear Regression Find the line to minimize the squared error of the fit (least squares loss function) Y 5 4 3 2 1 Error of fit to this point This is called a residual Fit Y = β 0 + β 1 x (note model can extend to any number of x s) 1 2 3 4 5 x Predictive Model Y = 1 + 0.67x 11
Watch out for Nonsense Correlations! Beware lurking variables! Correlation does not mean Causation NOTE: images captured from the web 12
Simple Linear Regression ŷ= b0 + bx 1 Example: Keith s Auto Sales Reed Auto periodically has a special week-long sale. As part of the advertising campaign Reed runs one or more television commercials during the weekend preceding the sale. Based on a sample of 5 previous sales, determine whether number of cars sold is influenced by number of TV Ads. Number of TV Ads (x) 1 3 2 1 3 Number of Cars Sold (y) Σx = 10 Σy = 100 14 24 18 17 27 x = 2 y = 20 See Excel output 13
Simple Linear Regression Excel Output 14
Multiple Regression Model ^y = b 0 + b 1 x 1 + b 2 x 2 +... + b p x p Example: Programmer Salary Survey A software firm collected data for a sample of 20 computer programmers. A suggestion was made that regression analysis could be used to determine if salary was related to the years of experience and the score on the firm s programmer aptitude test. The years of experience, score on the aptitude test, and corresponding annual salary ($1000s) for a sample of 20 programmers is shown in the table. See Excel output Exper. Score Salary Exper. Score 4 7 1 5 8 10 0 1 6 6 78 100 86 82 86 84 75 80 83 91 24.0 43.0 23.7 34.3 35.8 38.0 22.2 23.1 30.0 33.0 9 2 10 5 6 8 4 6 3 3 88 73 75 81 74 87 79 94 70 89 Salary 38.0 26.6 36.2 31.6 29.0 34.0 30.1 33.9 28.2 30.0 15
Multiple Linear Regression Excel Output 16
Predictive Modeling Regression Analysis Time Series Forecasting Machine Learning & Decision Trees 17
Time Series Forecasting? Forecasting is an estimate of future performance based on past/current performance Qualitative and Quantitative technique can be used for forecasting Forecasting can be used to get insight into future performance for better planning and managing the future activities Forecasting examples: Revenue forecasting Product demand forecasting Equipment purchase Inventory planning, etc. 18
Forecasting Inputs Sources: Internal: Performance history Experience Methods: Qualitative: Judgment Intuition Expectations External: Economy Market Customers Competition Quantitative: Data Statistical analysis Models Factors: Promotions Price changes Seasonal changes Trends / cycles Customer tastes Gov t regulations Forecast model should be better than Naïve models (which use Average or last period values) Forecast! 19
Forecasting Methods Graphical Techniques Life Cycle Models Combination of qualitative and quantitative techniques can be used for better forecasting.. 20
Qualitative Models Field Sales Force Bottom-Up Aggregation of Salesmen s Forecasts (subjective feel). Surveys are also used very often to get a feel for what the future holds. Jury of Executives A forecast developed by combining the subjective opinions of the managers and executives who are most likely to have the best insights. We use it internally (called as Forecast Markets) for product forecast and transition. Users Expectations Forecast based on users inputs (mostly subjective) Delphi Method Similar to Jury of Jury of Executives but the process is recursive until consensus among the panel members is reached. 21
Quantitative Time Series Models Naïve Forecast Simple Rules Based upon Prior Experience Trend Linear, Exponential, S-Curve and Other Types of Pattern Projection Moving Average Un-weighted Linear Combination of Past Actual Values Filters Weighted Linear Combination of Past Actual Values Exponential Smoothing Forecasts Obtained by Smoothing and Averaging Past Actual Values Decomposition Time Series Broken Into Trend, Seasonality, Cyclicality, and Randomness Data mining Mostly CART and Neural Network techniques are used for forecasting 22
Causal/Explanatory Models Regression Models: Modeling the value of an output based on the values of one or more inputs. Also called causal models (can have multiple inputs). Econometric Models: Simultaneous Systems of Multiple Regression Equations Life Cycle Models: Diffusion-based approach, used to predict market size and duration based on product acceptance behavior ARIMA Models: Class of time series models that take into consideration, various behavior of the response in question based on past time series data. Transfer Function Models: Class of time series models that take into consideration, various behavior of the response along with causal factors based on past time series data. 23
Simple Moving Average (or UWMA) The Simple Moving Average, or Uniformly Weighted Moving Average (UWMA) for a time period t, with a range span of k, is described as: First 3 data points averaged SMA t k i= = 1 k x t i The forecasted value for time t+1 becomes: SMA k = i= 1 t+ 1 x ( t+ 1) i k 24
Exponentially Weighted Moving Average (EWMA) In contrast to Simple Moving Average, Exponentially Weighted Moving Average (EWMA) for a time period t, uses all of previously observed data, instead of just a fixed range. Smoothed value for a time period, t, indicated by Z t. Z Z t t = α y = Z t 1 t 1 + (1 α) Z + α ( y t 1 Z t 1 t 1 ) or Each previous observation receives different weight, with more recent observations being weighted more heavily, and weights decreasing exponentially. How these weights decrease determined by smoothing parameter, α. Normally it ranges from 0.2 to 0.4. When α approaches 0, all prior observations have equal weight When α approaches 1, most weight is given to more recent observations Smoothing method should be done on stationary time series without seasonality. 25
Forecasting Scenario* Process: Sets of 10 coin flips What s the naïve model? Can you beat it? If so, how? 26
Forecasting: Common Pitfalls Good-fitting model accurate forecast Standard modeling methods can be (ab)used to over-fit the data Forecast Fitting a model to history is easy good forecasting is not. (SAS Institute) 27
Example: Regression vs. Time Series Time series: Collection of observations made over equally spaced time intervals E.g.: cost, inventory levels, shipment times, weekly forecast values Time Series Example: Monthly CO 2 concentrations from Mauna Loa captured for 14 years Estimate CO 2 data for next 2 years Intuitively, we could do better than the simple linear regression, right? Bivariate Fit of CO2 By Year/Month 350 345 Forecast 360 356 352 CO2 340 335 330 325 01/1973 01/1974 01/1975 01/1976 01/1977 01/1978 01/1979 Simple linear regression model shows overall trend of data, but it is not a best representation of all patterns in data! 01/1980 01/1981 01/1982 Year/Month 01/1983 01/1984 01/1985 01/1986 01/1987 01/1988 01/1989 01/1990 Predicted Value 348 346 342 338 334 332 328 326 0 50 100 150 200 Row Linear Fit Time Series Models can yield better predictive models! It takes advantage of correlation (s) between the current value and other responses that occurred previously in time 28
Components of a Time Series The time series data can mainly have the following components: Level Trend Cyclical Seasonal Irregular 375000 370000 365000 360000 355000 350000 Dec Jan Feb Mar Apr May Jun Jul Level Aug Sep Oct Nov 1350000 1150000 950000 750000 550000 350000 150000 Dec Jan Feb Mar Apr May Jun Linear (Trend) Jul Aug Sep Oct Poly. (Trend) Nov 3 2.5 2 1.5 1 Dec Jan Feb Mar Apr May Jun Jul Aug 0.5 0 Seasonality Sep Oct Nov Level: Absence of trend, seasonality, or noise. Also, check for stability (Stationary vs. Nonstationary Trend: Continuing pattern of increase or decrease, either straight-line or curve Seasonal: A repeating pattern of increases and decreases that occurs with a regular frequency First plot your data chart to understand any behavior and relationships 29
Models to Account for Seasonality Think of creating model to account for Seasonality as extension of the exponential smoothing. There are 3 terms to estimate in a seasonal model, the Level, the Trend, and the Seasonal components. Seasonal Trend Level Forecast 360 356 Now look at this CO 2 data from this perspective! Predicted Value 352 348 346 342 338 334 332 328 326 0 50 100 150 200 Row 30
Sample Size Guidance 50-100 observations recommended. More is better. 1 to 2 years of weekly data. 4 to 8 years of monthly data. 12 to 24 years of quarterly data. 31
Predictive Modeling Regression Analysis Time Series Forecasting Machine Learning & Decision Trees 32
Machine Learning (ML)/Data Mining (DM) Set clear business goal Get the right data Pre-processing Presentation ML/DM follows the problem solving methodology Model/Result New data Mine the data Determine what is interesting Training data Testing data 33
Decision Trees (Classification) Finds the best factor split points to minimize the classification error Y = x2 Or 5 4 3 2 1 1 2 3 4 5 Model classification error x1 Y is a 2 level categorical variable < 3 Predictive Model If x1< 4.5 and x2 < 3 Then Y = Else Y = < 4.5 x2 x1 The model is a piecewise constant function 34
Iris data Clustering Analysis Iris data first published by geneticist and statistician Ronald Fisher in 1936 is widely used in machine learning for benchmarking. The sepal length, sepal width, petal length, and petal width are measured in mm on fifty Iris specimens from each of three species: 1. Iris Setosa 2. Iris Versicolor 3. Iris Virginica Machine learning goal is to classify the species based on the flower characteristics 35
Iris data Y: 1=Setosa, 2=Versicolor & 3=Virginica; X: Sepal Length, Sepal Width, Petal Length and Petal Width Scatterplot Matrix 7.5 6.5 5.5 Sepal length Species setosa versicolor virginica 4.5 4.5 4 3.5 3 2.5 2 7 Sepal w idth 5 3 Petal length 1 2.5 2 1.5 1 0.5 Petal w idth 4.5 5 5.5 6 6.5 7 7.5 8 2 2.5 3 3.5 4 4.5 1 2 3 4 5 6 7.5 1 1.5 2 2.5 36
Tree building Scatterplot Matrix 8.0 7.5 7.0 6.5 6.0 5.5 5.0 4.5 4.5 Sepal length Species setosa versicolor virginica 4.0 3.5 3.0 2.5 Sepal w idth 1 OPTIMAL 2.0 7 6 5 4 3 2 1 Petal length 2 2.5 2.0 2 1.5 1.0 0.5 1 Petal w idth 4.5 5.5 6.5 7.5 2.0 2.5 3.0 3.5 4.0 4.51 2 3 4 5 6 7.5 1.0 1.5 2.0 2.5 37
Tree building 1.00 0.75 virginica Species 0.50 versicolor 0.25 setosa 0.00 All Row s Split 1 1.00 0.75 virginica Species 0.50 versicolor 0.25 setosa 0.00 Petal w idth>=0.8 All Row s Petal w idth<0.8 1.00 0.75 virginica Species 0.50 versicolor 0.25 setosa 0.00 Petal w idth<1.8 Petal w idth>=1.8 Petal w idth>=0.8 All Row s Petal w idth<0.8 38
Single tree: Classification vs. regression Classification Categorical Response: Species Regression Continuous Response: Species Number Output: Classification counts, and Misclassification % Output: mean and variance for each node. 39
Gartner Magic Quadrant - Data Science 40
Business Analytics 41
Questions? 42