Ezgi AVCI TSE, Personnel and System Certification Center, TURKEY. Gülser KÖKSAL METU, Industrial Engineering Department, TURKEY




1 A COMPARISON OF SOME ROBUST REGRESSION TECHNIQUES
Ezgi AVCI, TSE, Personnel and System Certification Center, TURKEY
Gülser KÖKSAL, METU, Industrial Engineering Department, TURKEY
54th EOQ Congress, Izmir, October 2010

2 Outline
- Definition and Purpose of Regression
- Output of the Regression Process
- Regression Process Flow Diagram
- Why Alternative Regression Methods?
- Robustness
- Outliers
- Robust Regression Methods
- A Simulation Study
- A Case Study
- Conclusions

3 Regression
Regression investigates and models the relationship between variables.
Application areas:
- Engineering
- Physical sciences
- Life and biological sciences
- Social sciences

4 Purpose of Regression
To create an equation, or transfer function, from measurements of the system's inputs and outputs acquired during a passive or active experiment. The transfer function is then used for:
- sensitivity analysis
- optimization of system performance
- tolerancing the system's components
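For a transfer function that is linear in its inputs, the sensitivity of the output to each input is simply that input's coefficient, and when the inputs vary independently the output variance is the sum of the squared sensitivities times the input variances. The sketch below illustrates sensitivity analysis and tolerancing with hypothetical coefficients and input standard deviations (assumed values, not taken from this study).

```python
import math

def tolerance_stackup(coefs, input_sds):
    """For a linear transfer function y = b0 + sum(b_j * x_j) with independent
    inputs, Var(y) = sum((b_j * sd_j)**2). Returns the output standard deviation
    and the share of Var(y) contributed by each input."""
    terms = [(b * s) ** 2 for b, s in zip(coefs, input_sds)]
    total = sum(terms)
    return math.sqrt(total), [t / total for t in terms]

# Hypothetical slopes (the sensitivities dy/dx_j) and input standard deviations
b = [2.0, -0.5]
sd_x = [0.1, 0.4]
sd_y, shares = tolerance_stackup(b, sd_x)
print("output sd:", sd_y)
print("variance shares:", shares)
```

Here both inputs contribute equally to the output variance, so tightening either tolerance would reduce variation by the same amount.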

5 Regression
Industrial applications:
- Quality control and improvement (e.g., the ISO standard, clause 8: Measurement, Analysis and Improvement)
- Data mining

6 Output of Regression
- An estimate of the relative strength of the effect of each factor on the response
- An equation that analytically relates the critical parameters to the critical responses
- An estimate of how much of the total variation in the data is explained by the equation
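The three outputs listed above can be sketched for an ordinary least squares (OLS) fit in NumPy. The synthetic data are an assumption for illustration; standardized coefficients are used here as one common measure of the relative strength of each factor, and R^2 as the share of total variation explained.

```python
import numpy as np

def ols_summary(X, y):
    """Fit y = b0 + X b by OLS and return the coefficients (the equation),
    the standardized slopes (relative strength) and R^2 (variation explained)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])            # add an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)    # least-squares coefficients
    fitted = A @ beta
    # Standardized slope: b_j * sd(x_j) / sd(y), comparable across factors
    std_coef = beta[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)
    # R^2 = 1 - SS_residual / SS_total
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return beta, std_coef, r2

# Synthetic data (assumed): y = 3 + 2*x1 + 0.5*x2 + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=50)
beta, std_coef, r2 = ols_summary(X, y)
print("coefficients:", beta)
print("standardized slopes:", std_coef)
print("R^2:", r2)
```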

7 Regression Process Flow Diagram
Input: the system to be assessed.
- Select the environment in which the data will be collected.
- Select the inputs and outputs to be measured in a passive or active experiment.
- Select and qualify the measurement systems used to acquire the data.
- Run the system in the prescribed environment and acquire the data as the inputs vary.

8 Regression Process Flow Diagram (continued)
- Inspect the data for outliers and remove them if a root cause justifies their removal.
- Postulate and build a functional relationship between the inputs and the output.
- Test the statistical adequacy of the functional relationship.
- Test the predictive ability of the functional relationship on the physical system.
Output: the transfer function that analytically relates the inputs to the outputs.

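9 Why Alternative Regression Methods?
The assumptions of classical least-squares regression are not easy to satisfy:
- the normality assumption may be violated
- outliers may be present
Robust regression is the alternative considered here.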

10 Ignoring Outliers
The Challenger accident: Thiokol engineers argued that if the O-rings were colder than 53 °F (12 °C), they did not have enough data to determine whether the joint would seal properly. The shuttle and external tank did not actually explode; instead, they rapidly disintegrated under tremendous aerodynamic forces, since the shuttle was slightly past "Max Q", the point of maximum aerodynamic pressure.

11 Outliers
Definition: an observation that appears to deviate markedly from the other members of the sample in which it occurs.

12 Data with Outlier

13 Data without Outlier
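The effect shown in the two preceding plots can be illustrated numerically. The data below are synthetic (an assumption, not the authors' data): ten points lying exactly on y = 1 + 2x, then the same points with the last response lowered by 30.

```python
import numpy as np

def ols_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares line."""
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

x = np.arange(10, dtype=float)
y_clean = 1.0 + 2.0 * x          # all points exactly on y = 1 + 2x
y_out = y_clean.copy()
y_out[9] -= 30.0                 # one gross outlier at the largest x

print("without outlier:", ols_line(x, y_clean))
print("with outlier:   ", ols_line(x, y_out))
```

A single point at the edge of the x range pulls the OLS slope from 2 down to about 0.36, since the slope shifts by -30 × (9 - 4.5) / 82.5, where 82.5 is the sum of squared deviations of x from its mean.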

14 Two Common Ways to Detect Outliers
1. Regression diagnostics: multiple outliers are hard to detect.
2. Robust regression: outliers are easy to detect by their large residuals.
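The second approach can be sketched as follows: fit a robust line, then flag observations whose residual is large relative to a robust scale. This minimal version assumes a Huber M-estimator computed by iteratively reweighted least squares (IRLS), the normalized MAD as the residual scale, a cutoff of 3, and synthetic data with three planted outliers; these are illustrative choices, not necessarily those used in the study.

```python
import numpy as np

def huber_fit(X, y, c=1.345, n_iter=50):
    """Huber M-estimate by IRLS. X must include an intercept column.
    Returns coefficients, residuals and a robust scale (MAD / 0.6745)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)            # start from OLS
    for _ in range(n_iter):
        r = y - X @ beta
        scale = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
        u = np.abs(r) / scale
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))       # Huber weights
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    r = y - X @ beta
    scale = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
    return beta, r, scale

rng = np.random.default_rng(42)
n = 30
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, n)
planted = [3, 17, 25]
y[planted] += 15.0                          # three vertical outliers

X = np.column_stack([np.ones(n), x])
beta, r, scale = huber_fit(X, y)
flagged = np.flatnonzero(np.abs(r) / scale > 3.0)   # |scaled residual| > 3
print("coefficients:", beta)
print("flagged observations:", flagged)
```

Because the robust fit is barely moved by the planted points, their residuals stay large and they stand out clearly.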

15 What to Do with Outliers?
- Delete them?
- Ignore them?
- Give less weight to them?

16 Robust regression methods provide a smooth transition between full acceptance and full rejection of an observation.
The best rejection procedures are not competitive with the best robust procedures.
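The contrast between rejection and robust weighting can be seen in the weight each approach gives an observation as a function of its scaled residual u. In this sketch the hard-rejection cutoff of 2.5 is an assumption; c = 1.345 (Huber) and c = 4.685 (Tukey bisquare, the ρ-function commonly used in MM estimation) are the conventional tuning constants giving 95% efficiency under normal errors.

```python
import numpy as np

def hard_rejection_weight(u, cutoff=2.5):
    """Classical rejection: keep an observation fully (1) or drop it (0)."""
    return np.where(np.abs(u) <= cutoff, 1.0, 0.0)

def huber_weight(u, c=1.345):
    """Huber: full weight for small residuals, then c/|u|, never exactly zero."""
    a = np.abs(u)
    return np.where(a <= c, 1.0, c / np.maximum(a, c))

def bisquare_weight(u, c=4.685):
    """Tukey bisquare: weight falls smoothly from 1 to 0 at |u| = c."""
    z = np.abs(u) / c
    return np.where(z < 1.0, (1.0 - z ** 2) ** 2, 0.0)

u = np.array([0.0, 1.0, 2.0, 2.4, 2.6, 4.0, 10.0])   # scaled residuals
print("rejection:", hard_rejection_weight(u))
print("Huber:    ", huber_weight(u))
print("bisquare: ", bisquare_weight(u))
```

Rejection jumps from 1 to 0 between u = 2.4 and u = 2.6, whereas the Huber and bisquare weights decrease gradually; the bisquare reaches full rejection only for very large residuals.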

17 Robust Regression Methods
- Least Absolute Value (LAV)
- Huber M method
- MM method
- Least Median of Squares (LMS)
- Least Trimmed Squares (LTS)
- Multivariate Adaptive Regression Splines (MARS)
- Locally Weighted Scatterplot Smoothing (LOESS)
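For reference, the parametric methods in this list differ mainly in the criterion they minimize. The standard textbook forms are given below, with OLS for comparison; here x_i includes the constant term, \hat\sigma is a robust residual scale, and h (the number of retained squared residuals in LTS, typically between n/2 and n) is a user choice. These are the usual definitions, not necessarily the exact settings used in the study. The MM estimator starts from a high-breakdown S-estimate and then computes an M-estimate with a bounded ρ (commonly Tukey's bisquare) while holding the S-estimate's scale fixed; MARS and LOESS are nonparametric smoothers and have no single criterion of this form.

```latex
\begin{aligned}
r_i(\beta) &= y_i - x_i^{\top}\beta, \qquad i = 1,\dots,n \\[4pt]
\text{OLS:}\quad \hat\beta &= \arg\min_{\beta} \sum_{i=1}^{n} r_i^2(\beta) \\
\text{LAV:}\quad \hat\beta &= \arg\min_{\beta} \sum_{i=1}^{n} \lvert r_i(\beta)\rvert \\
\text{Huber } M\text{:}\quad \hat\beta &= \arg\min_{\beta} \sum_{i=1}^{n} \rho_c\!\left(\frac{r_i(\beta)}{\hat\sigma}\right),
\qquad \rho_c(u) =
\begin{cases}
u^2/2, & \lvert u\rvert \le c \\
c\lvert u\rvert - c^2/2, & \lvert u\rvert > c
\end{cases} \\
\text{LMS:}\quad \hat\beta &= \arg\min_{\beta}\ \operatorname*{med}_{i}\ r_i^2(\beta) \\
\text{LTS:}\quad \hat\beta &= \arg\min_{\beta} \sum_{i=1}^{h} \bigl(r^2(\beta)\bigr)_{i:n},
\qquad (r^2)_{1:n} \le \dots \le (r^2)_{n:n}
\end{aligned}
```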

18 A Simulation Study
Simulation is a commonly used tool for comparing robust regression techniques. The seven robust regression methods are compared using several performance measures under a number of scenarios. The results are discussed and the most promising robust methods are identified.
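A minimal Monte Carlo sketch of this kind of comparison is shown below. The scenario (y = 1 + 2x with about 10% of the errors shifted by +20), the sample size, the number of replications and the performance measure (mean squared error of the slope) are all assumptions for illustration; the sketch compares only OLS and a Huber M-estimator computed by IRLS, not the seven methods or scenarios of the study.

```python
import numpy as np

def huber_fit(X, y, c=1.345, n_iter=30):
    """Huber M-estimate via IRLS with a normalized MAD scale (X includes intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    for _ in range(n_iter):
        r = y - X @ beta
        s = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
        w = np.minimum(1.0, c / np.maximum(np.abs(r / s), 1e-12))
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

def simulate(n=50, reps=200, contamination=0.1, shift=20.0, seed=1):
    """Slope MSE of OLS and Huber M under a hypothetical vertical-outlier scenario."""
    rng = np.random.default_rng(seed)
    true_slope = 2.0
    sq_err = {"OLS": [], "Huber M": []}
    for _ in range(reps):
        x = rng.uniform(0, 10, n)
        e = rng.normal(0, 1, n)
        bad = rng.random(n) < contamination     # contaminated observations
        e[bad] += shift
        y = 1.0 + true_slope * x + e
        X = np.column_stack([np.ones(n), x])
        b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
        b_hub = huber_fit(X, y)
        sq_err["OLS"].append((b_ols[1] - true_slope) ** 2)
        sq_err["Huber M"].append((b_hub[1] - true_slope) ** 2)
    return {k: float(np.mean(v)) for k, v in sq_err.items()}

results = simulate()
print(results)
```

Other scenarios (leverage points, heavy-tailed errors, different contamination rates) and other measures can be added by changing the data-generating lines inside the loop.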

19 Results of the Simulation Study
The most promising methods:
- OLS
- Huber M
- LAV
- LTS
These methods are compared on an industrial data set.

20 The Logic of LAD (Least Absolute Deviation), also called LAV (Least Absolute Value)
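LAD chooses the coefficients that minimize the sum of absolute residuals, so a single large residual counts linearly rather than quadratically. Exact LAD solvers usually formulate it as a linear program; the sketch below instead uses a simple iteratively reweighted least squares approximation (an assumption for illustration), in which each observation gets weight 1/|r_i|, floored at a small eps to avoid division by zero. The data are synthetic: nine points on y = 1 + 2x and one vertical outlier.

```python
import numpy as np

def lad_irls(X, y, n_iter=100, eps=1e-8):
    """Approximate LAD regression (minimize sum |r_i|) by IRLS with
    weights w_i = 1 / max(|r_i|, eps). X must include an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # start from OLS
    for _ in range(n_iter):
        r = y - X @ beta
        w = 1.0 / np.maximum(np.abs(r), eps)
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

x = np.arange(10, dtype=float)
y = 1.0 + 2.0 * x
y[5] += 50.0                                   # one vertical outlier
X = np.column_stack([np.ones_like(x), x])
beta_lad = lad_irls(X, y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("LAD:", beta_lad)
print("OLS:", beta_ols)
```

LAD recovers the line through the nine clean points, while OLS shifts both the intercept and the slope toward the outlier.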

21 Description of the Data Set
Our data come from a real-life manufacturing process that includes the sub-processes core, molding, melting, casting, fettling and painting. The dependent variable is the percentage of defectives on a cylinder head. Missing values were eliminated using appropriate methods. The basic data set includes 36 independent variables and 92 observations.

22 CONCLUSIONS
For our real-life data, there is no significant difference between the robust methods and the classical OLS method. We attribute this to the complexity of the data and the presence of irrelevant variables. Moreover, even when the OLS and robust regression results are the same, the model fitted by OLS is not valid, because the normality assumption it relies on is not satisfied. As a result, robust methods are the safest way to deal with outliers even when their performance matches that of the classical methods, since they do not rely on such strict assumptions.
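The validity argument above rests on checking whether the OLS residuals are plausibly normal. The slides do not say which test was used; as one common choice, the sketch below computes the Jarque-Bera statistic, JB = n/6 (S^2 + (K - 3)^2 / 4) with S and K the sample skewness and kurtosis, and compares it with 5.991, the 5% critical value of the chi-square distribution with 2 degrees of freedom. The residuals are synthetic (assumed) rather than taken from the case study.

```python
import numpy as np

def jarque_bera(residuals):
    """Jarque-Bera statistic from sample skewness S and kurtosis K.
    Approximately chi-square(2) under normality for large samples."""
    e = np.asarray(residuals, dtype=float)
    n = e.size
    d = e - e.mean()
    m2 = np.mean(d ** 2)
    s = np.mean(d ** 3) / m2 ** 1.5      # skewness
    k = np.mean(d ** 4) / m2 ** 2        # kurtosis (3 for a normal distribution)
    return n / 6.0 * (s ** 2 + (k - 3.0) ** 2 / 4.0)

rng = np.random.default_rng(7)
clean = rng.normal(size=200)
contaminated = np.concatenate([clean, [15.0, 18.0, 20.0]])  # a few gross outliers
for name, e in [("clean", clean), ("contaminated", contaminated)]:
    jb = jarque_bera(e)
    print(f"{name}: JB = {jb:.1f}, reject normality at 5%: {jb > 5.991}")
```

Even three gross outliers among 200 otherwise normal residuals inflate the kurtosis enough for normality to be rejected decisively.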

23 THANK YOU