R Short Course Session 5


1 R Short Course Session 5
Daniel Zhao, PhD
Sixia Chen, PhD
Department of Biostatistics and Epidemiology, College of Public Health, OUHSC
11/20/2015

2 Outline
Linear Regression
Fit linear regression and check results
Examine normality, independence and heteroscedasticity
Examine outliers and influence points
Examine collinearity
Model selection procedure
Model comparisons

3 Outline (2)
Logistic Regression
Fit logistic model and check results
Odds ratio and confidence intervals
Model selection procedure
Goodness of fit test

4 Linear Regression
Fitting linear regression of y vs x:
lm(y~x, data=, weights=, singular.ok=TRUE)
Example: input data is the Prestige dataset in R package car
Variables: education, income, women, prestige, census and type

5 Linear Regression (2)
summary(Prestige)

6 Linear Regression (3)
pairs(Prestige)

7 Fit linear regression
reg1 <- lm(prestige~education+log2(income)+women, data=Prestige)

8 Check the results
summary(reg1)

9 Check the results (2)
attributes(reg1)

10 Check the results (3)
reg1$coefficients
reg1$df.residual

11 Examine Normality
QQ plot for studentized residuals:
qqPlot(reg1, main="QQ Plot")
Distribution of studentized residuals:
library(MASS)
sresid <- studres(reg1)
hist(sresid, freq=FALSE, main="Distribution of Studentized Residuals")
xfit <- seq(min(sresid), max(sresid), length=40)
yfit <- dnorm(xfit)
lines(xfit, yfit)

12 Examine Normality (2)

13 Examine Normality (3)
shapiro.test(reg1$residuals)
ad.test(reg1$residuals) # Anderson-Darling test; ad.test is in R package nortest

14 Plots: Examine Independence

15 Examine Independence (2)
Durbin-Watson test: durbinWatsonTest(reg1)
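The Durbin-Watson statistic is the sum of squared successive residual differences divided by the sum of squared residuals; values near 2 suggest little first-order autocorrelation. The following is a minimal base-R sketch of that computation only (no p-value). The simulated data and the helper name dw_stat are illustrative assumptions, not part of the slides.

```r
# Hypothetical helper: Durbin-Watson statistic from a fitted lm object.
dw_stat <- function(fit) {
  e <- residuals(fit)
  sum(diff(e)^2) / sum(e^2)
}

set.seed(1)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)   # errors are independent by construction
fit_sim <- lm(y ~ x)
dw <- dw_stat(fit_sim)
dw
```

The residuals are taken in the row order of the data, so the statistic is only meaningful when that order reflects time or sequence.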

16 Plots: Examine heteroscedasticity

17 Examine heteroscedasticity (2)
Goldfeld-Quandt test: gqtest(reg1). Note that we need to install R package lmtest.
Non-constant error variance test: ncvTest(reg1)

18 Examine Outliers
outlierTest(reg1) # Bonferroni p-value for the most extreme observation
lm.influence(reg1) # Calculate the diagonal of the hat matrix and the influence of each point on the estimated regression coefficients and residual standard deviation

19 Examine Outliers (2)
lm.influence(reg1)$hat # calculate leverage
lm.influence(reg1)$hat[lm.influence(reg1)$hat > 3*(3+1)/102] # identify high leverage cases
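The cutoff 3*(3+1)/102 on this slide is three times the average leverage, (k+1)/n, for a model with k = 3 predictors and n = 102 observations. A small sketch that derives the same cutoff from any fitted model rather than hard-coding it; the helper name high_leverage and the simulated data are assumptions made for illustration.

```r
# Hypothetical helper: flag cases whose leverage exceeds mult * (k + 1) / n,
# where k + 1 is the number of estimated coefficients (intercept included).
high_leverage <- function(fit, mult = 3) {
  h <- hatvalues(fit)
  cutoff <- mult * length(coef(fit)) / length(h)
  h[h > cutoff]
}

set.seed(2)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
d$x1[1] <- 10                       # one deliberately extreme predictor value
d$y <- 1 + d$x1 + d$x2 + d$x3 + rnorm(50)
fit_sim <- lm(y ~ x1 + x2 + x3, data = d)
flagged <- high_leverage(fit_sim)
flagged
```

Applied to the model on the slides, high_leverage(reg1) should use the same threshold as the hard-coded expression, assuming all 102 rows enter the fit.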

20 Examine influence points
influence.measures(reg1) # calculate DFFITS, Cook's D, DFBETAS and covariance ratios

21 Examine influence points (2)
Cook's D plot (identify D values > 4/(n-k-1)):
cutoff <- 4/(nrow(Prestige) - length(reg1$coefficients)) # length(reg1$coefficients) is k+1
plot(reg1, which=4, cook.levels=cutoff)
Influence plot:
influencePlot(reg1, id.method="identify", main="Influence Plot", sub="Circle size is proportional to Cook's Distance")
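The 4/(n-k-1) rule can also be applied numerically instead of being read off the plot. A minimal base-R sketch follows; the helper name influential and the simulated data are illustrative assumptions.

```r
# Hypothetical helper: Cook's distances above 4 / (n - k - 1).
# length(coef(fit)) equals k + 1, so n - k - 1 = n - length(coef(fit)).
influential <- function(fit) {
  d <- cooks.distance(fit)
  cutoff <- 4 / (length(d) - length(coef(fit)))
  d[d > cutoff]
}

set.seed(3)
x <- rnorm(40)
y <- 2 + 3 * x + rnorm(40)
x[40] <- 6; y[40] <- -10            # one point far from the trend
fit_sim <- lm(y ~ x)
flagged <- influential(fit_sim)
names(flagged)
```

The threshold is a rule of thumb; flagged cases are candidates for inspection, not automatic deletions.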

22 Cook's D plot

23 Influence Plot

24 Examine Collinearity
Variance inflation factors: vif(reg1)
Problem? sqrt(vif(reg1)) > 2
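For numeric predictors, the variance inflation factor of predictor j is 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The sketch below computes this by hand in base R to show what the number means; vif_by_hand and the simulated predictors are assumptions for illustration, and the function is not a replacement for vif() in package car.

```r
# Hypothetical helper: VIF_j = 1 / (1 - R_j^2) for each column of a
# data frame of numeric predictors.
vif_by_hand <- function(X) {
  sapply(names(X), function(v) {
    f <- reformulate(setdiff(names(X), v), response = v)
    1 / (1 - summary(lm(f, data = X))$r.squared)
  })
}

set.seed(4)
X <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
X$x3 <- X$x1 + 0.3 * rnorm(200)     # x3 is nearly a copy of x1
v <- vif_by_hand(X)
v
sqrt(v) > 2                          # the rule of thumb from the slide
```

Here x1 and x3 should be flagged while x2 should not, since only x1 and x3 carry overlapping information.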

25 Examine Collinearity (2)
Added variable plots: avPlots(reg1)

26 Model Selection Procedures
step(object=, scope=list(lower=, upper=), direction=c("both", "backward", "forward"), steps=1000, k=2, ...)
# object is an object representing a model
# scope defines the range of models examined in the stepwise search
# direction controls the mode of stepwise search

27 Model Selection Procedures (2)
edu2 <- Prestige$education^2
loginc2 <- log2(Prestige$income)^2
edulogin <- Prestige$education*log2(Prestige$income)
reg2 <- lm(prestige~education+edu2+log2(income)+loginc2+edulogin+women, data=Prestige)
step(reg2, direction='both')

28 Model Selection Procedures (3)

29 Model Comparisons
anova(reg1,reg2)
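For two nested linear models fitted to the same data, anova() reports a partial F test. The sketch below reproduces that F statistic from the residual sums of squares so the output is easier to interpret. The simulated data and model names are illustrative assumptions.

```r
set.seed(5)
n <- 80
x <- runif(n, 0, 4)
y <- 1 + 2 * x + 0.5 * x^2 + rnorm(n)
small <- lm(y ~ x)                  # reduced model
big   <- lm(y ~ x + I(x^2))         # full model; contains the reduced one

# Partial F statistic from residual sums of squares and residual df.
rss_small <- deviance(small); rss_big <- deviance(big)
df_small  <- df.residual(small); df_big <- df.residual(big)
f_stat <- ((rss_small - rss_big) / (df_small - df_big)) / (rss_big / df_big)
p_val  <- pf(f_stat, df_small - df_big, df_big, lower.tail = FALSE)

tab <- anova(small, big)
c(by_hand = f_stat, anova = tab$F[2])
```

A small p-value indicates that the extra terms in the larger model reduce the residual sum of squares by more than chance alone would explain.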

30 Logistic Regression
Input data: plasma in package HSAUR
Variables:
Fibrinogen: the fibrinogen level in the blood
Globulin: the globulin level in the blood
ESR: the erythrocyte sedimentation rate, either less than or greater than 20 mm/hour

31 Logistic Regression (2)
plasma data: head(plasma)
Objective: fit a logistic regression using ESR as the dependent variable and the other two as independent variables

32 Logistic Regression (3)
glm(formula, data=, family=, weights=, intercept=, ...)
# formula is y~x type
# data is the input dataset
# family can be gaussian, binomial or others
# weights specifies weighted or unweighted analysis
# intercept is logical (Do we need an intercept or not?)

33 Fit logistic model
fit <- glm(ESR~fibrinogen+globulin, data=plasma, family=binomial('logit'))

34 Fit logistic model (2)
summary(fit)

35 Fit logistic model (3)
attributes(fit)

36 Logistic regression plot
attach(plasma)
ESR2 <- rep(1, dim(plasma)[1])
ESR2[ESR=='ESR > 20'] <- 0 # ESR2 is 1 for ESR < 20 and 0 for ESR > 20
fit <- glm(ESR2~fibrinogen, data=plasma, family=binomial('logit'))
plot(fibrinogen, ESR2)
lines(fibrinogen[order(fibrinogen)], fit$fitted.values[order(fibrinogen)])

37 Logistic regression plot (2)

38 Odds ratio and confidence intervals
Calculate odds ratios: exp(coef(fit))
Calculate the variance-covariance matrix of the coefficients: vcov(fit)

39 Odds ratio and confidence intervals (2)
Confidence interval for coefficients: confint.default(fit)
Confidence interval for odds ratios: exp(confint.default(fit))
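confint.default() returns Wald intervals, that is, estimate ± z × standard error on the log-odds scale; exponentiating the endpoints gives the interval for the odds ratio. A minimal sketch that builds the same table by hand from coef() and vcov(), using simulated data (an assumption for illustration):

```r
set.seed(6)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 0.8 * x))
fit_sim <- glm(y ~ x, family = binomial("logit"))

est <- coef(fit_sim)                    # log-odds scale
se  <- sqrt(diag(vcov(fit_sim)))        # standard errors
z   <- qnorm(0.975)                     # 95% two-sided normal quantile
or_table <- cbind(OR    = exp(est),
                  lower = exp(est - z * se),
                  upper = exp(est + z * se))
or_table
```

Calling confint() on a glm object gives profile-likelihood intervals instead, which can differ noticeably from the Wald intervals in small samples.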

40 Model selection
fit2 <- glm(ESR2~fibrinogen+globulin, data=plasma, family=binomial('logit'))
step(fit2)

41 Model selection (2)

42 Hosmer-Lemeshow goodness of fit test
hoslem.test(x, y, g=10) in R package ResourceSelection
# x is a numeric vector of observations, binary (0/1)
# y is expected values
# g is number of bins to use to calculate quantiles

43 Example
hoslem.test(ESR2, fit2$fitted.values)
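To show what the test computes, here is a base-R sketch of the Hosmer-Lemeshow statistic: observations are grouped by quantiles of the fitted probabilities, and observed and expected counts of events and non-events are compared with a chi-square statistic on g - 2 degrees of freedom. The helper hl_by_hand and the simulated data are illustrative assumptions; the result is not guaranteed to match hoslem.test() digit for digit, and the sketch assumes the quantile breaks are distinct.

```r
# Hypothetical helper: Hosmer-Lemeshow statistic with g quantile groups.
hl_by_hand <- function(obs, fitted, g = 10) {
  breaks <- quantile(fitted, probs = seq(0, 1, length.out = g + 1))
  grp <- cut(fitted, breaks = breaks, include.lowest = TRUE)
  n  <- tapply(obs, grp, length)       # group sizes
  o1 <- tapply(obs, grp, sum)          # observed events per group
  e1 <- tapply(fitted, grp, sum)       # expected events per group
  chisq <- sum((o1 - e1)^2 / e1 + ((n - o1) - (n - e1))^2 / (n - e1))
  c(statistic = chisq, df = g - 2,
    p.value = pchisq(chisq, df = g - 2, lower.tail = FALSE))
}

set.seed(7)
x <- rnorm(500)
y <- rbinom(500, 1, plogis(0.3 + x))
fit_sim <- glm(y ~ x, family = binomial("logit"))
hl <- hl_by_hand(y, fitted(fit_sim))
hl
```

A large p-value gives no evidence of lack of fit. With few observations, ten groups leave very few cases per group, so a smaller g may be more sensible.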

44 Questions Contact and


More information

Topics in Biostatistics Categorical Data Analysis and Logistic Regression, part 2. B. Rosner, 5/09/17

Topics in Biostatistics Categorical Data Analysis and Logistic Regression, part 2. B. Rosner, 5/09/17 Topics in Biostatistics Categorical Data Analysis and Logistic Regression, part 2 B. Rosner, 5/09/17 1 Outline 1. Testing for effect modification in logistic regression analyses 2. Conditional logistic

More information

Statistics and Data Analysis

Statistics and Data Analysis Selecting the Appropriate Outlier Treatment for Common Industry Applications Kunal Tiwari Krishna Mehta Nitin Jain Ramandeep Tiwari Gaurav Kanda Inductis Inc. 571 Central Avenue #105 New Providence, NJ

More information

Tabulate and plot measures of association after restricted cubic spline models

Tabulate and plot measures of association after restricted cubic spline models Tabulate and plot measures of association after restricted cubic spline models Nicola Orsini Institute of Environmental Medicine Karolinska Institutet 3 rd Nordic and Baltic countries Stata Users Group

More information

Logistic Regression for Early Warning of Economic Failure of Construction Equipment

Logistic Regression for Early Warning of Economic Failure of Construction Equipment Logistic Regression for Early Warning of Economic Failure of Construction Equipment John Hildreth, PhD and Savannah Dewitt University of North Carolina at Charlotte Charlotte, North Carolina Equipment

More information

Use Multi-Stage Model to Target the Most Valuable Customers

Use Multi-Stage Model to Target the Most Valuable Customers ABSTRACT MWSUG 2016 - Paper AA21 Use Multi-Stage Model to Target the Most Valuable Customers Chao Xu, Alliance Data Systems, Columbus, OH Jing Ren, Alliance Data Systems, Columbus, OH Hongying Yang, Alliance

More information

Background for Case Study: Clifton Park Residential Real Estate

Background for Case Study: Clifton Park Residential Real Estate Techniques for Engaging Business Students in the Statistics Classroom Jane E. Oppenlander Example Assignments and Class Exercises Background for Case Study: Clifton Park Residential Real Estate Data on

More information

Unit 2 Regression and Correlation 2 of 2 - Practice Problems SOLUTIONS Stata Users

Unit 2 Regression and Correlation 2 of 2 - Practice Problems SOLUTIONS Stata Users Unit 2 Regression and Correlation 2 of 2 - Practice Problems SOLUTIONS Stata Users Data Set for this Assignment: Download from the course website: Stata Users: framingham_1000.dta Source: Levy (1999) National

More information

EFFICACY OF ROBUST REGRESSION APPLIED TO FRACTIONAL FACTORIAL TREATMENT STRUCTURES MICHAEL MCCANTS

EFFICACY OF ROBUST REGRESSION APPLIED TO FRACTIONAL FACTORIAL TREATMENT STRUCTURES MICHAEL MCCANTS EFFICACY OF ROBUST REGRESSION APPLIED TO FRACTIONAL FACTORIAL TREATMENT STRUCTURES by MICHAEL MCCANTS B.A., WINONA STATE UNIVERSITY, 2007 B.S., WINONA STATE UNIVERSITY, 2008 A THESIS submitted in partial

More information

MULTILOG Example #1. SUDAAN Statements and Results Illustrated. Input Data Set(s): DARE.SSD. Example. Solution

MULTILOG Example #1. SUDAAN Statements and Results Illustrated. Input Data Set(s): DARE.SSD. Example. Solution MULTILOG Example #1 SUDAAN Statements and Results Illustrated Logistic regression modeling R and SEMETHOD options CONDMARG ADJRR option CATLEVEL Input Data Set(s): DARESSD Example Evaluate the effect of

More information

STAT 350 (Spring 2016) Homework 12 Online 1

STAT 350 (Spring 2016) Homework 12 Online 1 STAT 350 (Spring 2016) Homework 12 Online 1 1. In simple linear regression, both the t and F tests can be used as model utility tests. 2. The sample correlation coefficient is a measure of the strength

More information

Going Further with SPSS 16. Jean Russell Bob Booth May 2010 AP-SPSS6

Going Further with SPSS 16. Jean Russell Bob Booth May 2010 AP-SPSS6 Going Further with SPSS 16. Jean Russell Bob Booth May 2010 AP-SPSS6 University of Sheffield Contents 1. INTRODUCTION... 3 1.1 MORE ON VARIABLES AND ANALYSIS... 3 2. STARTING SPSS... 5 2.1 SAVING AND LOADING

More information

Categorical Predictors, Building Regression Models

Categorical Predictors, Building Regression Models Fall Semester, 2001 Statistics 621 Lecture 9 Robert Stine 1 Categorical Predictors, Building Regression Models Preliminaries Supplemental notes on main Stat 621 web page Steps in building a regression

More information

Developing ISTA Cold Chain Environmental Standards

Developing ISTA Cold Chain Environmental Standards FRIDAY morning session Developing ISTA Cold Chain Environmental Standards Industry approved testing profiles have not been developed for the Cold Chain transportation environment. This presentation will

More information

Distinguish between different types of numerical data and different data collection processes.

Distinguish between different types of numerical data and different data collection processes. Level: Diploma in Business Learning Outcomes 1.1 1.3 Distinguish between different types of numerical data and different data collection processes. Introduce the course by defining statistics and explaining

More information

Categorical Data Analysis

Categorical Data Analysis Categorical Data Analysis Hsueh-Sheng Wu Center for Family and Demographic Research October 4, 200 Outline What are categorical variables? When do we need categorical data analysis? Some methods for categorical

More information

Logistic Regression using OLS1D in Excel 2013 XL4D: V0H Schield-Logistic-OLS1D-Excel2013-Slides.pdf. Background & Goals

Logistic Regression using OLS1D in Excel 2013 XL4D: V0H Schield-Logistic-OLS1D-Excel2013-Slides.pdf. Background & Goals Logistic Regression using OLS1D in Excel 2013 XL4D: V0H 1 Logistic Regression using OLS1D in Excel 2013 by Milo Schield Member: International Statistical Institute US Rep: International Statistical Literacy

More information

Improving long run model performance using Deviance statistics. Matt Goward August 2011

Improving long run model performance using Deviance statistics. Matt Goward August 2011 Improving long run model performance using Deviance statistics Matt Goward August 011 Objective of Presentation Why model stability is important Financial institutions are interested in long run model

More information

Information Literacy Program

Information Literacy Program Information Literacy Program SPSS Advanced Significance Testing 2017 ANU Library anulib.anu.edu.au/research-learn ilp@anu.edu.au Table of Contents To start SPSS... 1 Significance testing (Inferential

More information

Introduction to Generalized Linear Models: Nominal and Ordinal Logistic Regression, and Poisson Regression

Introduction to Generalized Linear Models: Nominal and Ordinal Logistic Regression, and Poisson Regression 1/39 to Generalized Linear Models: Nominal and Ordinal Logistic Regression, and Poisson Regression Dr Cameron Hurst cphurst@gmail.com DAMASAC and CEU, Khon Kaen University 24 th August, 2558 2/39 What

More information

Quantitative Methods

Quantitative Methods THE ASSOCIATION OF BUSINESS EXECUTIVES DIPLOMA PART 2 QM Quantitative Methods afternoon 4 June 2003 1 Time allowed: 3 hours. 2 Answer any FOUR questions. 3 All questions carry 25 marks. Marks for subdivisions

More information

Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction

Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction Paper SAS1774-2015 Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction ABSTRACT Xiangxiang Meng, Wayne Thompson, and Jennifer Ames, SAS Institute Inc. Predictions, including regressions

More information

Leveraging Attitudinal & Behavioral Data to Better Understand Global & Local Trends in Customer Loyalty & Retention

Leveraging Attitudinal & Behavioral Data to Better Understand Global & Local Trends in Customer Loyalty & Retention Leveraging Attitudinal & Behavioral Data to Better Understand Global & Local Trends in Customer Loyalty & Retention Brian Griner, Ph.D. Science, Strategy & Technology for Relationship Management & Marketing

More information