Simple Linear Regression


Simple Linear Regression
Technical: linear regression; sd(errors); confidence intervals; prediction intervals; Normal distribution; response y; predictors x; parameters; non-linear; non-normal; scales (including logs); MINITAB
Conceptual: a what-if probability model for prediction; a screening process; a scientific tool for testing/thinking; a simple model of the data generating system
Cert Stats: Intro to Regression Week 1

Simple Linear Regression: Y = α + βx + ε
Topics: Revision; Examples; Extensions/Variations; Scales (Additive / Multiplicative); Multiple regression
Concepts in Statistical Modelling: Statistical Inference
Why model? The purpose of a statistical model is not to fit the data but rather to refine the question.

Statistical Models
(Some aspect of) Y is simply related to (some aspect of) X, apart from some (small) (unpredictable) (error).
"Some aspect of": in reported units; in some other units; more general.
"Simply related to": a small change in X gives the (same?) small change in Y. Change: additive; multiplicative; other.
"Error": additive; multiplicative; other; of the (same) (average size) for all values of X.
Why model?

Using models
Conceptual: a summary of the data to communicate to others; a what-if probability model for prediction; a screening process; a scientific tool for testing/thinking; a simple model of the data generating system
Uses: Summary / Descriptive; Comparison; Insight into process

Scales and Change
Time; Salary, price, volume; Quality; Temperature; Earthquake magnitude, pH; Approval/Rating; Likelihood/proportion
Before/after; with/without
Treatments/Admin unit: 1,2,3,4 or A,B,C,D
Fuel consumption: Miles/gallon; Miles/litre; Litres/100 km
!!!!!!!!!!! Keep global warming to 2 °C, equiv 35.6 °F !!!!!!!!!!! (35.6 °F is the thermometer reading equal to 2 °C; a change of 2 °C is a change of 3.6 °F)

House Completions vs Time
[Fitted line plot: Completions against time, 1990 to 2002]
Completions = -1527868 + 769.4 time
S = 926.658; R-Sq = 87.9%; R-Sq(adj) = 87.6%
House completions data series, based on the number of new dwellings connected by ESB Networks to the electricity supply.
Source: Department of the Environment, Heritage and Local Government
Objective: To summarise
Question?

Diamonds: Carat wt vs Price
[Fitted line plot: Price (Sing $) against Carat, with regression line, 95% CI and 95% PI; prediction bands ± 2s]
Price Sing $ = -2298 + 11599 Carat
S = 1117.56; R-Sq = 89.3%; R-Sq(adj) = 89.2%
Data: Carat = weight of stones in carat units; Price in Singapore $
Objective: Nominal: predict Price from Wt. Actual: is the relationship linear?
Question?
Source: Singfat Chu (National University of Singapore), "Pricing the C's of Diamond Stones", Journal of Statistics Education, Volume 9, Number 2 (2001)
See http://www.amstat.org/publications/jse/jse_data_archive.htm and http://www.amstat.org/publications/jse/v9n2/datasets.chu.html

Trees: Volume vs Height
[Fitted line plot: Volume against Height, with regression line, 95% CI and 95% PI; prediction bands ± 2s]
Volume = -87.12 + 1.543 Height
S = 13.397; R-Sq = 35.8%; R-Sq(adj) = 33.6%
Sample of 31 black cherry trees in the Allegheny National Forest, Pennsylvania. Y = volume (cubic feet); X1 = height (feet); X2 = diameter (inches), measured at 54 inches above ground.
Objective: Use height as a proxy for volume. Later: use a combined diameter and height measure as a proxy for volume.
Objective: Prediction via Calibrated Model
Question?

CEO Compensation (US$) and Company Sales (US$m) (Forbes Magazine, May 1994)
M.Stuart
[Fitted line plot: Compensation against Sales, with regression line and 95% PI]
Compensation = 1437416 + 61.57 Sales
S = 1367165; R-Sq = 13.7%; R-Sq(adj) = 13.5%
Data columns: Total comp; Industry (e.g. Financial, ComputersComm, Insurance, Entertainment); Sales
Question?

Gas Consumption vs Temp
Weekly gas consumption (in 1000 cubic feet) and average outside temperature (in deg C) for two "heating seasons": 26 weeks before / 30 weeks after cavity-wall insulation was installed. Thermostat was set at 20 °C throughout.
Period 1 [fitted line plot]: Gas = 6.854 - 0.3932 Temperature; S = 0.281334; R-Sq = 94.4%; R-Sq(adj) = 94.1%
Period 2 [fitted line plot]: Gas = 4.724 - 0.2779 Temperature; S = 0.354848; R-Sq = 81.3%; R-Sq(adj) = 80.6%
Objective: Test: measurable effect of insulation? Compare intercepts and slopes before/after.
Question?

Math Marks
88 students in each of 5 maths exams
[Fitted line plot: Stat against Alg, with regression line, 95% CI and 95% PI; prediction bands ± 2s]
Stat = -12.32 + 1.080 Alg; S = 12.966; R-Sq = 44.2%; R-Sq(adj) = 43.5%
[Fitted line plot: Stat against Anal, with regression line, 95% CI and 95% PI]
Stat = 9.361 + 0.7058 Anal; S = 13.792; R-Sq = 36.9%; R-Sq(adj) = 36.1%
Objective: Nominal: predict one mark from another. Actual: understand inter-relationships.
Question?
Source: Mardia, K.V., Kent, J.T. & Bibby, J.M. (1979). Multivariate Analysis. London: Academic Press.

Galton's heights data
N=178 pairs; Y = Offspring (inches); X = Mid-parent (inches)
Slope = 0.514; Corr = 0.51
Objective: Nominal: predict offspring height. Actual: quantify heredity.
Question?
Slope < 1: "Reversion to the mediocre"; regression to the mean

Revision
Plotting; Scales; Logs
Statistical Inference: Normal distribution; Prediction and Confidence Intervals; Statistical significance

Revision: Plotting
[Fitted line plot: Completions against time]
Completions = -1527868 + 769.4 time; S = 926.658; R-Sq = 87.9%; R-Sq(adj) = 87.6%
Over-plot the fitted line; visually estimate the residuals; 95% prediction interval: line ± 2s (parallel lines)

Straight Line: changing the scale
Change Temp to Fahrenheit: (C, F) = (0, 32) at freezing; (C, F) = (100, 212) at boiling
[Fitted line plot: Gas against Temperature, with a second x-axis for Temp in F]
Gas = 6.854 - 0.3932 Temperature; S = 0.281334; R-Sq = 94.4%; R-Sq(adj) = 94.1%
Exercise: change the Gas scale (1000 ft³ = 28317 L)

Straight Line: change the equation
Gas = 6.85 - 0.393 Temp (°C). Give the equation for use with a Fahrenheit x-axis.
Slope: +1 °C gives a reduction of 0.393 in Gas; +1.8 °F gives a reduction of 0.393 in Gas; +1 °F gives a reduction of 0.393/1.8 = 0.218 in Gas.
Intercept: at 0 °C, Gas = 6.85; at 32 °F, Gas = 6.85; at 0 °F, Gas = 6.85 - 0.218(-32) = 13.82.
Result: Gas = 13.82 - 0.218 Temp (°F)
$y = 6.85 - 0.393\,\mathrm{Temp}_C = 6.85 - 0.393\,(\mathrm{Temp}_F - 32)/1.8 = (6.85 + 0.393 \times 32/1.8) - (0.393/1.8)\,\mathrm{Temp}_F = 13.82 - 0.218\,\mathrm{Temp}_F$
See EXCEL Change.Scale
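The change of x-axis units can be checked numerically. The sketch below is illustrative only: it uses the rounded coefficients 6.85 and -0.393 quoted on the slide, and the function name `to_fahrenheit_line` is a hypothetical helper, not something from the course materials. With unrounded arithmetic the intercept comes out at about 13.84, slightly above the slide's 13.82, which uses the rounded slope.

```python
# Convert a fitted line from a Celsius x-axis to a Fahrenheit x-axis.
# Coefficients are the rounded values quoted on the slide (assumed here).
a_c, b_c = 6.85, -0.393  # Gas = a_c + b_c * Temp_C


def to_fahrenheit_line(a, b):
    """Return (intercept, slope) for x in Fahrenheit, given a line in Celsius."""
    slope_f = b / 1.8                 # one Fahrenheit degree is 1/1.8 Celsius degrees
    intercept_f = a - slope_f * 32.0  # 0 C corresponds to 32 F
    return intercept_f, slope_f


a_f, b_f = to_fahrenheit_line(a_c, b_c)

# The two equations must agree at the same physical temperature (10 C = 50 F).
pred_c = a_c + b_c * 10.0
pred_f = a_f + b_f * 50.0
print(round(a_f, 2), round(b_f, 3), round(pred_c, 3), round(pred_f, 3))
```

Only the coefficients change; the fitted values, and therefore S and R-Sq, are the same on either temperature scale.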

Alt parameterisation: centred
$y = a + bx$ and $\bar{y} = a + b\bar{x}$, so $y - \bar{y} = b(x - \bar{x})$
Coefficients before centring: a = 6.85, b = -0.393. Means: Gas 4.75, Temp 5.35. X = ?
[Fitted line plot] Gas = 6.854 - 0.3932 Temperature; S = 0.281334; R-Sq = 94.4%; R-Sq(adj) = 94.1%
[Fitted line plot] CenGas = 0.000 - 0.3932 CenTemp; S = 0.281334; R-Sq = 94.4%; R-Sq(adj) = 94.1%
Temp   Gas   CenTemp  CenGas
-0.8   7.2   -6.15    2.45
-0.7   6.9   -6.05    2.15
 0.4   6.4   -4.95    1.65
 2.5   6.0   -2.85    1.25
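A minimal sketch of the centring idea, assuming NumPy is available. It uses only the four (Temp, Gas) rows listed on the slide, so the fitted coefficients will not match the full-data values; the point is that centring leaves the slope unchanged and moves the intercept to zero.

```python
import numpy as np

# Four (Temp, Gas) rows from the slide; illustrative only, not the full data set.
temp = np.array([-0.8, -0.7, 0.4, 2.5])
gas = np.array([7.2, 6.9, 6.4, 6.0])

# np.polyfit with degree 1 returns [slope, intercept].
b, a = np.polyfit(temp, gas, 1)

# Centre both variables at their own means and refit.
cen_temp = temp - temp.mean()
cen_gas = gas - gas.mean()
b_cen, a_cen = np.polyfit(cen_temp, cen_gas, 1)

print("raw:     intercept", a, "slope", b)
print("centred: intercept", a_cen, "slope", b_cen)
```

The raw intercept can be recovered from the centred fit as mean(Gas) - slope × mean(Temp).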

Alt parameterisation: standardised
$y = a + bx$. Standardised: $\frac{y - \bar{y}}{s_y} = \left(b\,\frac{s_x}{s_y}\right)\frac{x - \bar{x}}{s_x}$
Now: intercept 0; slope $b\,s_x/s_y$. Scale free.
Coefficients before standardising: a = 6.85, b = -0.393. Means: Gas 4.75, Temp 5.35. SDs: Gas 1.16, Temp 2.87.
[Fitted line plot] Gas = 6.854 - 0.3932 Temperature; S = 0.281334; R-Sq = 94.4%; R-Sq(adj) = 94.1%
[Fitted line plot] StGas = -0.000 - 0.9715 StTemp; S = 0.241936; R-Sq = 94.4%; R-Sq(adj) = 94.1%
Temp   Gas   StTemp  StGas
-0.8   7.2   -2.14   2.11
-0.7   6.9   -2.11   1.85
 0.4   6.4   -1.72   1.42
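The standardised form can be checked the same way. This is a sketch under the same assumptions as before (NumPy, and a handful of illustrative rows rather than the full data): after standardising both variables, the fitted slope equals b × s_x / s_y, which for simple linear regression is the correlation coefficient.

```python
import numpy as np

# Illustrative rows only (taken from the slide's short listing), not the full data.
temp = np.array([-0.8, -0.7, 0.4, 2.5])
gas = np.array([7.2, 6.9, 6.4, 6.0])

b, a = np.polyfit(temp, gas, 1)          # slope and intercept on the raw scale
s_x = temp.std(ddof=1)                   # sample SD of the predictor
s_y = gas.std(ddof=1)                    # sample SD of the response

st_temp = (temp - temp.mean()) / s_x     # standardised predictor
st_gas = (gas - gas.mean()) / s_y        # standardised response
b_st, a_st = np.polyfit(st_temp, st_gas, 1)

r = np.corrcoef(temp, gas)[0, 1]         # Pearson correlation
print("standardised slope", b_st, "b*s_x/s_y", b * s_x / s_y, "r", r)
```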

Revision: scale
[Matrix plot of Stat, Anal, Alg, Vect, Mech]
Descriptive Statistics: Stat, Anal, Alg, Vect, Mech
Variable   Mean  SE Mean  StDev
Stat      42.31     1.84  17.26
Anal      46.68     1.58  14.85
Alg       50.60     1.13  10.62
Vect      50.59     1.40  13.15
Mech      38.95     1.86  17.49
Correlations: Stat, Anal, Alg, Vect, Mech (cell contents: Pearson correlation)
       Stat   Anal    Alg   Vect
Anal  0.607
Alg   0.665  0.711
Vect  0.436  0.485  0.610
Mech  0.389  0.409  0.547  0.553
Derive equations. Stats vs Algebra: (Stat - 42.31)/17.26 = 0.665 (Alg - 50.60)/10.62. Stats vs Analysis: ?
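The "derive equations" exercise can be sketched from the summary statistics alone: slope = r × s_y / s_x and intercept = ȳ - slope × x̄. The helper name `line_from_summary` is hypothetical, and the inputs assume the summary values Alg mean 50.60, SD 10.62; Anal mean 46.68, SD 14.85; Stat mean 42.31, SD 17.26; correlations 0.665 and 0.607. Because these inputs are rounded, the results agree with the fitted lines quoted earlier (Stat = -12.32 + 1.080 Alg and Stat = 9.361 + 0.7058 Anal) only approximately.

```python
def line_from_summary(mean_x, sd_x, mean_y, sd_y, r):
    """Least-squares intercept and slope from means, SDs and the correlation."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Stats vs Algebra (summary values assumed as stated in the lead-in).
a_alg, b_alg = line_from_summary(50.60, 10.62, 42.31, 17.26, 0.665)

# Stats vs Analysis.
a_anal, b_anal = line_from_summary(46.68, 14.85, 42.31, 17.26, 0.607)

print(f"Stat = {a_alg:.2f} + {b_alg:.3f} Alg")
print(f"Stat = {a_anal:.2f} + {b_anal:.4f} Anal")
```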

Scale: Additive or Multiplicative
Y depends on x; some random variation involved. Data: (y1, x1), (y2, x2)
Additive or multiplicative changes?
Additive: (y2 - y1) depends on (x2 - x1)
Multiplicative: (y2 / y1) depends on (x2 / x1); e.g. doubles; 1% increase
Log transform renders multiplicative as additive: (log(y2) - log(y1)) depends on (log(x2) - log(x1))

Diamonds
[Fitted line plot: Price (Sing $) against Carat, with regression line, 95% CI and 95% PI; prediction bands ± 2s]
Price Sing $ = -2298 + 11599 Carat; S = 1117.56; R-Sq = 89.3%; R-Sq(adj) = 89.2%
[Fitted line plot of the log-scale model, drawn against Carat, with regression line, 95% CI and 95% PI; prediction bands?]
log10(Price Sing $) = 3.964 + 1.537 log10(Carat); S = 0.0731222; R-Sq = 95.7%; R-Sq(adj) = 95.7%
Minitab? Model details? Compare? Improve?

Trees: Vol vs Height
[Fitted line plot: Log10Vol against Log10Ht, with regression line, 95% CI and 95% PI]
Log10Vol = -6.062 + 3.982 Log10Ht
[Fitted line plots of the same model drawn against Height (two panels), with regression line, 95% CI and 95% PI]
log10(Volume) = -6.062 + 3.982 log10(Height)
S = 0.176926; R-Sq = 42.1%; R-Sq(adj) = 40.1%
Minitab? Model details? Compare? Improve?

Multiplicative Change
Salary increases by 20%: Salary → Salary + 20% of Salary = Salary + Salary × 0.2 = Salary × 1.2
Salary decreases by 20%: Salary → Salary × 0.8
Salary increases by 20% then decreases by 20%: (Salary × 1.2) × 0.8 = Salary?

Recall: Logs and Percentages
log10(10) = 1; log10(100) = 2; log10(5) = 0.69897
log10(50) = log10(5 × 10) = log10(5) + log10(10) = 1.69897
Antilog(2) = 10^2 = 100; Antilog(1.69897) = 10^1.69897 = 10^1 × 10^0.69897 = 10 × 5 = 50
Salary increases by 20%: Salary → Salary × 1.2; log(Salary) → log(Salary) + log(1.2) = log(Salary) + 0.079
Salary decreases by 20%: Salary → Salary × 0.8; log(Salary) → log(Salary) + log(0.8) = log(Salary) - 0.097
Log Salary increases by 0.079 then decreases by 0.079: Salary unchanged (up by 20%, then down by 16.7%)
Not symmetric: × 1.2 is undone by ÷ 1.2, i.e. × 0.833
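The asymmetry of percentage changes, and its symmetry on the log scale, can be verified with a few lines of arithmetic. This is a small illustrative sketch assuming the percentage in question is 20% (factors 1.2 and 0.8); the starting salary of 100 is an arbitrary value chosen for the example.

```python
import math

salary = 100.0  # arbitrary starting value for illustration

# Up 20% then down 20% does not return to the start.
up_then_down = salary * 1.2 * 0.8

# On the log10 scale the two steps have different sizes.
log_up = math.log10(1.2)     # step up for a 20% increase
log_down = math.log10(0.8)   # step down for a 20% decrease

# The change that exactly undoes a 20% increase is division by 1.2.
undo_factor = 1 / 1.2
undo_percent = (1 - undo_factor) * 100

print(up_then_down, round(log_up, 4), round(log_down, 4), round(undo_percent, 1))
```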

Changing to/from Log scale
Equation in log scale: $\log(y) = a + b\log(x)$
Equation in linear scale: $y = 10^{a + b\log(x)} = 10^{a} \times 10^{b\log(x)} = 10^{a}x^{b}$, OR $y = (10^{a/b}x)^{b} = (\text{Rescaled } x)^{b}$
Increase $\log(x)$ by 1: increase $\log(y)$ by $b$ (linearly)
Multiply $x$ by 10: increase $y$ by a factor of $10^{b}$ (multiplicatively); $y$ changes to $y \times 10^{b}$ (a linear change of $b$ in $\log(y)$)

Trees: Changing to/from Log scale
Equation in log scale: $\log(\text{Volume}) = -6.062 + 3.982\log(\text{Height})$
Equation in linear scale: $\text{Volume} = 10^{-6.062} \times \text{Height}^{3.982} = 0.000000867 \times \text{Height}^{3.982}$
OR $\text{Volume} = (10^{-6.062/3.982}\,\text{Height})^{3.982} = (0.03\,\text{Height})^{3.982}$
Approximately: $\text{Volume} \approx (0.03\,\text{Height})^{4}$
Height doubles?
Refine the question?
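A short sketch of the back-transformation from the log10 scale to a power law. It assumes the log-scale intercept is -6.062 and the slope is 3.982 (the intercept value that is consistent with the quoted constants 0.000000867 and 0.03); the function name `volume` is a hypothetical helper.

```python
import math

# Assumed log10-scale coefficients: log10(Volume) = a + b * log10(Height).
a, b = -6.062, 3.982

scale = 10 ** a           # multiplier in Volume = scale * Height**b
rescale = 10 ** (a / b)   # multiplier in Volume = (rescale * Height)**b


def volume(height):
    """Predicted volume on the original scale from the log-log line."""
    return scale * height ** b


# Doubling height multiplies predicted volume by 2**b, whatever the height.
ratio = volume(160.0) / volume(80.0)

print(scale, round(rescale, 3), round(ratio, 2))
```

Both forms describe the same curve; the second makes the "roughly fourth power of rescaled height" reading easier to see.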

CEO Compensation
Which scale to use? What does this tell us about the nature of the relationship?
Compensation = (Rescaled Sales)^(1/4)
Sales double: Compensation?
Question?

Error Term in Log Scale
Linear scale model: $y = a + bx + \varepsilon$, so $y = \text{predicted } y \pm 2s$
Error model: $\varepsilon \sim N(0, \sigma^2)$; 95% of random errors lie within $\pm 2\sigma$, estimated by $\pm 2s$
Notation sometimes used: $y = \text{predicted } y \pm 2s$. If s is small, y is close to predicted y. Error band width is constant.
Log scale model: $\log y = a + b\log x + \varepsilon$, so $\log y = \text{predicted } \log y \pm 2s$
Linear scale plot: $y = \text{predicted } y \times 10^{\pm 2s}$
Notation sometimes used: y = predicted y ×/÷ $10^{2s}$. If s is small, y is close to predicted y. Error band is proportional to y: multiplicative error.
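To illustrate the multiplicative error band, the sketch below back-transforms a ± 2s band from the log10 scale. The inputs (a predicted log10 value of 3.0 and s = 0.07) are made-up illustrative numbers, and `prediction_band_log10` is a hypothetical helper name.

```python
import math


def prediction_band_log10(pred_log_y, s, k=2.0):
    """Back-transform a +/- k*s band on the log10 scale to the original scale."""
    lower = 10 ** (pred_log_y - k * s)
    centre = 10 ** pred_log_y
    upper = 10 ** (pred_log_y + k * s)
    return lower, centre, upper


# Illustrative inputs, not taken from any fitted model in the slides.
lower, centre, upper = prediction_band_log10(3.0, 0.07)

print("band:", round(lower, 1), round(centre, 1), round(upper, 1))
print("ratio up:", upper / centre, "ratio down:", centre / lower)
print("gap up:", upper - centre, "gap down:", centre - lower)
```

The band is symmetric as a ratio (the same factor up and down) but not as a difference: the upper gap is wider than the lower gap.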

Error Term in Log Scale: not symmetric

Revision: Computing
[Scatterplot of FITS1 vs time, 1990 to 2002]
[Histogram of RESI1 with fitted Normal curve; StDev 926.7, N 44]
Comps  Time      RESI1     FITS1
4296   1990.00   1116.75   3179.25
4477   1990.25   1105.41   3371.59
5011   1990.50   1447.07   3563.93
4752   1990.75    995.72   3756.28
4692   1991.00    743.38   3948.62
3906   1991.25   -234.96   4140.96
4895   1991.50    561.70   4333.30
4979   1991.75    453.35   4525.65
4155   1992.00   -562.99   4717.99
5603   1992.25    692.67   4910.33
5886   1992.50    783.33   5102.67
5338   1992.75     42.98   5295.02
3684   1993.00  -1803.36   5487.36
4487   1993.25  -1192.70   5679.70
5089   1993.50   -783.04   5872.04
Etc...


Revision: Normal Dist
[Histogram of RESI1 with fitted Normal curve; StDev 926.7, N 44]
Long-run proportions: 68% within ± 1 SD of mean; 95% within ± 2 SD of mean; 99.7% within ± 3 SD of mean
44 residuals: Mean 0.0, SD 915.8
15 of 44 outside (± 915.8); 29 of 44 = 65.9% within
[Listing of the 44 residuals]
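The long-run proportions can be illustrated by simulation. This sketch assumes NumPy and draws computer-generated Normal values (not the course data) with SD 915.8 to mimic the residuals; `share_within` is a hypothetical helper. The observed 29 of 44 is computed alongside for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Simulated Normal "residuals"; a large sample stands in for the long run.
x = rng.normal(loc=0.0, scale=915.8, size=100_000)


def share_within(values, k):
    """Proportion of values within k sample SDs of the sample mean."""
    sd = values.std(ddof=1)
    return float(np.mean(np.abs(values - values.mean()) <= k * sd))


shares = {k: share_within(x, k) for k in (1, 2, 3)}
slide_share = 29 / 44  # observed proportion within 1 SD quoted on the slide

print(shares, round(slide_share, 3))
```

With only 44 residuals, an observed share of about 66% within one SD is well within what sampling variation around 68% would produce.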

Revision: Normal Dist
Sketch: Normal, Mean 25, SD 4
68% within ± 1 SD of mean; 95% within ± 2 SD of mean; 99.7% within ± 3 SD of mean

Revision: SumSq
[Fitted line plot: Stat against Anal, with regression line and 95% PI]
Stat = 9.361 + 0.7058 Anal; S = 13.792; R-Sq = 36.9%; R-Sq(adj) = 36.1%
Descriptive Statistics:
Variable  Count   Mean   StDev  Variance  Sum of Squares
Stat         88  42.31   17.26    297.76       183413.00
Anal         88  46.68   14.85    220.38       210942.00
RES          88  -0.00   13.71    187.98        16354.67
FITS         88  42.31   10.48    109.77       167058.33
CenStat      88  -0.00   17.26    297.76        25904.72
CenAnal      88  -0.00   14.85    220.38        19173.09
CenFit       88  -0.00   10.48    109.77         9550.05
Regression Analysis: Stat versus Anal
The regression equation is Stat = 9.361 + 0.7058 Anal
S = 13.792; R-Sq = 36.9%; R-Sq(adj) = 36.1%
Analysis of Variance
Source       DF       SS       MS      F      P
Regression    1   9550.0  9550.05  50.22  0.000
Error        86  16354.7   190.17
Total        87  25904.7
Stat  Anal    RESI1    FITS1
81    67    24.3534  56.6466
81    70    22.2362  58.7638
81    66    25.0592  55.9408
68    70     9.2362  58.7638
...
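The summary measures can be rebuilt from the sums of squares. The sketch below assumes the regression and error sums of squares are 9550.05 and 16354.67 with n = 88 (values taken as restored from the Minitab output); variable names are illustrative.

```python
import math

# Assumed inputs: sums of squares for Stat versus Anal, n = 88 students.
ss_reg, ss_err, n = 9550.05, 16354.67, 88

ss_tot = ss_reg + ss_err          # total (centred) sum of squares
df_err = n - 2                    # two coefficients estimated
mse = ss_err / df_err             # error mean square
s = math.sqrt(mse)                # S, the residual standard deviation
r_sq = ss_reg / ss_tot            # R-Sq
r_sq_adj = 1 - mse / (ss_tot / (n - 1))  # R-Sq(adj)
f_stat = ss_reg / mse             # F with 1 and n-2 degrees of freedom

print(round(ss_tot, 2), round(s, 3), round(r_sq, 3), round(r_sq_adj, 3), round(f_stat, 2))
```

The small difference between the computed S and the printed 13.792 comes from rounding in the inputs.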

Revision: Covar, Corr and R²
Regression Analysis: Stat versus Alg
Stat = -12.32 + 1.080 Alg; S = 12.966; R-Sq = 44.2%
Analysis of Variance
Source       DF       SS       MS
Regression    1  11446.6  11446.6
Error        86  14458.1    168.1
Total        87  25904.7
Covariances:
          Stat      Alg    FITS1    RESI1
Stat   297.755
Alg    121.871  112.886
FITS1  131.570  121.871  131.570
RESI1  166.185   -0.000   -0.000  166.185
Correlations:
        Stat     Alg   FITS1  RESI1
Stat   1
Alg    0.665   1
FITS1  0.665   1.000   1
RESI1  0.747  -0.000  -0.000  1
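The links between covariance, correlation, slope and R² can be checked directly. The sketch assumes the covariance entries Var(Stat) = 297.755, Var(Alg) = 112.886 and Cov(Stat, Alg) = 121.871; variable names are illustrative.

```python
import math

# Assumed entries of the covariance matrix for Stat (y) and Alg (x).
var_y, var_x, cov_xy = 297.755, 112.886, 121.871

slope = cov_xy / var_x                    # least-squares slope
r = cov_xy / math.sqrt(var_x * var_y)     # Pearson correlation
r_sq = r ** 2                             # R-Sq for simple linear regression

var_fit = slope ** 2 * var_x              # variance of the fitted values
var_res = var_y - var_fit                 # variance of the residuals

print(round(slope, 3), round(r, 3), round(r_sq, 3), round(var_fit, 2), round(var_res, 2))
```

The variance of Stat splits into a fitted part and a residual part, which is the same decomposition as the ANOVA table after dividing the sums of squares by n - 1 = 87.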

Revision: Corr, R², S
[Matrix plot of Stat, Anal, Alg, Vect, Mech]
Descriptive Statistics: Stat, Anal, Alg, Vect, Mech
Variable   Mean  SE Mean  StDev
Stat      42.31     1.84  17.26
Anal      46.68     1.58  14.85
Alg       50.60     1.13  10.62
Vect      50.59     1.40  13.15
Mech      38.95     1.86  17.49
Correlations: Stat, Anal, Alg, Vect, Mech (cell contents: Pearson correlation)
       Stat   Anal    Alg   Vect
Anal  0.607
Alg   0.665  0.711
Vect  0.436  0.485  0.610
Mech  0.389  0.409  0.547  0.553
Revision: Stats vs Anal. Derive R², S.

Revision: Statistical Inference
Point Estimates: best single values for intercept/slope. Conceptually: propose many values; compute the SumSq of the implied residuals; choose the values with minimum SSQ.
Margin for Error: Confidence Intervals; Prediction Intervals
Statistical Testing
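The "propose many values, choose the minimum SSQ" idea can be sketched as a brute-force search and compared with the usual least-squares fit. Everything here is an assumption for illustration: the data are synthetic (generated with NumPy from a line with intercept 7.0, slope -0.4 and error SD 0.3), and the grid ranges are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data from a known line plus Normal error (illustration only).
x = np.linspace(0.0, 10.0, 30)
y = 7.0 - 0.4 * x + rng.normal(0.0, 0.3, size=x.size)


def ssq(a, b):
    """Sum of squared residuals implied by intercept a and slope b."""
    return float(np.sum((y - (a + b * x)) ** 2))


# Closed-form least-squares fit; polyfit returns [slope, intercept].
b_hat, a_hat = np.polyfit(x, y, 1)

# Conceptual version: try many candidate pairs and keep the smallest SSQ.
a_grid = np.linspace(5.0, 9.0, 81)
b_grid = np.linspace(-1.0, 0.0, 101)
best_ssq, best_a, best_b = min((ssq(a, b), a, b) for a in a_grid for b in b_grid)

print("least squares:", a_hat, b_hat, ssq(a_hat, b_hat))
print("grid search:  ", best_a, best_b, best_ssq)
```

The grid search lands close to the least-squares values but can never beat them on SSQ, which is what makes the formula-based estimates "best" in this sense.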

Are precision details important?
Precision about what? Coefficients: Gas = 5.49 - 0.29 Temp (-0.29 ± ?)
Need to test null hypotheses? Is there evidence against "insulation does not change the Gas/Temp relationship"?

Concept: Statistical Model
"Data like this": Replication; Data Generating System; Computer Generated Random Numbers
$y_i \sim N(\mu_i, \sigma^2)$

Concept: Simple linear regression model
$y_i \sim N(\mu_i, \sigma^2)$
"Data like this": Line; Data "like completions"; High; Low
[Plot of simulated data "like completions" against time, 1988 to 2002]
Pinch of salt

Concept: Sampling Distribution
What results with "data like this"?
Monte Carlo Experiments give the Sampling Distribution; St Dev (Sampling Dist) = St Error
Alternative: formulae methods, used by MINITAB
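A minimal Monte Carlo sketch of a sampling distribution, assuming NumPy. The model values (intercept 7.0, slope -0.4, error SD 0.3, 30 equally spaced x values) and the 2000 replications are arbitrary choices for illustration. The standard deviation of the simulated slopes is compared with the textbook formula for the standard error of the slope, σ / sqrt(Σ(x - x̄)²).

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed "data generating system": a straight line plus Normal error.
x = np.linspace(0.0, 10.0, 30)
alpha, beta, sigma = 7.0, -0.4, 0.3


def one_slope():
    """Generate one data set 'like this' and return its fitted slope."""
    y = alpha + beta * x + rng.normal(0.0, sigma, size=x.size)
    return np.polyfit(x, y, 1)[0]


slopes = np.array([one_slope() for _ in range(2000)])

mc_se = slopes.std(ddof=1)                                   # SD of the sampling distribution
formula_se = sigma / np.sqrt(np.sum((x - x.mean()) ** 2))    # formula-based standard error

print("mean slope", slopes.mean(), "MC SE", mc_se, "formula SE", formula_se)
```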

Concept: Statistical Inference (M.Stuart)
Choose the best fitting model. Treat it with scepticism. How much?
Rely on formulae based on the Normal distribution, or use random computer replication, e.g. modern bootstrap methods.

Science and Statistical Inference
Assumptions: well stated and focussed science; adequate and relevant data; adequate model for "data like this"; independent sources of info; random variation; Normal distribution

More than one predictor? Multiple Regression
Statistics Marks: four other Math marks
Tree Volume: Tree Height and Diam at Chest Ht
Diamond Price: Carat wt and Aspects of Quality
Gas: Temp and Insulation Status
Housing Comps: Seasonality