+ Tutorial: Regression & Correlation
Presented by Jessica Raterman and Shannon Hodges
+ Access & assess your data
- Install and/or load the MASS package to access the dataset birthwt
- Familiarize yourself with the data:
  - Structure, i.e. what type of data?
  - Number of observations?
  - Parametric or nonparametric?
  - Number and names of columns?
  - Are you working with complete or incomplete data?
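A minimal R sketch of this first look. It assumes MASS is available. MASS ships with standard R distributions; if it is missing, run install.packages("MASS") first.

```r
# Load MASS, which provides the birthwt data frame
library(MASS)

data(birthwt)                  # make birthwt available in the workspace
str(birthwt)                   # structure: type of each column plus first values
dim(birthwt)                   # number of observations (rows) and variables (columns)
names(birthwt)                 # column names
sum(!complete.cases(birthwt))  # number of rows with at least one missing value
```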
+ Access & assess your data
- Explore the variables and put them in a more meaningful context:
  - What does the variable lwt measure? What type of variable is it?
  - Look through the rest of the variables. Hypothesizing yet?
- Produce simple summary statistics: anything noteworthy, or is more information needed?
- Optional: rename the data for easier coding
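One way to run these exploration steps in R. The column meanings below come from the birthwt help page: lwt is the mother's weight in pounds at her last menstrual period, and race is coded 1 = white, 2 = black, 3 = other. The short name `bw` is an arbitrary choice for the optional renaming step.

```r
library(MASS)

# help(birthwt)   # opens the help page describing every column; run interactively

summary(birthwt)    # min, quartiles, mean and max for every column

# race is a category stored as integers; count each code
table(birthwt$race)

# Optional: copy to a shorter name for easier typing ('bw' is an arbitrary choice)
bw <- birthwt
# Store race as a labelled factor so R treats it as categorical
bw$race <- factor(bw$race, levels = 1:3, labels = c("white", "black", "other"))
summary(bw$race)
```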
+ Access & assess your data
- Now that you have a better handle on what the data are, start reasoning:
  - Generate a few scatterplots. You can look at all pairs, or just those of interest if you already have ideas about which variables might be related.
  - Any relationships?
- Do a quick test of your preliminary suspicions by computing the correlation between two variables of interest
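A sketch of the scatterplot and quick-correlation steps. The column subset and the lwt/bwt pair are illustrative picks, not a recommendation from the tutorial. When run as a script, plots are written to a file (Rplots.pdf) instead of a window.

```r
library(MASS)

# Scatterplot matrix: every pair among a few continuous columns
pairs(birthwt[, c("age", "lwt", "bwt")])

# A single scatterplot for one pair of interest
plot(bwt ~ lwt, data = birthwt,
     xlab = "Mother's weight at last menstrual period (lb)",
     ylab = "Birth weight (g)")

# Quick check of a suspicion: Pearson correlation (cor()'s default method)
r_lwt_bwt <- cor(birthwt$lwt, birthwt$bwt)
r_lwt_bwt
```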
+ Access & assess your data
- Decide on two variables to use for the tutorial (for practice's sake; don't spend too much time on this!)
  - Again, use a help function to remind yourself what is being measured or to start reasoning through what might be related
+ Access & assess your data
- Check normality and distribution:
  - Visual assessment: do your variables follow a normal distribution?
  - Leverage points/outliers?
  - Consider transformations if necessary
  - Don't forget to note/deal with missing values in your own datasets
- After changes:
  - Visualize again and recheck the distribution
  - Has the correlation changed?
  - Has the scatterplot changed?
  - Think through what these changes mean
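A sketch of these checks, assuming lwt (x) and bwt (y) were the two variables chosen. Three tools here are illustrative choices rather than ones the tutorial names: the Shapiro-Wilk test, the boxplot.stats() outlier screen, and the log transform. Use whichever suits your data.

```r
library(MASS)

# Assumed choice for illustration: x = lwt (mother's weight), y = bwt (birth weight)
x <- birthwt$lwt
y <- birthwt$bwt

# Visual checks: histogram and normal Q-Q plot for each variable
par(mfrow = c(2, 2))
hist(x, main = "lwt"); qqnorm(x, main = "Q-Q: lwt"); qqline(x)
hist(y, main = "bwt"); qqnorm(y, main = "Q-Q: bwt"); qqline(y)
par(mfrow = c(1, 1))

# Formal test to supplement the plots (a small p-value suggests non-normality)
shapiro.test(x)
shapiro.test(y)

# Outlier screen: points more than 1.5 box-lengths beyond the hinges
boxplot.stats(x)$out

# One possible transformation for a right-skewed variable, then recheck
log_x <- log(x)
qqnorm(log_x, main = "Q-Q: log(lwt)"); qqline(log_x)
r_before <- cor(x, y)
r_after  <- cor(log_x, y)
c(before = r_before, after = r_after)
```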
+ Parametric: Linear Regression
- Regress y on x
- Check the model's summary:
  - Which values are of interest?
  - Check the model assumptions
  - How much variation does your model explain (R²)?
  - How much, and in what direction, does y change for each unit of x (i.e. explain the slope)?
  - Put together the predictive equation
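A minimal sketch of the regression step with lm(), again assuming y = bwt and x = lwt.

```r
library(MASS)

# Regress y = bwt (grams) on x = lwt (pounds); the variable pair is an illustrative choice
fit <- lm(bwt ~ lwt, data = birthwt)
s <- summary(fit)
print(s)                 # coefficients, standard errors, t and p values, R-squared

s$r.squared              # proportion of the variation in bwt explained by lwt
b <- coef(fit)           # b[1] = intercept (B0), b[2] = slope (B1)
print(b)

# Slope: change in mean bwt (g) for each one-pound increase in lwt
cat(sprintf("bwt_hat = %.1f %+.2f * lwt\n", b[1], b[2]))

# Assumption checks: residuals vs fitted, normal Q-Q, scale-location, leverage
par(mfrow = c(2, 2))
plot(fit)
par(mfrow = c(1, 1))
```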
+ Parametric: Linear Regression
- Confidence and prediction:
  - Confidence intervals for all parameters: check B0 and B1
  - Confidence interval for the mean response: what range of mean y values do we expect at a given x?
  - Prediction interval for a single response: what range of individual y values do we expect at a given x?
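A sketch of the interval estimates. The lwt values 110, 130 and 150 lb are arbitrary example inputs. At the same x, the prediction interval for a single baby is always wider than the confidence interval for the mean.

```r
library(MASS)

fit <- lm(bwt ~ lwt, data = birthwt)

# 95% confidence intervals for the parameters B0 and B1
confint(fit, level = 0.95)

# Example x values (mother's weight in lb) at which to estimate y
new_x <- data.frame(lwt = c(110, 130, 150))

# Confidence interval for the MEAN bwt at each lwt
mean_ci <- predict(fit, newdata = new_x, interval = "confidence")

# Prediction interval for a SINGLE new bwt at each lwt
pred_int <- predict(fit, newdata = new_x, interval = "prediction")

mean_ci
pred_int
```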
+ Parametric: Linear Regression
- Add the line of best fit to visually assess how well the model fits your data. Remember that you need to rerun your plot if you've closed it.
- Find the regression equation $\hat{y} = B_0 + B_1 x$ (the sign of the slope is carried by $B_1$)
  - Use the summary to get these values; you can also plug in numbers to predict values this way.
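A sketch of adding the fitted line and checking a hand-computed prediction against predict(). The value x0 = 140 lb is an arbitrary example.

```r
library(MASS)

fit <- lm(bwt ~ lwt, data = birthwt)

# Redraw the scatterplot first: abline() only adds to the current plot
plot(bwt ~ lwt, data = birthwt, xlab = "lwt (lb)", ylab = "bwt (g)")
abline(fit, col = "red", lwd = 2)   # line of best fit

# Plug a value into y_hat = B0 + B1 * x by hand
b  <- coef(fit)
x0 <- 140                            # example mother's weight in pounds
by_hand <- unname(b[1] + b[2] * x0)

# Same prediction via predict()
by_predict <- unname(predict(fit, newdata = data.frame(lwt = x0)))
c(by_hand = by_hand, by_predict = by_predict)
```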
+ Nonparametric
- Use when the residuals are not normally distributed or a linear relationship between x and y cannot be assumed.
- Correlation: you will first need to change the correlation coefficient to a suitable nonparametric, rank-based method (e.g. Spearman). Check the help file.
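A sketch of switching to rank-based correlation with the `method` argument of cor() and cor.test(). Kendall's tau is shown as a second rank-based option alongside Spearman.

```r
library(MASS)

# Pearson (the default) vs rank-based alternatives
cor(birthwt$lwt, birthwt$bwt, method = "pearson")
rho <- cor(birthwt$lwt, birthwt$bwt, method = "spearman")
tau <- cor(birthwt$lwt, birthwt$bwt, method = "kendall")
c(spearman = rho, kendall = tau)

# Test H0: rho = 0; exact = FALSE uses the asymptotic p-value (the data contain ties)
sp_test <- cor.test(birthwt$lwt, birthwt$bwt, method = "spearman", exact = FALSE)
sp_test
```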
+ Nonparametric
- Smooth with loess, then use linear regression.
- Check the residuals again with summary. Improved? Do they meet the requirements for linear regression now?
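The slide leaves the exact loess workflow open. The sketch below is one reading of it, under that assumption: fit a loess curve alongside the straight line, then compare the two sets of residuals. Note that loess() is local regression; fitting it does not by itself make the data meet linear-regression assumptions.

```r
library(MASS)

fit_lm <- lm(bwt ~ lwt, data = birthwt)
# Local regression; span = 0.75 and degree = 2 are loess()'s defaults, written out here
fit_lo <- loess(bwt ~ lwt, data = birthwt, span = 0.75, degree = 2)

# Overlay the straight line and the loess curve (sort by x so lines() draws a curve)
plot(bwt ~ lwt, data = birthwt, xlab = "lwt (lb)", ylab = "bwt (g)")
abline(fit_lm, col = "red")
ord <- order(birthwt$lwt)
lines(birthwt$lwt[ord], predict(fit_lo)[ord], col = "blue")

# Residuals from each fit (predict() without newdata returns the fitted values)
res_lm <- residuals(fit_lm)
res_lo <- birthwt$bwt - predict(fit_lo)

summary(res_lm)
summary(res_lo)
shapiro.test(res_lo)   # recheck normality of the loess residuals
```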
+ Further practice
- Do one run-through of the tutorial with a new dataset that meets parametric requirements, and one with a dataset that calls for nonparametric methods.
- Find new data of interest to practice with: https://stat.ethz.ch/r-manual/r-patched/library/datasets/html/00index.html
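For the further-practice step, the sketch below defines a hypothetical helper so the main checks can be repeated on another x/y pair. The name run_through is our own, not part of the tutorial or any package. The built-in cars data (speed in mph, stopping distance in ft) serves only as an example.

```r
# Hypothetical helper (not from the tutorial) bundling the main checks for any x/y pair
run_through <- function(x, y) {
  fit <- lm(y ~ x)
  list(
    pearson   = cor(x, y),                              # parametric correlation
    spearman  = cor(x, y, method = "spearman"),         # rank-based correlation
    shapiro_p = shapiro.test(residuals(fit))$p.value,   # normality of residuals
    r_squared = summary(fit)$r.squared,                 # variation explained
    coefs     = coef(fit)                               # B0 and B1
  )
}

# Example with the built-in cars data
cars_result <- run_through(cars$speed, cars$dist)
str(cars_result)
```

For a simple linear regression, R² equals the square of the Pearson correlation. Comparing those two entries is a quick sanity check on a new dataset.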