Dealing with Missing Data: Strategies for Beginners to Data Analysis

1 Dealing with Missing Data: Strategies for Beginners to Data Analysis Rachel Margolis, PhD Assistant Professor, Department of Sociology Center for Population, Aging, and Health University of Western Ontario

2 What exactly do you mean by missing data? In a typical data set, information is missing for some variables for some cases. E.g. there is usually a sizable amount of missing data for income. In Stata, missing values are shown as . or .m. Sometimes missing values are coded as 99 or 999. Need to check the codebook!
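
A minimal sketch of the codebook check, in Python: sentinel codes are turned into a true missing value before any statistics are computed. The codes 99 and 999 come from the slide; the function name recode_missing and the income values are hypothetical.

```python
def recode_missing(values, sentinel_codes=(99, 999)):
    """Return a copy of `values` with sentinel codes replaced by None."""
    return [None if v in sentinel_codes else v for v in values]

# Hypothetical income column (in $1000s) where the codebook says 999 = "refused".
income = [42, 999, 57, 31, 999, 68]

clean = recode_missing(income, sentinel_codes=(999,))
observed = [v for v in clean if v is not None]

naive_mean = sum(income) / len(income)          # wrongly treats 999 as an income
observed_mean = sum(observed) / len(observed)   # uses reported incomes only

print(clean)
print(naive_mean, observed_mean)
```

Which codes count as missing is an assumption that has to come from the codebook; 99 can be a legitimate value for some variables (e.g. age).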

3 Why Data could be Missing 1- Outright refusal to answer 2- In self-administered surveys, people often overlook or forget to answer some questions 3- Even trained interviewers occasionally may neglect to ask some questions 4- Respondents say they do not know the answer 5- Respondents may not have the information available to them at the time of the survey 6- Question is inapplicable. e.g. Quality of marriage to unmarried respondents

4 Why Data could be Missing 7- In longitudinal studies, people who were interviewed in one wave may move or die before the next wave, e.g. present in one wave but not the next. 8- Some records could be lost. 9- Data couldn't be read by the person entering them into the database.

5 My Perspective on Missing Data This is not my area of research. I am a user of secondary survey data and have dealt with missing data in my research. I am the instructor for an applied regression course (Soc 9007) where many students deal with data analysis for the first time. This talk is not geared for experienced users. Drawing on Allison (2002) Missing Data and other

6 Why is missing data a problem in social science and health research? 1) Nearly all standard statistical methods presume that every case has info on all variables to be included in the analysis. 2) Multivariate analysis of large surveys: even with small percentages of missing data on each variable, you may have a large number of cases with missing data on at least one of these variables. 3) Analysis of small data sets (clinical data, cross-national data, quantitative analysis of qualitative data): every case is important. 4) Analysis of variables involving sensitive topics.
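
Point 2 can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that each variable is missing for the same share of cases and that missingness is independent across variables; real surveys rarely satisfy either assumption, so treat the numbers as a rough guide.

```python
def complete_case_fraction(p_missing, n_variables):
    """Share of cases with no missing values, assuming independent missingness."""
    return (1 - p_missing) ** n_variables

# 5% missing on each variable, for models of increasing size.
for k in (1, 5, 10, 20):
    share = complete_case_fraction(0.05, k)
    print(f"{k:2d} variables: {share:.1%} complete cases")
```

Under these assumptions, a model with 10 variables that are each 5% missing keeps only about 60% of the sample.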

7 Why is missing data a problem in social science and health research? 5) If missing cases are deleted: - Reduces sample size and lowers statistical power (larger SEs, so it is harder to detect significant relationships) - Biased estimates (sample selection) because the analytic sample is not representative of the whole sample 6) If we impute missing data: - Risk of biased estimates: inadequate imputations - Biased standard errors and significance tests: overfitting. 7) Publication of research: Journal editors and reviewers are increasingly strict about how you deal with missing data

8 Step 1: Assess WHY data are missing Go to the codebook for your data Was the question not asked of all respondents? (It could have been inapplicable, or only asked of a subset to save cost) How are missing values coded? (There may be subcategories) Talk with others who work with the same data

9 Step 2: Analyze missing data as the dependent variable The next step in dealing with missing data is to empirically understand the nature of the missing data pattern. - Create a dummy variable for whether observations are missing on that variable or not: Cases with missing values are coded as 1, cases without missing values are coded as 0 - Estimate a logistic regression or similar model with this dummy as the dependent variable and the other variables as predictors - This will give you hints as to which characteristics are associated with missingness. This is similar to the analysis of attrition in longitudinal data
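
Step 2 can be sketched in a few lines of Python. The data are made up, and the small gradient-ascent fitter fit_logit is a hypothetical stand-in for the logistic regression command of whatever package you use (e.g. logit in Stata); it handles a single predictor only.

```python
import math

def fit_logit(x, y, lr=0.5, n_iter=5000):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by plain gradient ascent."""
    b0 = b1 = 0.0
    n = len(y)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += yi - p            # gradient for the intercept
            g1 += (yi - p) * xi     # gradient for the slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical data: 20 non-depressed respondents (2 skip the income question)
# followed by 20 depressed respondents (8 skip it).
depressed = [0] * 20 + [1] * 20
income = [None] * 2 + [50] * 18 + [None] * 8 + [40] * 12

# The Step 2 dummy: 1 if income is missing, 0 if it was reported.
income_missing = [1 if v is None else 0 for v in income]

b0, b1 = fit_logit(depressed, income_missing)
odds_ratio = math.exp(b1)
print(f"intercept={b0:.2f}  slope={b1:.2f}  odds ratio={odds_ratio:.1f}")
```

With a single binary predictor the fitted odds ratio matches the one you can read off the table by hand: odds of 8/12 among the depressed against 2/18 among the others, a ratio of 6. A predictor with a large coefficient here is a hint that missingness is not completely at random.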

10 Why analyze missing data as dependent variable? Provides the researcher with a substantive understanding of the missing data pattern. Can help with selecting the best technique to address the missing data problem. Can help with using the technique: creating weights, creating imputed data. Depending on space, can be a part of your story: Missing data analysis as a section of a master's thesis, dissertation, or book; an appendix or footnote in a journal article

11 Step 3: Determine the nature of missing data a) Missing completely at random (MCAR) b) Missing at random (MAR) c) Missing not at random (MNAR)

12 Step 3: Determine the nature of missing data a) Missing completely at random (MCAR) Missingness is unrelated to any variable in the analysis (including the variable with missing data itself). Example: 1% of records fell into the mud and were lost. One of 10 computers holding the data broke down, and the failure was not related to which data it was holding. Analysis remains unbiased. We lose power, but estimated parameters are not biased. Most missing data techniques will work well

13 Step 3: Determine the nature of missing data a) Missing completely at random (MCAR) b) Missing at random (MAR) Data are MAR if missingness on x does not depend on the value of x after controlling for other variables, i.e. missingness is correlated only with other variables that are included in the analysis. Example: Depressed people might be less likely to report their income (reporting of income is associated with depression). Depressed people might have lower income in general. If cases with missing income are ignored, the observed income distribution is shifted upward. If within depressed patients the probability of reporting income is unrelated to income level, then data are MAR, not MCAR. Ignoring MAR missingness does produce bias, but we have ways of dealing with it.

14 Step 3: Determine the nature of missing data a) Missing completely at random (MCAR) b) Missing at random (MAR) c) Missing not at random (MNAR) Missingness depends on the value of the missing variable itself, even after controlling for other variables. Example: If we are studying mental health and the depressed are less likely to report their mental health, then data are MNAR. The mean mental health level will be biased relative to what we would get with the complete data. We need to write a model that accounts for the missing data process. Bias could be large or small depending on your data.
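
The three mechanisms are easier to tell apart in a small simulation. The sketch below uses Python's random module and an entirely made-up population (30% depressed, incomes about 10 units lower among the depressed); all probabilities and helper names are assumptions chosen for illustration.

```python
import random

random.seed(0)
N = 20000

# Hypothetical population: 30% depressed; the depressed earn less on average.
people = []
for _ in range(N):
    depressed = 1 if random.random() < 0.3 else 0
    income = 50 - 10 * depressed + random.gauss(0, 10)
    people.append((depressed, income))

def mean(values):
    return sum(values) / len(values)

def respondents(p_missing):
    """Drop each person's income with probability p_missing(depressed, income)."""
    return [(d, inc) for d, inc in people if random.random() >= p_missing(d, inc)]

def mean_income(sample, depressed=None):
    """Mean income, optionally restricted to one depression group."""
    return mean([inc for d, inc in sample if depressed is None or d == depressed])

mcar = respondents(lambda d, inc: 0.3)                       # unrelated to anything
mar = respondents(lambda d, inc: 0.6 if d else 0.1)          # depends on observed depression only
mnar = respondents(lambda d, inc: 0.6 if inc < 45 else 0.1)  # depends on income itself

full_mean = mean_income(people)
print(f"complete data: {full_mean:.1f}")
print(f"MCAR observed: {mean_income(mcar):.1f}")
print(f"MAR observed:  {mean_income(mar):.1f}")
print(f"MNAR observed: {mean_income(mnar):.1f}")
# Under MAR, the mean within the depressed group is still close to the truth.
print(f"depressed, complete vs MAR: {mean_income(people, 1):.1f} vs {mean_income(mar, 1):.1f}")
```

The MCAR mean stays close to the complete-data mean, while the MAR and MNAR means are shifted upward. The difference is that under MAR the bias disappears once you condition on depression, which is why methods that use the observed covariates can correct it.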

15 Step 4: Choose a technique 1) Listwise deletion 2) Simple techniques to avoid: Pairwise deletion, Hot deck imputation, Mean substitution 3) Single imputation/regression substitution 4) Dummy variable for missing data 5) Multiple imputation 6) Maximum likelihood estimation

16 1- Listwise deletion Method: Delete any case which has missing data on any of the variables of interest Advantages: - Simple: default option in many statistical programs. - Acceptable with a small amount of missing data, one rule is less than 5% of the full sample. Disadvantages - Can quickly reduce sample size and statistical power where many variables have missing data - Undetected selection bias - Biased when data are not MCAR
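
A minimal sketch of listwise deletion on a toy data set, with missing values stored as None. The variable names and values are hypothetical; the point is how quickly cases disappear when several variables each have a little missing data.

```python
def listwise_delete(rows, variables):
    """Keep only rows with no missing (None) value on any analysis variable."""
    return [r for r in rows if all(r[v] is not None for v in variables)]

# Each variable is missing for just one of five cases...
rows = [
    {"income": 42,   "educ": 12,   "age": 30},
    {"income": None, "educ": 16,   "age": 45},
    {"income": 57,   "educ": None, "age": 52},
    {"income": 31,   "educ": 10,   "age": None},
    {"income": 68,   "educ": 18,   "age": 61},
]

complete = listwise_delete(rows, ["income", "educ", "age"])
share_lost = 1 - len(complete) / len(rows)

# ...but three of the five cases are dropped from the analysis.
print(len(complete), f"{share_lost:.0%} of cases dropped")
```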

17 2- Simple techniques to avoid a) Pairwise deletion: Drops a case only from the calculations that involve its missing variable, not from the whole analysis. b) Hot deck imputation: Substituting the value from a randomly selected similar unit for the missing value. c) Mean substitution: Substituting the mean value for the missing data. Advantages: All available data are used. Disadvantages: May over- or underestimate coefficients. Overfits data: artificially increases model fit by assuming that similar units are identical. Standard errors are too low. Hot deck: Hard to justify the method for selecting similar units. Mean imputation: Hard to justify that missing values would be the mean. Doesn't take into account how they are different.
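
Why mean substitution understates variability can be seen with a handful of made-up numbers. The sketch uses only Python's statistics module; the income values are hypothetical.

```python
from statistics import mean, variance

# Hypothetical incomes with three missing values.
income = [20, 35, None, 50, None, 65, 80, None]

observed = [v for v in income if v is not None]
fill = mean(observed)

# Mean substitution: every missing value becomes the observed mean.
imputed = [fill if v is None else v for v in income]

print("observed variance:", variance(observed))
print("variance after mean substitution:", round(variance(imputed), 1))
```

The mean is unchanged, but the sample variance drops because the filled-in cases sit exactly at the center of the distribution. Standard errors computed from the filled-in data are therefore too small.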

18 3- Single imputation/regression substitution Method: Use linear regression to predict what the missing value should be on the basis of other variables that are present. Then substitute the predicted value for the missing value. Advantages: More logical than other methods Full sample size preserved Disadvantages Overfits data: artificially increases model fit by assuming that similar units are identical. Lower standard errors.
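
A minimal sketch of regression substitution with one predictor. The education and income values are hypothetical, and fit_line is a hand-written least-squares helper used here only to keep the example self-contained.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical data: years of education (complete) and income (two missing).
educ = [10, 12, 14, 16, 18, 12, 16]
income = [28, 38, 41, 50, 53, None, None]

# Fit the regression on the complete cases only.
pairs = [(e, i) for e, i in zip(educ, income) if i is not None]
a, b = fit_line([e for e, _ in pairs], [i for _, i in pairs])

# Substitute the predicted value wherever income is missing.
filled = [a + b * e if i is None else i for e, i in zip(educ, income)]
print(f"intercept={a:.1f} slope={b:.1f}")
print([round(v, 1) for v in filled])
```

Note that the two imputed incomes fall exactly on the fitted line, with none of the scatter seen in the observed cases. That missing scatter is the source of the overfitting and the too-small standard errors listed among the disadvantages.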

19 4- Extra dummy variable for missing data Method: Add an extra dummy variable (coded 1 for the missing values and 0 otherwise) to a series of dummy variables. Example: Education: university degree (ref), high school graduate (dummy), less than high school (dummy), unknown education (dummy) Advantages: - Full sample size preserved - Association between DV and missing data dummy is estimated

20 4- Extra dummy variable for missing data Disadvantages - Heterogeneity: the missing data dummy may combine very different cases together - Requires many extra dummy variables if you have missing data on multiple variables - Requires use of categorical/dummy variables, not continuous variables
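
The education example can be sketched as a small coding function. The category labels and the function name education_dummies are hypothetical; the structure (university as reference, plus an "unknown" dummy) follows the slide.

```python
def education_dummies(educ):
    """Dummy-code education: university is the reference, None gets its own dummy."""
    return {
        "high_school": int(educ == "high school"),
        "less_than_hs": int(educ == "less than high school"),
        "educ_unknown": int(educ is None),
    }

respondents = ["university", "high school", None, "less than high school"]
for educ in respondents:
    print(educ, education_dummies(educ))
```

Every respondent keeps a row in the design matrix, so the sample size is preserved. The coefficient on educ_unknown then summarizes a group whose true education levels may be very mixed.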

21 5- Multiple Imputation Method: Replaces each missing value with a set of plausible values that represent the uncertainty about the right value to impute. These multiply imputed data sets are then analyzed using standard procedures for complete data, and the results from these analyses are combined. Advantages: Logical, full sample preserved. By including random error, imputed data are noisier than with single imputation, and therefore don't overfit as much as other methods. Disadvantages: Not necessarily available for all kinds of models. Not appropriate when data are missing on the key independent variable or the DV
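
The "combine the results" step is usually done with Rubin's rules: average the estimates, then add the between-imputation variance to the average within-imputation variance. The sketch below shows only this pooling step; the five coefficients and standard errors are made up, standing in for the same regression run on five imputed data sets.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b        # total variance
    return q_bar, math.sqrt(t)

# Hypothetical results for one coefficient across m = 5 imputed data sets.
estimates = [2.0, 2.4, 1.8, 2.2, 2.1]
std_errors = [0.50, 0.52, 0.49, 0.51, 0.50]

pooled_est, pooled_se = pool_rubin(estimates, [se ** 2 for se in std_errors])
print(f"pooled estimate={pooled_est:.2f}  pooled SE={pooled_se:.3f}")
```

The pooled standard error is larger than any of the five individual ones. This is how multiple imputation carries the uncertainty about the imputed values into the final significance tests. In practice the pooling is done by the software (e.g. mi estimate in Stata or PROC MIANALYZE in SAS).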

22 6- Maximum Likelihood Estimation Method: EM algorithms estimate the model coefficients and standard errors in the presence of missing data. Advantages: Doesn't impute missing data. Best-fitting parameters are selected via iterations that maximize the probability of observing the data that were collected. Disadvantages: Requires more statistical knowledge. Might require the use of different statistical programs. More common for SEM programs.
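
For intuition only, here is a special case where the maximum likelihood answer has a closed form, so no EM iterations are needed. It is not the general EM algorithm the slide refers to. The sketch assumes two jointly normal variables, x fully observed and y missing for some cases, with missingness depending only on x (MAR). Under those assumptions the ML estimate of the mean of y is the complete-case mean adjusted by the regression slope times the gap between the full-sample and complete-case means of x. All data values are hypothetical.

```python
def ml_mean_y(x, y):
    """ML estimate of mean(y) when x is complete and y is partly missing (None).

    Assumes bivariate normality and MAR missingness that depends only on x.
    """
    cc = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    x_cc = sum(xi for xi, _ in cc) / len(cc)
    y_cc = sum(yi for _, yi in cc) / len(cc)
    # Slope of y on x, estimated from the complete cases.
    slope = (sum((xi - x_cc) * (yi - y_cc) for xi, yi in cc)
             / sum((xi - x_cc) ** 2 for xi, _ in cc))
    x_all = sum(x) / len(x)
    return y_cc + slope * (x_all - x_cc)

# Hypothetical data: the three cases missing income have low education.
educ = [10, 12, 14, 16, 18, 8, 9, 10]
income = [28, 38, 41, 50, 53, None, None, None]

observed = [v for v in income if v is not None]
cc_mean = sum(observed) / len(observed)
ml_mean = ml_mean_y(educ, income)
print(f"complete-case mean={cc_mean:.2f}  ML mean={ml_mean:.2f}")
```

No value is imputed for any individual case. The estimate is pulled downward because the cases with missing income have lower education than the complete cases, which is information the complete-case mean ignores.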

23 Summary First, you must understand why you have missing data and examine the patterns. Then you can choose a technique to deal with missing data. You may choose more than one. No matter how you deal with missing data, you should run your analysis various ways: - With and without missing values included - Using different methods to test whether it changes results - Think about the direction in which missing values bias the results. Before you start using one of these techniques, invest in understanding what assumptions you are making and how to do it with the software that you use.

24 General approaches The information presented here focused on general approaches for basic statistical analysis: OLS regression, logistic regression, ANOVA. See literature on disciplinary and model-specific techniques and norms (multi-level models, structural equation modeling, factor analysis, panel data, etc.). Statistical Software: SPSS: limited in basic version; some expensive upgrades available. Stata: multiple imputation (mi, ice, micombine). SAS: PROC MI, PROC MIANALYZE. R: many options.

25 References
Allison, P. (2002). Missing Data. Sage. A little green book.
Johnson & Young (2011). Toward best practices in analyzing datasets with missing data: Comparisons and recommendations. Journal of Marriage and Family 73:
Acock (2005). Working with missing values. Journal of Marriage and Family 67:
Raghunathan (2004). What to do with missing data? Some options for analysis of incomplete data. Annual Review of Public Health 25:
Little and Rubin (1989). The analysis of social science data with missing values. Sociological Methods and Research.


More information

Analysis of Factors that Affect Productivity of Enterprise Software Projects

Analysis of Factors that Affect Productivity of Enterprise Software Projects Analysis of Factors that Affect Productivity of Enterprise Software Projects Tsuneo Furuyama Scool of Science, Tokai University Kitakaname 4-1-1, Hiratsuka city, Japan furuyama@tokai-u.jp Abstract Analysis

More information

Estoril Education Day

Estoril Education Day Estoril Education Day -Experimental design in Proteomics October 23rd, 2010 Peter James Note Taking All the Powerpoint slides from the Talks are available for download from: http://www.immun.lth.se/education/

More information

Machine learning applications in genomics: practical issues & challenges. Yuzhen Ye School of Informatics and Computing, Indiana University

Machine learning applications in genomics: practical issues & challenges. Yuzhen Ye School of Informatics and Computing, Indiana University Machine learning applications in genomics: practical issues & challenges Yuzhen Ye School of Informatics and Computing, Indiana University Reference Machine learning applications in genetics and genomics

More information

Salford Predictive Modeler. Powerful machine learning software for developing predictive, descriptive, and analytical models.

Salford Predictive Modeler. Powerful machine learning software for developing predictive, descriptive, and analytical models. Powerful machine learning software for developing predictive, descriptive, and analytical models. The Company Minitab helps companies and institutions to spot trends, solve problems and discover valuable

More information

Machine-learning models for predicting drug approvals and clinical-phase transitions

Machine-learning models for predicting drug approvals and clinical-phase transitions Machine-learning models for predicting drug approvals and clinical-phase transitions Andrew W. Lo 1,2,3 *, Kien Wei Siah 1,2, Chi Heem Wong 1,2 1 Laboratory for Financial Engineering, Sloan School of Management,

More information

WORK INTENSIFICATION, DISCRETION, AND THE DECLINE IN WELL-BEING AT WORK.

WORK INTENSIFICATION, DISCRETION, AND THE DECLINE IN WELL-BEING AT WORK. WORK INTENSIFICATION, DISCRETION, AND THE DECLINE IN WELL-BEING AT WORK. INTRODUCTION Francis Green University of Kent Previous studies have established that work intensification was an important feature

More information

Data Integration (stat08014)

Data Integration (stat08014) Data Integration (stat08014) Luciana Dalla Valle, University of Plymouth, UK Abstract This article introduces some of the most popular techniques of data integration, that allow the combination of information

More information

Hierarchical Linear Modeling: A Primer 1 (Measures Within People) R. C. Gardner Department of Psychology

Hierarchical Linear Modeling: A Primer 1 (Measures Within People) R. C. Gardner Department of Psychology Hierarchical Linear Modeling: A Primer 1 (Measures Within People) R. C. Gardner Department of Psychology As noted previously, Hierarchical Linear Modeling (HLM) can be considered a particular instance

More information

Survival Outcome Prediction for Cancer Patients based on Gene Interaction Network Analysis and Expression Profile Classification

Survival Outcome Prediction for Cancer Patients based on Gene Interaction Network Analysis and Expression Profile Classification Survival Outcome Prediction for Cancer Patients based on Gene Interaction Network Analysis and Expression Profile Classification Final Project Report Alexander Herrmann Advised by Dr. Andrew Gentles December

More information

Ensemble Modeling. Toronto Data Mining Forum November 2017 Helen Ngo

Ensemble Modeling. Toronto Data Mining Forum November 2017 Helen Ngo Ensemble Modeling Toronto Data Mining Forum November 2017 Helen Ngo Agenda Introductions Why Ensemble Models? Simple & Complex ensembles Thoughts: Post-real-life Experimentation Downsides of Ensembles

More information

Remarkable Team Building (Part 1 of 4) e = mc 2 Your practice is simply the reflection of your energy your energy as a doctor your energy as a leader your Teams energy your Tribes energy. Your practice

More information

SCHOOL OF AGRICULTURE

SCHOOL OF AGRICULTURE SCHOOL OF AGRICULTURE DEPARTMENT OF AGRICULTURAL ECONOMICS & AGRIBUSINESS 1. PhD in Applied Agricultural Economics and Policy 2. PhD in Agricultural Administration 3. PhD in Agribusiness PHD IN APPLIED

More information

PSC 508. Jim Battista. Dummies. Univ. at Buffalo, SUNY. Jim Battista PSC 508

PSC 508. Jim Battista. Dummies. Univ. at Buffalo, SUNY. Jim Battista PSC 508 PSC 508 Jim Battista Univ. at Buffalo, SUNY Dummies Dummy variables Sometimes we want to include categorical variables in our models Numerical variables that don t necessarily have any inherent order and

More information

Masters in Business Statistics (MBS) /2015. Department of Mathematics Faculty of Engineering University of Moratuwa Moratuwa. Web:

Masters in Business Statistics (MBS) /2015. Department of Mathematics Faculty of Engineering University of Moratuwa Moratuwa. Web: Masters in Business Statistics (MBS) - 2014/2015 Department of Mathematics Faculty of Engineering University of Moratuwa Moratuwa Web: www.mrt.ac.lk Course Coordinator: Prof. T S G Peiris Prof. in Applied

More information

Examination of Cross Validation techniques and the biases they reduce.

Examination of Cross Validation techniques and the biases they reduce. Examination of Cross Validation techniques and the biases they reduce. Dr. Jon Starkweather, Research and Statistical Support consultant. The current article continues from last month s brief examples

More information

ADVANCED DATA ANALYTICS

ADVANCED DATA ANALYTICS ADVANCED DATA ANALYTICS MBB essay by Marcel Suszka 17 AUGUSTUS 2018 PROJECTSONE De Corridor 12L 3621 ZB Breukelen MBB Essay Advanced Data Analytics Outline This essay is about a statistical research for

More information

Advice to Health Services Researchers: Be Cautious Using the Where Statement in SAS Programs for Nationally Representative Complex Survey Data

Advice to Health Services Researchers: Be Cautious Using the Where Statement in SAS Programs for Nationally Representative Complex Survey Data Advice to Health Services Researchers: Be Cautious Using the Where Statement in SAS Programs for Nationally Representative Complex Survey Data Hemalkumar B. Mehta, Michael L. Johnson Department of Clinical

More information

Using R for Introductory Statistics

Using R for Introductory Statistics R http://www.r-project.org Using R for Introductory Statistics John Verzani CUNY/the College of Staten Island January 6, 2009 http://www.math.csi.cuny.edu/verzani/r/ams-maa-jan-09.pdf John Verzani (CSI)

More information

Week 11: Collinearity

Week 11: Collinearity Week 11: Collinearity Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Regression and holding other

More information

Ch. 15 Data Preparation and Description

Ch. 15 Data Preparation and Description TECH 646 Analysis of Research in Industry and Technology PART IV Analysis and Presentation of Data: Data Presentation and Description; Exploring, Displaying, and Examining Data; Hypothesis Testing; Measures

More information

Research Article One-Step Dynamic Classifier Ensemble Model for Customer Value Segmentation with Missing Values

Research Article One-Step Dynamic Classifier Ensemble Model for Customer Value Segmentation with Missing Values Mathematical Problems in Engineering, Article ID 869628, 15 pages http://dx.doi.org/10.1155/2014/869628 Research Article One-Step Dynamic Classifier Ensemble Model for Customer Value Segmentation with

More information

Estimating Discrete Choice Models of Demand. Data

Estimating Discrete Choice Models of Demand. Data Estimating Discrete Choice Models of Demand Aggregate (market) aggregate (market-level) quantity prices + characteristics (+advertising) distribution of demographics (optional) sample from distribution

More information

What we can do about human error

What we can do about human error What we can do about human error Petroleum Safety Conference May 2016 12 May 2016 1 Decision making All relevant information Best methods + for interpreting = information Perfect decisions 2 Situational

More information

Code Compulsory Module Credits Continuous Assignment

Code Compulsory Module Credits Continuous Assignment CURRICULUM AND SCHEME OF EVALUATION Compulsory Modules Evaluation (%) Code Compulsory Module Credits Continuous Assignment Final Exam MA 5210 Probability and Statistics 3 40±10 60 10 MA 5202 Statistical

More information

Gasoline Consumption Analysis

Gasoline Consumption Analysis Gasoline Consumption Analysis One of the most basic topics in economics is the supply/demand curve. Simply put, the supply offered for sale of a commodity is directly related to its price, while the demand

More information

01 University of Plymouth Research Outputs University of Plymouth Research Outputs

01 University of Plymouth Research Outputs University of Plymouth Research Outputs University of Plymouth PEARL https://pearl.plymouth.ac.uk 01 University of Plymouth Research Outputs University of Plymouth Research Outputs 2017-12-20 Data Integration Dalla Valle, L http://hdl.handle.net/10026.1/9294

More information

Not Just Another Pretty Formula: Practical Methods for Mitigating Self-Selection Bias in Billing Analysis Regressions

Not Just Another Pretty Formula: Practical Methods for Mitigating Self-Selection Bias in Billing Analysis Regressions Not Just Another Pretty Formula: Practical Methods for Mitigating Self-Selection Bias in Billing Analysis Regressions ABSTRACT Dr. Miriam L. Goldberg and G. Kennedy Agnew, DNV GL, Madison, WI Dr. Meredith

More information

If you are using a survey: who will participate in your survey? Why did you decide on that? Explain

If you are using a survey: who will participate in your survey? Why did you decide on that? Explain Journal 11/13/18 If you are using a survey: who will participate in your survey? Why did you decide on that? Explain If you are not using a survey: Where will you look for information? Why did you decide

More information

Modern Genetic Evaluation Procedures Why BLUP?

Modern Genetic Evaluation Procedures Why BLUP? Modern Genetic Evaluation Procedures Why BLUP? Hans-Ulrich Graser 1 Introduction The developments of modem genetic evaluation procedures have been mainly driven by scientists working with the dairy populations

More information

What Is Conjoint Analysis? DSC 410/510 Multivariate Statistical Methods. How Is Conjoint Analysis Done? Empirical Example

What Is Conjoint Analysis? DSC 410/510 Multivariate Statistical Methods. How Is Conjoint Analysis Done? Empirical Example What Is Conjoint Analysis? DSC 410/510 Multivariate Statistical Methods Conjoint Analysis 1 A technique for understanding how respondents develop preferences for products or services Also known as trade-off

More information

Disaggregating the Return on Investment to IT Capital

Disaggregating the Return on Investment to IT Capital Association for Information Systems AIS Electronic Library (AISeL) ICIS 1998 Proceedings International Conference on Information Systems (ICIS) December 1998 Disaggregating the Return on Investment to

More information

Midterm Test Department: Computer Science Instructor: Steve Easterbrook Date and Time: 10:10am, Thursday 1st March, 2012

Midterm Test Department: Computer Science Instructor: Steve Easterbrook Date and Time: 10:10am, Thursday 1st March, 2012 CSC302 Engineering Large Software Systems page /9 Faculty of Arts and Science University of Toronto Midterm Test Department: Computer Science Instructor: Steve Easterbrook Date and Time: 0:0am, Thursday

More information

Standard for applying the Principle. Involving Stakeholders DRAFT.

Standard for applying the Principle. Involving Stakeholders DRAFT. V V Standard for applying the Principle Involving Stakeholders DRAFT www.socialvalueint.org Table of Contents Introduction...1 Identifying stakeholders...4 Stakeholder involvement...5 Deciding how many

More information

SUGI 29 Statistics and Data Analysis. Paper

SUGI 29 Statistics and Data Analysis. Paper Paper 206-29 Using SAS Procedures to Make Sense of a Complex Food Store Survey Jeff Gossett, University of Arkansas for Medical Sciences, Little Rock, AR Pippa Simpson, University of Arkansas for Medical

More information

PREDICTING EMPLOYEE ATTRITION THROUGH DATA MINING

PREDICTING EMPLOYEE ATTRITION THROUGH DATA MINING PREDICTING EMPLOYEE ATTRITION THROUGH DATA MINING Abbas Heiat, College of Business, Montana State University, Billings, MT 59102, aheiat@msubillings.edu ABSTRACT The purpose of this study is to investigate

More information

Handbook On Impact Evaluation With Stata. Examples >>>CLICK HERE<<<

Handbook On Impact Evaluation With Stata. Examples >>>CLICK HERE<<< Handbook On Impact Evaluation With Stata Examples This page highlights books for designing an impact evaluation, animations, power Handbook on Impact Evaluation, Khandker, S. R., Koolwal, G. B., & Samad,

More information

GLMs the Good, the Bad, and the Ugly Ratemaking and Product Management Seminar March Christopher Cooksey, FCAS, MAAA EagleEye Analytics

GLMs the Good, the Bad, and the Ugly Ratemaking and Product Management Seminar March Christopher Cooksey, FCAS, MAAA EagleEye Analytics Antitrust Notice The Casualty Actuarial Society is committed to adhering strictly to the letter and spirit of the antitrust laws. Seminars conducted under the auspices of the CAS are designed solely to

More information

Chapter 3. Database and Research Methodology

Chapter 3. Database and Research Methodology Chapter 3 Database and Research Methodology In research, the research plan needs to be cautiously designed to yield results that are as objective as realistic. It is the main part of a grant application

More information

PharmaSUG 2016 Paper 36

PharmaSUG 2016 Paper 36 PharmaSUG 2016 Paper 36 What's the Case? Applying Different Methods of Conducting Retrospective Case/Control Experiments in Pharmacy Analytics Aran Canes, Cigna, Bloomfield, CT ABSTRACT Retrospective Case/Control

More information

Practical Aspects of Modelling Techp.iques in Logistic Regression Procedures of the SAS System

Practical Aspects of Modelling Techp.iques in Logistic Regression Procedures of the SAS System r""'=~~"''''''''''''''''''''''''''''\;'=="'~''''o''''"'"''~ ~c_,,..! Practical Aspects of Modelling Techp.iques in Logistic Regression Procedures of the SAS System Rainer Muche 1, Josef HogeP and Olaf

More information