Dealing with Missing Data: Strategies for Beginners to Data Analysis
1 Dealing with Missing Data: Strategies for Beginners to Data Analysis
Rachel Margolis, PhD
Assistant Professor, Department of Sociology
Center for Population, Aging, and Health
University of Western Ontario
2 What exactly do you mean by missing data?
- In a typical data set, information is missing for some variables for some cases. E.g. there is usually a sizable amount of missing data for income.
- In Stata, missing values are shown with . or .m
- Sometimes missing values are coded as 99 or 999. Need to check the codebook!
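The codebook check above can be sketched in a few lines of Python. This is only an illustration: the variable names and the `MISSING_CODES` table are hypothetical stand-ins for whatever your own codebook says, and the point is that a code such as 99 may be a missing-value code for one variable and a legitimate value for another.

```python
# Hypothetical codebook: which numeric codes mean "missing" for each variable.
MISSING_CODES = {"income": {99, 999}, "age": {999}}

def recode_missing(rows, missing_codes):
    """Return copies of the rows with codebook missing codes replaced by None."""
    cleaned = []
    for row in rows:
        new_row = {}
        for name, value in row.items():
            codes = missing_codes.get(name, set())
            new_row[name] = None if value in codes else value
        cleaned.append(new_row)
    return cleaned

# Made-up records for illustration.
rows = [
    {"income": 42, "age": 31},
    {"income": 999, "age": 99},   # 99 is a real age here; 999 is a missing income
    {"income": 99, "age": 999},   # both values are missing codes
]
cleaned = recode_missing(rows, MISSING_CODES)
for row in cleaned:
    print(row)
```

Recoding per variable, rather than replacing 99 everywhere, avoids turning real values into missing ones.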
3 Why Data could be Missing
1- Outright refusal to answer
2- In self-administered surveys, people often overlook or forget to answer some questions
3- Even trained interviewers occasionally may neglect to ask some questions
4- Respondents say they do not know the answer
5- Respondents may not have the information available to them at the time of the survey
6- Question is inapplicable, e.g. quality of marriage for unmarried respondents
4 Why Data could be Missing
7- In longitudinal studies, people who were interviewed in one wave may move or die before the next wave, e.g. present in one wave, but not the next
8- Some records could be lost
9- Data couldn't be read by the person inputting it into the database
5 My Perspective on Missing Data
- This is not my area of research.
- I am a user of secondary survey data and have dealt with missing data in my research.
- Instructor for applied regression course (Soc 9007), where many students deal with data analysis for the first time.
- This talk is not geared for experienced users.
- Drawing on Allison (2002), Missing Data, and other sources
6 Why is missing data a problem in social science and health research?
1) Nearly all standard statistical methods presume that every case has information on all variables to be included in the analysis.
2) Multivariate analysis of large surveys: even if only a small percentage of data is missing on each variable, a large number of cases may be missing on at least one of these variables.
3) Analysis of small data sets (clinical data, cross-national data, quantitative analysis of qualitative data): every case is important.
4) Analysis of variables involving sensitive topics
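Point 2 is easy to check with arithmetic. The sketch below assumes, purely for illustration, that every variable has the same missing rate and that missingness is independent across variables; real surveys rarely satisfy either assumption exactly, so treat the result as an order of magnitude.

```python
def share_with_any_missing(per_variable_rate, n_variables):
    """Share of cases missing on at least one variable.

    Assumes the same missing rate on every variable and independence
    across variables (an illustrative simplification).
    """
    return 1 - (1 - per_variable_rate) ** n_variables

for k in (1, 5, 10, 20):
    print(k, round(share_with_any_missing(0.05, k), 3))
```

Under these assumptions, 5% missing on each of ten variables leaves roughly 40% of cases with at least one missing value.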
7 Why is missing data a problem in social science and health research?
5) If missing cases are deleted:
- Reduces sample size and lowers statistical power (larger standard errors, so it is harder to detect significant relationships)
- Biased estimates (sample selection) because the analytic sample is not representative of the whole sample
6) If we impute missing data:
- Risk of biased estimates: inadequate imputations
- Biased standard errors and significance tests: overfitting
7) Publication of research: journal editors and reviewers are increasingly strict about how you deal with missing data
8 Step 1: Assess WHY data are missing
- Go to the codebook for your data
- Was the question not asked of all respondents? (It could have been inapplicable, or only asked of a subset to save cost)
- How are missing values coded? (There may be subcategories)
- Talk with others who work with the same data
9 Step 2: Analyze missing data as the dependent variable
The next step in dealing with missing data is to empirically understand the nature of the missing data pattern.
- Create a dummy variable for whether observations are missing on that variable: cases with missing values are coded 1, cases without missing values are coded 0
- Estimate a logistic regression or similar model with this dummy as the outcome and the other variables as predictors
- This will give you hints as to which characteristics are associated with missingness
This is similar to the analysis of attrition in longitudinal data.
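A minimal sketch of Step 2, using only the Python standard library. In practice you would use your statistics package's logistic regression command; the small gradient-ascent fitter here is a stand-in so the example is self-contained. The data are made up: income is missing for 40% of depressed respondents and 10% of the others.

```python
import math

def fit_logistic(x, y, learning_rate=0.5, n_steps=5000):
    """Fit P(y=1) = 1 / (1 + exp(-(a + b*x))) by gradient ascent
    on the mean log-likelihood. Returns (intercept, slope)."""
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_steps):
        grad_a = grad_b = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            grad_a += yi - p
            grad_b += (yi - p) * xi
        a += learning_rate * grad_a / n
        b += learning_rate * grad_b / n
    return a, b

# Toy data (invented): income is None when it was not reported.
depressed = [1] * 40 + [0] * 60
income = [None] * 16 + [30] * 24 + [None] * 6 + [50] * 54

# Step 2a: dummy variable, 1 = missing, 0 = observed.
income_missing = [1 if value is None else 0 for value in income]

# Step 2b: regress the missingness dummy on a predictor.
intercept, slope = fit_logistic(depressed, income_missing)
odds_ratio = math.exp(slope)
print(f"slope = {slope:.2f}, odds ratio = {odds_ratio:.1f}")
```

A positive slope (odds ratio above 1) says that depressed respondents are more likely to have missing income, which is the kind of hint this step is meant to produce.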
10 Why analyze missing data as dependent variable?
- Provides the researcher with a substantive understanding of the missing data pattern
- Can help with selecting the best technique to address the missing data problem
- Can help with using the technique: creating weights, creating imputation data
- Depending on space, can be a part of your story: a missing data analysis as a section of a master's thesis, dissertation, or book; an appendix or footnote in a journal article
11 Step 3: Determine the nature of missing data
a) Missing completely at random (MCAR)
b) Missing at random (MAR)
c) Missing not at random (MNAR)
12 Step 3: Determine the nature of missing data
a) Missing completely at random (MCAR)
- Missingness is unrelated to any variable in the analysis (including the variable with missing data itself)
- Example: 1% of records were lost when they fell into the mud. One computer out of 10 holding the data broke down, and the breakdown was not related to which data it was holding.
- Analysis remains unbiased. We lose power, but estimated parameters are not biased.
- Most missing data techniques will work well
13 Step 3: Determine the nature of missing data
b) Missing at random (MAR)
- Data are MAR if missingness on x does not depend on the value of x after controlling for other observed variables.
- Missingness may be correlated with other variables that are included in the analysis.
- Example: Depressed people might be less likely to report their income (reporting of income is associated with depression). Depressed people might also have lower income in general. If we ignore the missing data, the observed income distribution will be shifted upward. If, among depressed patients, the probability of reporting income is unrelated to income level, then the data are MAR, not MCAR.
- MAR does produce bias if ignored, but we have ways of dealing with it.
14 Step 3: Determine the nature of missing data
c) Missing not at random (MNAR)
- Example: If we are studying mental health and the depressed are less likely to report their mental health, then the data are MNAR. The mean mental health level will be biased relative to what we would get with the complete data.
- We need to write a model that accounts for the missing data process.
- Bias could be large or small depending on your data.
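The three mechanisms can be contrasted with a small simulation. Everything below is invented for illustration (the share of depressed people, the income model, the missingness probabilities); it is a sketch of the definitions, not an analysis of real data. Note that the MNAR example here makes missingness depend on income itself, rather than on mental health as in the slide.

```python
import random

rng = random.Random(0)
N = 20000

# Simulated population: depressed people earn less on average.
people = []
for _ in range(N):
    depressed = rng.random() < 0.3
    income = 50 - 15 * depressed + rng.gauss(0, 10)
    people.append((depressed, income))

def observed(sample, prob_missing):
    """Keep the people whose income is reported under a missingness rule."""
    return [(d, y) for d, y in sample if rng.random() >= prob_missing(d, y)]

def mean_income(sample, only_depressed=False):
    values = [y for d, y in sample if d or not only_depressed]
    return sum(values) / len(values)

mcar = observed(people, lambda d, y: 0.3)                     # unrelated to anything
mar = observed(people, lambda d, y: 0.6 if d else 0.1)        # depends on depression only
mnar = observed(people, lambda d, y: 0.6 if y < 40 else 0.1)  # depends on income itself

true_mean = mean_income(people)
mcar_mean = mean_income(mcar)
mar_mean = mean_income(mar)
mnar_mean = mean_income(mnar)

# Within the depressed group, MAR leaves the mean intact; MNAR does not.
true_dep = mean_income(people, only_depressed=True)
mar_dep = mean_income(mar, only_depressed=True)
mnar_dep = mean_income(mnar, only_depressed=True)

print(f"overall:   true {true_mean:.1f}  MCAR {mcar_mean:.1f}  MAR {mar_mean:.1f}  MNAR {mnar_mean:.1f}")
print(f"depressed: true {true_dep:.1f}  MAR {mar_dep:.1f}  MNAR {mnar_dep:.1f}")
```

Under MAR the overall observed mean is too high, but the mean within the depressed group is still about right, which is why conditioning on observed variables can repair the problem. Under MNAR even the within-group mean is off.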
15 Step 4: Choose a technique
1) Listwise deletion
2) Simple techniques to avoid: pairwise deletion, hot deck imputation, mean substitution
3) Single imputation/regression substitution
4) Extra dummy variable for missing data
5) Multiple imputation
6) Maximum likelihood estimation
16 1- Listwise deletion
Method: Delete any case which has missing data on any of the variables of interest
Advantages:
- Simple: default option in many statistical programs
- Acceptable with a small amount of missing data; one rule of thumb is less than 5% of the full sample
Disadvantages:
- Can quickly reduce sample size and statistical power where many variables have missing data
- Undetected selection bias
- Biased when data are not MCAR
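Listwise deletion is simple enough to write out by hand. The sketch below uses made-up records with `None` marking a missing value, and also reports the share of cases dropped so it can be compared with the 5% rule of thumb.

```python
def listwise_delete(rows, variables):
    """Keep only rows with no missing value (None) on any analysis variable."""
    return [row for row in rows if all(row[v] is not None for v in variables)]

# Made-up records for illustration.
rows = [
    {"income": 40, "education": 12, "health": 3},
    {"income": None, "education": 16, "health": 4},
    {"income": 55, "education": 14, "health": 5},
    {"income": 38, "education": None, "health": 2},
    {"income": 61, "education": 18, "health": 4},
]

analysis_vars = ["income", "education", "health"]
complete = listwise_delete(rows, analysis_vars)
share_dropped = 1 - len(complete) / len(rows)
print(f"kept {len(complete)} of {len(rows)} cases ({share_dropped:.0%} dropped)")
```

Here each variable is missing for at most one case in five, yet 40% of the sample is lost, far above the 5% rule of thumb.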
17 2- Simple techniques to avoid
a) Pairwise deletion: Uses every case that has data on a given pair of variables, dropping a case only from the calculations that involve its missing values rather than deleting the whole observation.
b) Hot deck imputation: Substituting the value from a randomly selected similar unit for the missing value.
c) Mean substitution: Substituting the mean value for the missing data.
Advantages:
- All available data are used
Disadvantages:
- May over- or underestimate coefficients
- Overfits data: artificially increases model fit by assuming that similar units are identical. Artificially lowers standard errors.
- Hot deck: Hard to justify the method for selecting similar units.
- Mean substitution: Hard to justify that missing values would be the mean. Doesn't take into account how the missing cases are different.
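One reason mean substitution is on the "avoid" list can be seen directly: filling in the mean leaves the mean unchanged but shrinks the variance, which is what makes standard errors look smaller than they should. A minimal sketch with invented numbers:

```python
import statistics

# Invented data: half the incomes are missing (None).
income = [20, 30, 40, 50, 60, None, None, None, None, None]

observed = [v for v in income if v is not None]
observed_mean = statistics.mean(observed)

# Mean substitution: every missing value becomes the observed mean.
filled = [observed_mean if v is None else v for v in income]

var_observed = statistics.variance(observed)
var_filled = statistics.variance(filled)
print(f"mean: {observed_mean} -> {statistics.mean(filled)}")
print(f"variance: {var_observed:.1f} -> {var_filled:.1f}")
```

The filled-in data look twice as large and much less variable than what was actually observed, even though no new information was added.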
18 3- Single imputation/regression substitution
Method: Use linear regression to predict what the missing value should be on the basis of other variables that are present. Then substitute the predicted value for the missing value.
Advantages:
- More logical than other methods
- Full sample size preserved
Disadvantages:
- Overfits data: artificially increases model fit by assuming that similar units are identical. Artificially lowers standard errors.
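A minimal sketch of regression substitution with one predictor, written in plain Python so the steps are visible. The variables (years of education predicting income) and the numbers are made up; a real application would use your statistics package's regression command and usually more than one predictor.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Invented data: income (None = missing) and years of education.
education = [10, 12, 14, 16, 18, 11]
income = [31, 33, 39, 41, None, None]

# Step 1: fit the regression on cases where income is observed.
pairs = [(e, y) for e, y in zip(education, income) if y is not None]
intercept, slope = fit_line([e for e, _ in pairs], [y for _, y in pairs])

# Step 2: substitute the predicted value for each missing income.
imputed = [
    intercept + slope * e if y is None else y
    for e, y in zip(education, income)
]
print(imputed)
```

Every imputed value sits exactly on the fitted line, with no residual scatter. That is the overfitting problem named above: the filled-in cases make the relationship look tighter than it is.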
19 4- Extra dummy variable for missing data
Method: Add an extra dummy variable (coded 1 for the missing values and 0 otherwise) to a series of dummy variables.
Example: Education: university degree (ref), high school graduate (dummy), less than high school (dummy), unknown education (dummy)
Advantages:
- Full sample size preserved
- Association between DV and missing data dummy is estimated
20 4- Extra dummy variable for missing data
Disadvantages:
- Heterogeneity: the missing data dummy possibly combines very different cases together
- Requires many extra dummy variables if you have missing data on multiple variables
- Requires use of categorical/dummy variables, not continuous variables
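The education example can be coded as follows. The category labels and the `educ_` prefix are hypothetical names chosen for this sketch; university degree is the reference category, so it gets no dummy of its own.

```python
# Dummy categories; "university" is the omitted reference category.
CATEGORIES = ["high_school", "less_than_high_school", "unknown"]

def education_dummies(education):
    """Return 0/1 dummies, sending missing education (None) to 'unknown'."""
    label = "unknown" if education is None else education
    return {f"educ_{c}": int(label == c) for c in CATEGORIES}

# Made-up respondents for illustration.
for value in ["university", "high_school", None]:
    print(value, education_dummies(value))
```

A respondent with missing education stays in the sample and is flagged by `educ_unknown`, whose coefficient estimates how the unknown-education group differs from the reference group.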
21 5- Multiple Imputation
Method: Replaces each missing value with a set of plausible values that represent the uncertainty about the right value to impute. These multiply imputed data sets are then analyzed using standard procedures for complete data, and the results from these analyses are combined.
Advantages:
- Logical, full sample preserved
- By including random error, imputed data are noisier than the observed data and therefore don't overfit as much as other methods
Disadvantages:
- Not necessarily available for all kinds of models
- Not appropriate for missing data on a key independent variable or the DV
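The "combine the results" step is usually done with Rubin's rules, which the slide does not spell out; the sketch below assumes that is the combining method intended. The coefficient estimates and standard errors are invented, standing in for the same regression run on five imputed data sets.

```python
import math
import statistics

def pool_rubin(estimates, std_errors):
    """Combine one coefficient across m imputed data sets (Rubin's rules).

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    pooled = statistics.mean(estimates)
    within = statistics.mean(se ** 2 for se in std_errors)  # average sampling variance
    between = statistics.variance(estimates)                # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)

# Invented results from m = 5 imputed data sets.
estimates = [2.0, 2.4, 1.8, 2.2, 2.1]
std_errors = [0.50, 0.52, 0.49, 0.51, 0.50]

pooled, pooled_se = pool_rubin(estimates, std_errors)
print(f"pooled estimate = {pooled:.2f}, pooled SE = {pooled_se:.3f}")
```

The pooled standard error is larger than any of the five individual standard errors because it adds the disagreement between imputations to the usual sampling variance. This is how multiple imputation carries the uncertainty about the missing values into the final result.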
22 6- Maximum Likelihood Estimation
Method: EM algorithms estimate the model coefficients and standard errors in the presence of missing data.
Advantages:
- Doesn't impute missing data. The best-fitting parameters are selected via iterations that maximize the probability of observing the data that were collected.
Disadvantages:
- Requires more statistical knowledge
- Might require the use of different statistical programs. More common in SEM programs.
23 Summary
- First, you must understand why you have missing data and examine the patterns.
- Then you can choose a technique to deal with missing data. You may choose more than one.
- No matter how you deal with missing data, you should run your analysis various ways:
  - With and without missing values included
  - Using different methods, to test whether it changes the results
  - Think about the direction in which missing values bias the results
- Before you start using one of these techniques, invest in understanding what assumptions you are making and how to do it with the software that you use.
24 General approaches
- The information presented here focused on general approaches for basic statistical analysis: OLS regression, logistic regression, ANOVA
- See the literature on disciplinary and model-specific techniques and norms (multi-level models, structural equation modeling, factor analysis, panel data, etc.)
Statistical Software:
- SPSS: limited in basic version. Some expensive upgrades available.
- Stata: multiple imputation, mi, ice, micombine
- SAS: PROC MI, PROC MIANALYZE
- R: many options
25 References
- Allison, P. (2002). Missing Data. Sage. (A little green book.)
- Johnson & Young (2011). Toward best practices in analyzing datasets with missing data: Comparisons and recommendations. Journal of Marriage and Family 73.
- Acock (2005). Working with missing values. Journal of Marriage and Family 67.
- Raghunathan (2004). What to do with missing data? Some options for analysis of incomplete data. Annual Review of Public Health 25.
- Little and Rubin (1989). The analysis of social science data with missing values. Sociological Methods and Research.
Introduction to Survey Data Analysis. Linda K. Owens, PhD. Assistant Director for Sampling & Analysis
Introduction to Survey Data Analysis Linda K. Owens, PhD Assistant Director for Sampling & Analysis General information Please hold questions until the end of the presentation Slides available at www.srl.uic.edu/seminars/fall15seminars.htm
More information1. Understand & evaluate survey. What is survey data? When analyzing survey data... General information. Focus of the webinar
What is survey data? Introduction to Survey Data Analysis Linda K. Owens, PhD Assistant Director for Sampling & Analysis Data gathered from a sample of individuals Sample is random (drawn using probabilistic
More informationIntroduction to Survey Data Analysis. Focus of the Seminar. When analyzing survey data... Young Ik Cho, PhD. Survey Research Laboratory
Introduction to Survey Data Analysis Young Ik Cho, PhD Research Assistant Professor University of Illinois at Chicago Fall 2008 Focus of the Seminar Data Cleaning/Missing Data Sampling Bias Reduction When
More informationIntroduction to Survey Data Analysis
Introduction to Survey Data Analysis Young Cho at Chicago 1 The Circle of Research Process Theory Evaluation Real World Theory Hypotheses Test Hypotheses Data Collection Sample Operationalization/ Measurement
More informationPLAYING WITH HISTORY CAN AFFECT YOUR FUTURE: HOW HANDLING MISSING DATA CAN IMPACT PARAMATER ESTIMATION AND RISK MEASURE BY JONATHAN LEONARDELLI
PLAYING WITH HISTORY CAN AFFECT YOUR FUTURE: HOW HANDLING MISSING DATA CAN IMPACT PARAMATER ESTIMATION AND RISK MEASURE BY JONATHAN LEONARDELLI March 1, 2012 ABSTRACT Missing data is a common problem facing
More informationMULTIPLE IMPUTATION. Adrienne D. Woods Methods Hour Brown Bag April 14, 2017
MULTIPLE IMPUTATION Adrienne D. Woods Methods Hour Brown Bag April 14, 2017 A COLLECTIVIST APPROACH TO BEST PRACTICES As I began learning about MI last semester, I realized that there are a lot of guidelines
More informationSensitivity Analysis of Nonlinear Mixed-Effects Models for. Longitudinal Data That Are Incomplete
ABSTRACT Sensitivity Analysis of Nonlinear Mixed-Effects Models for Longitudinal Data That Are Incomplete Shelley A. Blozis, University of California, Davis, CA Appropriate applications of methods for
More informationUCLA Department of Statistics Papers
UCLA Department of Statistics Papers Title R&D, Attrition and Multiple Imputation in The Business Research and Development and Innovation Survey (BRDIS) Permalink https://escholarship.org/uc/item/1bx747j2
More informationStatistical Considerations
Version 1.3 Effective date: 21 May 2012 Author: Approved by: Dr Ranjit Lall, Research Fellow Statistician Dr Sarah Duggan, CTU Manager Revision Chronology: Effective Date Version 1.3 21 May 2012 Version
More informationMISSING DATA TREATMENTS AT THE SECOND LEVEL OF HIERARCHICAL LINEAR MODELS. Suzanne W. St. Clair, B.S., M.P.H. Dissertation Prepared for the Degree of
MISSING DATA TREATMENTS AT THE SECOND LEVEL OF HIERARCHICAL LINEAR MODELS Suzanne W. St. Clair, B.S., M.P.H. Dissertation Prepared for the Degree of DOCTOR OF PHILOSOPHY UNIVERSITY OF NORTH TEXAS August
More informationKristin Gustavson * and Ingrid Borren
Gustavson and Borren BMC Medical Research Methodology 2014, 14:133 RESEARCH ARTICLE Open Access Bias in the study of prediction of change: a Monte Carlo simulation study of the effects of selective attrition
More informationOVERVIEW OF APPROACHES FOR MISSING DATA. Susan Buchman Spring 2018
OVERVIEW OF APPROACHES FOR MISSING DATA Susan Buchman 36-726 Spring 2018 WHICH OF THESE PRODUCE MISSING DATA? A patient in a trial for a new drug dies before the study is over A patient in a trial for
More informationUsing Weights in the Analysis of Survey Data
Using Weights in the Analysis of Survey Data David R. Johnson Department of Sociology Population Research Institute The Pennsylvania State University November 2008 What is a Survey Weight? A value assigned
More informationDepartment of Sociology King s University College Sociology 302b: Section 570/571 Research Methodology in Empirical Sociology Winter 2006
Department of Sociology King s University College Sociology 302b: Section 570/571 Research Methodology in Empirical Sociology Winter 2006 Computer assignment #3 DUE Wednesday MARCH 29 th (in class) Regression
More informationChapter 3. Basic Statistical Concepts: II. Data Preparation and Screening. Overview. Data preparation. Data screening. Score reliability and validity
Chapter 3 Basic Statistical Concepts: II. Data Preparation and Screening To repeat what others have said, requires education; to challenge it, requires brains. Overview Mary Pettibone Poole Data preparation
More informationMissing data procedures for psychosocial research
Missing data procedures for psychosocial research Elizabeth Stuart Mental Health Summer Institute 330.616 Johns Hopkins Bloomberg School of Public Health Department of Mental Health Department of Biostatistics
More informationModeling Contextual Data in. Sharon L. Christ Departments of HDFS and Statistics Purdue University
Modeling Contextual Data in the Add Health Sharon L. Christ Departments of HDFS and Statistics Purdue University Talk Outline 1. Review of Add Health Sample Design 2. Modeling Add Health Data a. Multilevel
More informationAnalyzing non-normal data with categorical response variables
SESUG 2016 Paper SD-184 Analyzing non-normal data with categorical response variables Niloofar Ramezani, University of Northern Colorado; Ali Ramezani, Allameh Tabataba'i University Abstract In many applications,
More informationEstimation of multiple and interrelated dependence relationships
STRUCTURE EQUATION MODELING BASIC ASSUMPTIONS AND CONCEPTS: A NOVICES GUIDE Sunil Kumar 1 and Dr. Gitanjali Upadhaya 2 Research Scholar, Department of HRM & OB, School of Business Management & Studies,
More informationMethods for Multilevel Modeling and Design Effects. Sharon L. Christ Departments of HDFS and Statistics Purdue University
Methods for Multilevel Modeling and Design Effects Sharon L. Christ Departments of HDFS and Statistics Purdue University Talk Outline 1. Review of Add Health Sample Design 2. Modeling Add Health Data a.
More informationPROPENSITY SCORE MATCHING A PRACTICAL TUTORIAL
PROPENSITY SCORE MATCHING A PRACTICAL TUTORIAL Cody Chiuzan, PhD Biostatistics, Epidemiology and Research Design (BERD) Lecture March 19, 2018 1 Outline Experimental vs Non-Experimental Study WHEN and
More informationMultiple Imputation and Multiple Regression with SAS and IBM SPSS
Multiple Imputation and Multiple Regression with SAS and IBM SPSS See IntroQ Questionnaire for a description of the survey used to generate the data used here. *** Mult-Imput_M-Reg.sas ***; options pageno=min
More informationSecondary analysis of national survey datasetsjjns_213
bs_bs_banner 130..135 Japan Journal of Nursing Science (2013) 10, 130 135 doi:10.1111/j.1742-7924.2012.00213.x METHODOLOGICAL ARTICLE Secondary analysis of national survey datasetsjjns_213 Sunjoo BOO 1
More informationPotential sources of missing data in a meta-analysis. Missing data. Concepts in missing data. Concepts in missing data.
SMG advanced workshop, ardiff, March 4-5, 2010 Missing data Julian Higgins MR Biostatistics Unit ambridge, UK with thanks to Ian White, Fred Wolf, Angela Wood, Alex Sutton Potential sources of missing
More informationDealing with missing data in practice: Methods, applications, and implications for HIV cohort studies
Dealing with missing data in practice: Methods, applications, and implications for HIV cohort studies Belen Alejos Ferreras Centro Nacional de Epidemiología Instituto de Salud Carlos III 19 de Octubre
More informationApplication of Multiple Imputation in Dealing with Missing Data in Agricultural Surveys: The Case of BMP Adoption
Journal of Agricultural and Resource Economics 43(1):78 102 ISSN 1068-5502 Copyright 2018 Western Agricultural Economics Association Application of Multiple Imputation in Dealing with Missing Data in Agricultural
More informationMultilevel Modeling Tenko Raykov, Ph.D. Upcoming Seminar: April 7-8, 2017, Philadelphia, Pennsylvania
Multilevel Modeling Tenko Raykov, Ph.D. Upcoming Seminar: April 7-8, 2017, Philadelphia, Pennsylvania Multilevel Modeling Part 1 Introduction, Basic and Intermediate Modeling Issues Tenko Raykov Michigan
More informationPredictive Modeling using SAS. Principles and Best Practices CAROLYN OLSEN & DANIEL FUHRMANN
Predictive Modeling using SAS Enterprise Miner and SAS/STAT : Principles and Best Practices CAROLYN OLSEN & DANIEL FUHRMANN 1 Overview This presentation will: Provide a brief introduction of how to set
More informationA Comparative evaluation of Software Effort Estimation using REPTree and K* in Handling with Missing Values
Australian Journal of Basic and Applied Sciences, 6(7): 312-317, 2012 ISSN 1991-8178 A Comparative evaluation of Software Effort Estimation using REPTree and K* in Handling with Missing Values 1 K. Suresh
More informationLogistic Regression, Part III: Hypothesis Testing, Comparisons to OLS
Logistic Regression, Part III: Hypothesis Testing, Comparisons to OLS Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 22, 2015 This handout steals heavily
More informationChapter URL:
This PDF is a selection from an out-of-print volume from the National Bureau of Economic Research Volume Title: The Measurement of Labor Cost Volume Author/Editor: Jack E. Triplett, ed. Volume Publisher:
More informationCost-effectiveness and cost-utility analysis accompanying Cancer Clinical trials. NCIC CTG New Investigators Workshop
Cost-effectiveness and cost-utility analysis accompanying Cancer Clinical trials NCIC CTG New Investigators Workshop Keyue Ding, PhD. NCIC Clinical Trials Group Dept. of Public Health Sciences Queen s
More informationMissing data in software engineering
Chapter 1 Missing data in software engineering The goal of this chapter is to increase the awareness of missing data techniques among people performing studies in software engineering. Three primary reasons
More informationSOCY7706: Longitudinal Data Analysis Instructor: Natasha Sarkisian Two Wave Panel Data Analysis
SOCY7706: Longitudinal Data Analysis Instructor: Natasha Sarkisian Two Wave Panel Data Analysis In any longitudinal analysis, we can distinguish between analyzing trends vs individual change that is, model
More informationWeb Appendix to Advertising Spillovers: Evidence from Online. Field-Experiments and Implications for Returns on Advertising
Web Appendix to Advertising Spillovers: Evidence from Online Field-Experiments and Implications for Returns on Advertising x A Estimation using a linear probability model In this section, I go one-by-one
More informationCh. 15 Data Preparation and Description
TECH 646 Analysis of Research in Industry and Technology PART IV Analysis and Presentation of Data: Data Presentation and Description; Exploring, Displaying, and Examining Data; Hypothesis Testing; Measures
More informationFinancing Constraints and Firm Inventory Investment: A Reexamination
Financing Constraints and Firm Inventory Investment: A Reexamination John D. Tsoukalas* Structural Economic Analysis Division Monetary Analysis Bank of England December 2004 Abstract This paper shows that
More informationThe Application of STATA s Multiple Imputation Techniques to Analyze a Design of Experiments with Multiple Responses
The Application of STATA s Multiple Imputation Techniques to Analyze a Design of Experiments with Multiple Responses STATA Conference - San Diego 2012 Clara Novoa, Ph.D., Bahram Aiabanpour, Ph.D., Suleima
More informationSawtooth Software. Sample Size Issues for Conjoint Analysis Studies RESEARCH PAPER SERIES. Bryan Orme, Sawtooth Software, Inc.
Sawtooth Software RESEARCH PAPER SERIES Sample Size Issues for Conjoint Analysis Studies Bryan Orme, Sawtooth Software, Inc. 1998 Copyright 1998-2001, Sawtooth Software, Inc. 530 W. Fir St. Sequim, WA
More informationSawtooth Software. Learning Effects in Preference Tasks: Choice-Based Versus Standard Conjoint RESEARCH PAPER SERIES
Sawtooth Software RESEARCH PAPER SERIES Learning Effects in Preference Tasks: Choice-Based Versus Standard Conjoint Joel Huber, Duke University Dick R. Wittink, Cornell University Richard M. Johnson, Sawtooth
More informationTwo Way ANOVA. Turkheimer PSYC 771. Page 1 Two-Way ANOVA
Page 1 Two Way ANOVA Two way ANOVA is conceptually like multiple regression, in that we are trying to simulateously assess the effects of more than one X variable on Y. But just as in One Way ANOVA, the
More informationWhat is DSC 410/510? DSC 410/510 Multivariate Statistical Methods. What is Multivariate Analysis? Computing. Some Quotes.
What is DSC 410/510? DSC 410/510 Multivariate Statistical Methods Introduction Applications-oriented oriented introduction to multivariate statistical methods for MBAs and upper-level business undergraduates
More informationPartial Least Squares Structural Equation Modeling PLS-SEM
Partial Least Squares Structural Equation Modeling PLS-SEM New Edition Joe Hair Cleverdon Chair of Business Director, DBA Program Statistical Analysis Historical Perspectives Early 1900 s 1970 s = Basic
More informationLinear model to forecast sales from past data of Rossmann drug Store
Abstract Linear model to forecast sales from past data of Rossmann drug Store Group id: G3 Recent years, the explosive growth in data results in the need to develop new tools to process data into knowledge
More informationSTATISTICS PART Instructor: Dr. Samir Safi Name:
STATISTICS PART Instructor: Dr. Samir Safi Name: ID Number: Question #1: (20 Points) For each of the situations described below, state the sample(s) type the statistical technique that you believe is the
More informationGetting Started with HLM 5. For Windows
For Windows Updated: August 2012 Table of Contents Section 1: Overview... 3 1.1 About this Document... 3 1.2 Introduction to HLM... 3 1.3 Accessing HLM... 3 1.4 Getting Help with HLM... 3 Section 2: Accessing
More informationOptimal Method For Analysis Of Disconnected Diallel Tests. Bin Xiang and Bailian Li
Optimal Method For Analysis Of Disconnected Diallel Tests Bin Xiang and Bailian Li Department of Forestry, North Carolina State University, Raleigh, NC 27695-82 bxiang@unity.ncsu.edu ABSTRACT The unique
More informationSemester 2, 2015/2016
ECN 3202 APPLIED ECONOMETRICS 3. MULTIPLE REGRESSION B Mr. Sydney Armstrong Lecturer 1 The University of Guyana 1 Semester 2, 2015/2016 MODEL SPECIFICATION What happens if we omit a relevant variable?
More informationApplying Regression Techniques For Predictive Analytics Paviya George Chemparathy
Applying Regression Techniques For Predictive Analytics Paviya George Chemparathy AGENDA 1. Introduction 2. Use Cases 3. Popular Algorithms 4. Typical Approach 5. Case Study 2016 SAPIENT GLOBAL MARKETS
More informationThe correct bibliographic citation for this manual is as follows: Shreve, Joni N. and Donna Dea Holland SAS Certification Prep Guide:
The correct bibliographic citation for this manual is as follows: Shreve, Joni N. and Donna Dea Holland. 2018. SAS Certification Prep Guide: Statistical Business Analysis Using SAS 9. Cary, NC: SAS Institute
More informationHan Du. Department of Psychology University of California, Los Angeles Los Angeles, CA
Han Du Department of Psychology University of California, Los Angeles Los Angeles, CA 90095-1563 Email: hdu@psych.ucla.edu EDUCATION Ph.D. in Quantitative Psychology 2018 University of Notre Dame M.S.
More informationResearch Methods in Human-Computer Interaction
Research Methods in Human-Computer Interaction Chapter 5- Surveys Introduction Surveys are a very commonly used research method Surveys are also often-maligned because they are not done in the proper manner
More informationArchives of Scientific Psychology Reporting Questionnaire for Manuscripts Describing Primary Data Collections
(Based on APA Journal Article Reporting Standards JARS Questionnaire) 1 Archives of Scientific Psychology Reporting Questionnaire for Manuscripts Describing Primary Data Collections JARS: ALL: These questions
More informationSummarizing categorical data involves boiling down all the information into just a few
Chapter 1 Summarizing Categorical Data: Counts and Percents In This Chapter Making tables to summarize categorical data Highlighting the difference between frequencies and relative frequencies Interpreting
More informationCan Microtargeting Improve Survey Sampling?
Can Microtargeting Improve Survey Sampling? An Assessment of Accuracy and Bias in Consumer File Marketing Data Josh Pasek University of Michigan jpasek@umich.edu NSF / Stanford Conference: Future of Survey
More informationTanja Srebotnjak, United Nations Statistics Division, New York, USA 1 Abstract
Multiple Imputations of Missing Data in the Environmental Sustainability Index - Pain or Gain? Tanja Srebotnjak, United Nations Statistics Division, New York, USA http://www.un.org/depts/unsd/ in cooperation
More informationMultilevel/ Mixed Effects Models: A Brief Overview
Multilevel/ Mixed Effects Models: A Brief Overview Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised March 27, 2018 These notes borrow very heavily, often/usually
More informationWorkshop II Project Management
Workshop II Project Management UNITAR-HIROSHIMA FELLOWSHIP FOR AFGHANISTAN 2007 Introduction to Project Management 15 17 August 2007, Dehradun, India Presented by: Jobaid Kabir, Ph.D. Fellowship Program
More informationImplementing Current Regulatory Guidance: An Industry Perspective
Implementing Current Regulatory Guidance: An Industry Perspective European Statistical Meeting: Advances in the Treatment of Missing Data November 18, 2011 Brussels Mouna Akacha, Novartis Pharma AG Basel
More informationAnalysis of Factors that Affect Productivity of Enterprise Software Projects
Analysis of Factors that Affect Productivity of Enterprise Software Projects Tsuneo Furuyama Scool of Science, Tokai University Kitakaname 4-1-1, Hiratsuka city, Japan furuyama@tokai-u.jp Abstract Analysis
More informationEstoril Education Day
Estoril Education Day -Experimental design in Proteomics October 23rd, 2010 Peter James Note Taking All the Powerpoint slides from the Talks are available for download from: http://www.immun.lth.se/education/
More informationMachine learning applications in genomics: practical issues & challenges. Yuzhen Ye School of Informatics and Computing, Indiana University
Machine learning applications in genomics: practical issues & challenges Yuzhen Ye School of Informatics and Computing, Indiana University Reference Machine learning applications in genetics and genomics
More informationSalford Predictive Modeler. Powerful machine learning software for developing predictive, descriptive, and analytical models.