Multilevel Modeling Tenko Raykov, Ph.D. Upcoming Seminar: April 7-8, 2017, Philadelphia, Pennsylvania

Similar documents
Appendix A Mixed-Effects Models 1. LONGITUDINAL HIERARCHICAL LINEAR MODELS

RESIDUAL FILES IN HLM OVERVIEW

Hierarchical Linear Modeling: A Primer 1 (Measures Within People) R. C. Gardner Department of Psychology

Methods for Multilevel Modeling and Design Effects. Sharon L. Christ Departments of HDFS and Statistics Purdue University

Multilevel Modeling and Cross-Cultural Research

Modeling Contextual Data in. Sharon L. Christ Departments of HDFS and Statistics Purdue University

Examining Turnover in Open Source Software Projects Using Logistic Hierarchical Linear Modeling Approach

Getting Started with HLM 5. For Windows

Running head: THE MEANING AND DOING OF MINDFULNESS

How to Get More Value from Your Survey Data

Multilevel Mixed-Effects Generalized Linear Models. Prof. Dr. Luiz Paulo Fávero Prof. Dr. Matheus Albergaria

STATISTICS PART Instructor: Dr. Samir Safi Name:

A Smart Approach to Analyzing Smart Meter Data

Which is the best way to measure job performance: Self-perceptions or official supervisor evaluations?

Dealing with Missing Data: Strategies for Beginners to Data Analysis

Chapter 11. Multiple-Sample SEM. Overview. Rationale of multiple-sample SEM. Multiple-sample path analysis. Multiple-sample CFA.

An Introduction to Latent Growth Curve Modeling (LGC Modeling): A Resource Packet for Participants

Module 15: Multilevel Modelling of Repeated Measures Data. Stata Practical

Package WABA. February 16, 2018

Foley Retreat Research Methods Workshop: Introduction to Hierarchical Modeling

Introduction to Business Research 3

Multiple Regression: A Primer

Empirical Exercise Handout

emergent group-level effects U.S. Army Medical Research Unit-Europe Walter Reed Army Institute of Research

5 CHAPTER: DATA COLLECTION AND ANALYSIS

Introduction to Hierarchical Linear Models ICPSR 2009

Chapter 3. Table of Contents. Introduction. Empirical Methods for Demand Analysis

Introduction to Research

Hierarchical Linear Models For Longitudinal Data August 6-8, 2012

Unit 6: Simple Linear Regression Lecture 2: Outliers and inference

SC 705: Advanced Statistics Spring 2010

THREE LEVEL HIERARCHICAL BAYESIAN ESTIMATION IN CONJOINT PROCESS

Timing Production Runs

Archives of Scientific Psychology Reporting Questionnaire for Manuscripts Describing Primary Data Collections

Time Series Analysis in the Social Sciences

MEANS AND ENDS: A COMPARATIVE STUDY OF EMPIRICAL METHODS FOR INVESTIGATING GOVERNANCE AND PERFORMANCE

PROPENSITY SCORE MATCHING A PRACTICAL TUTORIAL

Examination of Cross Validation techniques and the biases they reduce.

Do Customers Respond to Real-Time Usage Feedback? Evidence from Singapore

Econ 792. Labor Economics. Lecture 6

Kristin Gustavson * and Ingrid Borren

Acadience Reading Technical Adequacy Brief

Partial Least Squares Structural Equation Modeling PLS-SEM

UNIVERSITY OF MORATUWA

WRITTEN PRELIMINARY Ph.D. EXAMINATION. Department of Applied Economics. January 27, Consumer Behavior and Household Economics.

Quantitative Methods in the Applied Behavioral Sciences. RSM 3090 (Fall, 2014) Thursdays 1-4pm, Rotman South Building 6024 (95 St.

Multilevel/ Mixed Effects Models: A Brief Overview

Assessing the performance of food co ops in the US

Using Factor Analysis to Generate Clusters of Agile Practices

THE Q-SORT METHOD: ASSESSING RELIABILITY AND CONSTRUCT VALIDITY OF QUESTIONNAIRE ITEMS AT A PRE-TESTING STAGE

PVM Pharmacy Customer Satisfaction Structural Equations Models. Multivariate Solutions

Discussion: Labor Supply Substitution and the Ripple Effect of Minimum Wages by Brian J. Phelan

MEASURING PUBLIC SATISFACTION FOR GOVERNMENT PROCESS REENGINEERING

The Kruskal-Wallis Test with Excel In 3 Simple Steps. Kilem L. Gwet, Ph.D.

Topics in Biostatistics Categorical Data Analysis and Logistic Regression, part 2. B. Rosner, 5/09/17

Contact: Version: 2.0 Date: March 2018

Stats Happening May 2016

Correlation and Simple. Linear Regression. Scenario. Defining Correlation

ESTIMATING TOTAL-TEST SCORES FROM PARTIAL SCORES IN A MATRIX SAMPLING DESIGN JANE SACHAR. The Rand Corporatlon

Researchjournali s Journal of Mathematics

UNIVERSITY OF STELLENBOSCH ECONOMICS DEPARTMENT ECONOMICS 388: 2018 INTRODUCTORY ECONOMETRICS

Problem Points Score USE YOUR TIME WISELY SHOW YOUR WORK TO RECEIVE PARTIAL CREDIT

ESS ESSNET DATA STATISTICS

PHD THESIS - summary

Secondary Math Margin of Error

Statistics & Analysis. Confirmatory Factor Analysis and Structural Equation Modeling of Noncognitive Assessments using PROC CALIS

The Relationship between Social Networks and Micro Shareholders 1

Practical Regression: Fixed Effects Models

Develop Innovative Methods in Secondary Analyses of Child Welfare Databases -- Children s Bureau Discretionary Grants Program Grantee s Final Report

An Application of Categorical Analysis of Variance in Nested Arrangements

Chapter 3. Basic Statistical Concepts: II. Data Preparation and Screening. Overview. Data preparation. Data screening. Score reliability and validity

THE LEAD PROFILE AND OTHER NON-PARAMETRIC TOOLS TO EVALUATE SURVEY SERIES AS LEADING INDICATORS

Conjoint analysis based on Thurstone judgement comparison model in the optimization of banking products

Examples of Statistical Methods at CMMI Levels 4 and 5

TECHNICAL EFFICIENCY OF SHRIMP FARMERS IN BANGLADESH: A STOCHASTIC FRONTIER PRODUCTION FUNCTION ANALYSIS

Software Metrics. Practical Approach. A Rigorous and. Norman Fenton. James Bieman THIRD EDITION. CRC Press CHAPMAN & HALIVCRC INNOVATIONS IN

Center for Demography and Ecology

Glossary of Terms Ability Accommodation Adjusted validity/reliability coefficient Alternate forms Analysis of work Assessment Band Battery

Econometrics is: The estimation of relationships suggested by economic theory

Estimation of multiple and interrelated dependence relationships

Technical Adequacy of the easycbm Primary-Level Mathematics

The Big Picture: Four Domains of HLM. Today s Discussion. 3 Background Concepts. 1) Nested Data. Introduction to Hierarchical Linear Modeling (HLM)

INTRODUCTION BACKGROUND. Paper

7 Statistical characteristics of the test

EFFICACY OF ROBUST REGRESSION APPLIED TO FRACTIONAL FACTORIAL TREATMENT STRUCTURES MICHAEL MCCANTS

CUSTOMER SATISFACTION WITH MOBILE OPERATORS SERVICES IN LITHUANIAN RURAL AREAS

Is There an Environmental Kuznets Curve: Empirical Evidence in a Cross-section Country Data

Composite Performance Measure Evaluation Guidance. April 8, 2013

SOCY7706: Longitudinal Data Analysis Instructor: Natasha Sarkisian Two Wave Panel Data Analysis

Studies in Local Public Transport Demand. Johan Holmgren

THE ROLE OF NESTED DATA ON GDP FORECASTING ACCURACY USING FACTOR MODELS *

Vital Attributes of Information Technology Service Consultants in Incident Management

Distinguish between different types of numerical data and different data collection processes.

Comparative Study of Risk Assessment Value against Risk Priority Number

Methodology: Assessment and Cross-Validation

Applying Regression Techniques For Predictive Analytics Paviya George Chemparathy

What is Multilevel Structural Equation Modelling?

Testing the Erdem and Swait Brand Equity Framework Using Latent Class Structural Equation Modelling

Discriminant Analysis Applications and Software Support

The Effects of Employee Ownership in the Context of the Large Multinational : an Attitudinal Cross-Cultural Approach. PhD Thesis Summary

Transcription:

Multilevel Modeling Tenko Raykov, Ph.D. Upcoming Seminar: April 7-8, 2017, Philadelphia, Pennsylvania

Multilevel Modeling Part 1 Introduction, Basic and Intermediate Modeling Issues Tenko Raykov Michigan State University (www.msu.edu/~raykov) Copyright Tenko Raykov, 2016

Plan: 0. Resources for this course and what it is about. 1. Why do we need multilevel modeling (MLM), and how come aggregation and disaggregation do not do the job? 2. The beginnings of MLM Why what we already know about regression analysis is so useful. 3. The intra-class correlation coefficient The basics of a multilevel model. 4. Proportions of variance at third level and at intermediate level in three-level settings, and how to evaluate them. 5. The random intercept model and model adequacy assessment. 6. Robust (ordinary least squares-based) modeling of lowerlevel variable relationships in presence of clustering effect. Appendices: - terminology in multilevel modeling (glossary, notation); - reshaping data (if needed) for modeling with Stata.

0. Resources for this course and what it is about Literature: Rabe-Hesketh, S., & Skrondal, A. (2012). Multilevel and longitudinal modeling with Stata (3 rd Edition). College Station, TX: Stata Press. Raudenbush, S., & Bryk, A. (2002). Hierarchical linear and nonlinear modeling (2 nd Edition). Thousand Oaks, CA: Sage. Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent linear and mixed models. Boca Raton, FL: Chapman & Hall. Snijders, T. A. B., & Bosker, R. (2012). Multilevel models. An introduction to basic and advanced modeling (2 nd Edition). Thousand Oaks, CA: Sage. Software: - Stata. Further excellent software and voluminous literature on multilevel modeling/hierarchical linear and nonlinear modeling are available as well (see also papers sent along with data files to participants). Goals of course: It is application oriented but with coherent discussion of theoretical issues at an introductory to intermediate level, with a few more advanced issues. Disclaimer: Data sets are used only for method illustration purposes.

Note. This course provides connections to the following main applied statistics areas (methodologies; see alternative/possible short courses). Structural Equation Modeling Longitudinal Data Analysis Multilevel Modeling (this course) Missing Data Analysis Survey Data Analysis Measurement (Scale Construction and Development)

What this course is about: It is about the applied statistical modeling methodology usually referred to as: - Hierarchical (linear and nonlinear) modeling, - Random effects modeling, - Random coefficient modeling, - Mixed effects modeling, - Mixed modeling, - Generalized linear mixed modeling, - Variance component models. These are largely synonyms for multilevel modeling. Closely related are these fields (effectively synonymous here): - Longitudinal modeling, - Panel data modeling, - Repeated measure analysis, - Cross-sectional time series analysis. The common theme underlying the above and this course is: Regression modeling when data are clustered (nested, hierarchical) in some way. Standard applied statistical modeling e.g., regression analysis (OLS) does not handle this case (as it makes the assumption of independence). Clustered data sets are very rich since they contain information about processes at different levels (e.g., personal and institutional characteristics) as well as, by implication, about their interactions. Multilevel modeling allows one to (1) disentangle them, (2) study them simultaneously, and (3) examine their interactions, all this (4) permitting answering complex questions about studied phenomena.

1. Why do we need multilevel modeling, and how come aggregation and disaggregation do not do the job? Data from empirical studies in the social, behavioral, and biomedical sciences as well as in business, marketing, and economics very often exhibit distinct hierarchical (multi-level) structure. The reason is that studied individuals are (already) grouped into larger units e.g., firms, companies, industries; neighborhoods, cities, states, countries; centers, physicians, hospitals; families, dyads, twins pairs, etc. This nesting or clustering may have an effect on the subjects scores on measures used, in particular on outcome scores, which thus need not be uncorrelated. The units of analysis on which the dependent variable is measured are usually referred to as level-1 units. The higher-order units in which they are nested (firms, companies, cities managers) are called level-2 units, also often referred to as groups for simplicity (as we will do not infrequently below). In some studies, depending on the research question, it is possible that other measurements are considered level-1 units (e.g., subsection 1.2 below). In general, if the level-2 units are in turn nested themselves within even higher-order units, the latter are called level-3 units, and so on (e.g., Section 4). Consider the following illustrations. 1.1. Examples of nested data and the hallmark of MLM 1. (Business, Industrial/Organizational Psychology) Employees are nested (clustered) within companies: former are level-1 units, the latter are level-2 units. 2. (Economics/Sociology) Individuals nested (clustered) within cities: former - level-1 units, latter - level-2 units; also,

interviewees (level-1 units) are nested w/in interviewers (level-2 units). 3. (Management) Workers are nested within managers; employees are nested within teams. The list of examples can go on, and they all share this common feature: the units in a lower level of the hierarchy are nested (grouped, clustered, similar, correlated, inter-related) within the units of its higher level(s). Why is the issue of nesting, or clustering, of subjects relevant? Because nesting implies a possibly serious lack of independence of individual scores on the dependent variable(s) of concern (abbreviated to DV and denoted Y in this course; we refer to the DV as a response or outcome variable in it). The reason is that the Y scores of subjects within the level-2 units are in general correlated (more similar), unlike the Y scores of subjects from different level-2 (or highest level) units. This nesting (clustering, hierarchy, similarity, correlation) is the hallmark of multilevel modeling. (I will use the abbreviation MLM for multilevel/hierarchical modeling, the subject of this course, rather than the abbreviation HLM that is reserved for a popular software program for MLM.) To examine in more detail this feature of MLM, let s look at the scores on say a job satisfaction (JS) measure in a given industry. Here, to properly study their properties, we need to keep in mind that employees are nested within firms (companies).

Suppose that we (i) randomly sample 30 firms/companies from the industry in question and then (ii) randomly sample 50 say employees from within each firm (who have been administered the JS measure). Then, due to the fact that employees in the same firm share the same working and related conditions (as well as possibly same manager/s), and a host of other experiences of various kinds, their Y scores will tend to be correlated ( similar ; see next). Let s look at the following illustration (to clarify this important point). Employee (ID1) Firm ID (ID2) JS Score (Y) 1 Company 1 45 2 Company 1 46 3 Company 1 44 4 Company 1 42 5 Company 2 79 6 Company 2 78 7 Company 2 77 8 Company 2 75 9 Company 3 92 10 Company 3 91 11 Company 3 93 12 Company 3 94 etc. Here, even though firms/companies and employees within them are randomly chosen, the individual worker s JS scores within company seem to be relatively similar ( correlated ). Specifically, (any) two JS scores within firm are more similar than (any) two across (between) firms.

This within-firm similarity is unlike any such for scores across companies (which scores in general might be so too, though, but often to a lower extent). This property, often referred to as withincluster or within-group correlation, is a fundamental feature that will go like a red thread throughout this course. All standard statistical methods (e.g., the general and generalized linear models, classical SEM) do not account for this lack of independence in the response variable scores across subjects. In particular, the standard (traditional, conventional) methods like regression analysis, ANOVA, multivariate statistics, SEM, analysis of categorical data, etc. do assume that these subjects scores are independent. For this reason, they are called single-level methods. Hence, to the degree to which this assumption is violated, the results of an application of those classical/standard methods on hierarchical data of the kind we have been discussing so far, will yield less trustworthy if not even misleading results. A rather frequent consequence of a serious violation of the above independence assumption is the phenomenon of spurious significance, if this violation is neglected (see also Rabe-Hesketh & Skrondal, 2012, for alternative examples). This phenomenon follows from: (i) ignoring the nesting (clustering) of subjects or the lowest units in the analyzed data, (ii) proceeding then with the above mentioned conventional methods (single-level models/methods), which tends to yield too many rejections of pertinent hypotheses, (iii) usually spuriously small standard errors, which also lead to too short confidence intervals, and too small p- values associated with statistical tests.