Respondent Driven Sampling: Introduction and Applications

Respondent Driven Sampling: Introduction and Applications
Sunghee Lee, University of Michigan
Federal Committee on Statistical Methodology (FCSM) Research and Policy Conference
March 7, 2018

Outline
- Introduction
- Application: Health and Life Study of Koreans (HLSK)
- Summary

Introduction
- Respondent Driven Sampling (RDS)
- Network Sampling vs. RDS
- RDS Inferences

Respondent Driven Sampling 1
- Growing interest in studying hard-to-reach, rare, elusive, hidden populations
  - HIV at-risk populations: sex workers, IDUs, MSMs
  - LGBT populations
  - Recent immigrants
- No clear and practical solution with probability sampling
  - High screening costs
  - Hesitant to be identified

Respondent Driven Sampling 2
- Proposed by Heckathorn (1997, 2002)
- Popular usage in public health (~$100 million in research funds from NIH as of 2011)
- Exploits social networks among rare population members for sampling purposes
- Sampled members also play the role of recruiter
- Incentivized recruitment from own network through coupons, continuing in waves/chains
- Recruitment assumed to be random within each individual's network, to follow a memoryless Markov chain, and to reach equilibrium
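The equilibrium assumption can be illustrated with a small sketch. It assumes two hypothetical groups, A and B, and made-up transition probabilities (the chance that a recruiter in one group recruits someone from each group); none of these values come from the presentation. Under a memoryless (first-order) Markov recruitment process, the expected group composition of a wave converges to the same values whichever group the seeds came from.

```python
# Hypothetical transition probabilities: P(recruit's group | recruiter's group).
# These numbers are illustrative assumptions, not estimates from any study.
transition = {
    "A": {"A": 0.7, "B": 0.3},
    "B": {"A": 0.4, "B": 0.6},
}


def step(dist, matrix):
    """Advance one recruitment wave: push the group distribution through the matrix."""
    groups = list(dist)
    return {g: sum(dist[h] * matrix[h][g] for h in groups) for g in groups}


def composition_after(waves, start, matrix):
    """Expected group composition of the wave reached after `waves` steps."""
    dist = dict(start)
    for _ in range(waves):
        dist = step(dist, matrix)
    return dist


# Start once with all seeds in A and once with all seeds in B.
all_a = composition_after(20, {"A": 1.0, "B": 0.0}, transition)
all_b = composition_after(20, {"A": 0.0, "B": 1.0}, transition)
print(all_a)
print(all_b)
```

Both starting points end at essentially the same composition, which is what "reaching equilibrium" means here. Note that this equilibrium describes the composition of the recruitment process, not necessarily the population proportions.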

Respondent Driven Sampling 3
[Diagram: seeds (Seed 1, Seed 2, Seed 3, ..., Seed S) pass recruitment coupons to recruits (Recruit 1, Recruit 2, Recruit 3, ..., Recruit R) across WAVE 1, WAVE 2, WAVE 3, ..., WAVE W.]

Respondent Driven Sampling 4
[Diagram: seeds (Seed 1, Seed 2, Seed 3, ..., Seed S) pass recruitment coupons to recruits (Recruit 1, Recruit 2, Recruit 3, ..., Recruit R) across WAVE 1, WAVE 2, WAVE 3, ..., WAVE W; the recruits descending from Seed 1 are labeled "Seed 1 Recruitment Chain".]
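The wave and chain structure in these diagrams can be recovered from coupon records alone. The sketch below assumes a hypothetical mapping from each respondent to the person whose coupon they used (`None` for seeds), and it numbers seeds as wave 0, which is a convention chosen here rather than one taken from the slides.

```python
def wave_and_seed(recruiter_of):
    """Return {respondent: (wave, originating seed)} from recruiter links.

    recruiter_of maps respondent id -> recruiter id, with None for seeds.
    Seeds are treated as wave 0 (an assumption made for this sketch).
    """
    result = {}
    for rid in recruiter_of:
        wave, current = 0, rid
        # Walk up the coupon trail until a seed is reached.
        while recruiter_of[current] is not None:
            current = recruiter_of[current]
            wave += 1
        result[rid] = (wave, current)
    return result


# Hypothetical coupon records: two seeds and four recruits.
recruiter_of = {
    "seed1": None,
    "seed2": None,
    "r1": "seed1",
    "r2": "seed1",
    "r3": "r1",
    "r4": "seed2",
}
info = wave_and_seed(recruiter_of)
for rid, (wave, seed) in info.items():
    print(rid, "wave", wave, "chain of", seed)
```

Grouping respondents by their originating seed gives the recruitment chains; grouping by wave gives the columns of the diagram.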

Network/Multiplicity Sampling
- Sirken (1972, 1975)
- Sample from a sample's network
  - Conduct an interview with a sample
  - Roster eligible kinship members with contact information
  - Sample from the roster

Network Sampling vs. RDS
- Similar: Rely on social networks
- Different:
  - Network specification
    - NS: biological siblings, immediate family members
    - RDS: jazz musicians
  - Who selects the sample
    - NS: researchers
    - RDS: study participants with coupon
  - Selection probability
    - NS: Known
    - RDS: (Mostly) Unknown

RDS Inferences: Issues
1. Nonprobability
   - Within-network selection probability may be computed (e.g., # recruits / network size), but
     - Unclear coverage of network
     - Measurement error in network size
     - With or without replacement?
   - Seed selection probability unknown
2. Dependence
   - Recruiters and recruits are similar
3. None beyond univariate statistics

RDS Inferences: Point estimator
For binary variables
- RDS-I:
$$\hat{p}_B^{\text{RDS-I}} = \frac{S_{AB}\,\bar{\tilde{d}}_A}{S_{AB}\,\bar{\tilde{d}}_A + S_{BA}\,\bar{\tilde{d}}_B}$$
- RDS-II:
$$\hat{p}^{\text{RDS-II}} = \frac{\sum_{i \in S} \tilde{d}_i^{-1} y_i}{\sum_{i \in S} \tilde{d}_i^{-1}}$$
- SS (Gile):
$$\hat{p}^{G} = \frac{\sum_{i \in S} \pi(\tilde{d}_i)^{-1} y_i}{\sum_{i \in S} \pi(\tilde{d}_i)^{-1}}$$
where
- $S_{AB}$: proportion of ties (i.e., connections) that cut across A and B (e.g., the proportion of female peers among all peers recruited by all male participants)
- $\bar{\tilde{d}}_A = \sum_{i \in A} \tilde{d}_i / n_A$
- $\tilde{d}_i$ is the degree reported by respondent $i$
  - Large degree → high selection probability → small weight
- $n_A$ is the sample size of A
- $y_i$: outcome variable
- $\pi(\tilde{d}_i)$: estimated population distribution of degrees through successive sampling
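The RDS-I and RDS-II formulas can be worked through on a toy sample. The sketch below is a minimal illustration, assuming a hypothetical six-person sample with two groups (A and B), self-reported degrees, and recruiter links; the `Respondent` class, function names, and data are invented for this example. The SS (Gile) estimator is omitted because it requires the successive-sampling step that estimates $\pi$.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Respondent:
    rid: int
    group: str                 # "A" or "B"
    degree: float              # self-reported network size
    recruiter: Optional[int]   # None for seeds


def rds1_proportion_b(sample):
    """RDS-I estimate of the proportion in group B, using arithmetic mean degrees."""
    by_id = {r.rid: r for r in sample}
    # Count recruitment ties by (recruiter group, recruit group).
    ties = {(g, h): 0 for g in "AB" for h in "AB"}
    for r in sample:
        if r.recruiter is not None:
            ties[(by_id[r.recruiter].group, r.group)] += 1
    s_ab = ties[("A", "B")] / (ties[("A", "A")] + ties[("A", "B")])
    s_ba = ties[("B", "A")] / (ties[("B", "A")] + ties[("B", "B")])
    mean_deg = {}
    for g in "AB":
        degrees = [r.degree for r in sample if r.group == g]
        mean_deg[g] = sum(degrees) / len(degrees)
    numerator = s_ab * mean_deg["A"]
    return numerator / (numerator + s_ba * mean_deg["B"])


def rds2_proportion(sample, outcome):
    """RDS-II estimate: mean of the outcome weighted by inverse degree."""
    weights = [1.0 / r.degree for r in sample]
    values = [outcome(r) for r in sample]
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


# Hypothetical toy sample: respondent 1 is the only seed.
sample = [
    Respondent(1, "A", 10, None),
    Respondent(2, "B", 5, 1),
    Respondent(3, "A", 10, 1),
    Respondent(4, "A", 20, 2),
    Respondent(5, "B", 5, 2),
    Respondent(6, "B", 10, 3),
]
p_b_rds1 = rds1_proportion_b(sample)
p_b_rds2 = rds2_proportion(sample, lambda r: 1.0 if r.group == "B" else 0.0)
print(p_b_rds1, p_b_rds2)
```

In this toy sample half of the respondents are in B, yet both estimators return a larger share because group B reports smaller degrees and therefore receives larger weights.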

RDS Inferences: Sampling Variance 1
- Naïve estimator
- Direct estimator by Volz-Heckathorn ($v_{VH}$)
  - Not usable (requires full network information for all individuals in the population)
  - Only for proportions
  - Assumes first-order Markov process
    - Dependency only between immediate recruiter-recruits
    - Dependency static across chains and waves

RDS Inferences: Sampling Variance 2
Bootstrap by Salganik ($v_S$)
1. Group non-seeds by characteristics of recruiter (e.g., recruited by male vs. female)
2. Randomly sample a seed
3. Sample a non-seed from the group based on the seed in 2
4. Sample a non-seed from the group based on the non-seed in 3
5. Continue this until the bootstrap sample size equals n
- Only for proportions
- Assumes first-order Markov process only on the inference variable
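The five steps above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the record layout (`group`, `recruiter_group`), the toy data, and the plain sample share used as the statistic are all hypothetical, and the sketch assumes the starting seed only initiates the draw sequence and is not itself counted toward the bootstrap sample. In practice the statistic would be an RDS estimator such as RDS-I.

```python
import random


def salganik_bootstrap_sample(sample, n, rng):
    """Draw one bootstrap sample of n non-seeds following steps 1 to 5."""
    seeds = [r for r in sample if r["recruiter_group"] is None]
    # Step 1: pool non-seeds by the group of the person who recruited them.
    pools = {}
    for r in sample:
        if r["recruiter_group"] is not None:
            pools.setdefault(r["recruiter_group"], []).append(r)
    if not any(s["group"] in pools for s in seeds):
        raise ValueError("no seed belongs to a group that recruited anyone")
    current = rng.choice(seeds)              # Step 2
    boot = []
    while len(boot) < n:                     # Step 5
        pool = pools.get(current["group"])
        if not pool:
            # Assumption: restart from a new seed if this group recruited nobody.
            current = rng.choice(seeds)
            continue
        current = rng.choice(pool)           # Steps 3 and 4
        boot.append(current)
    return boot


def bootstrap_variance(sample, statistic, reps, rng):
    """Variance of the statistic across bootstrap replicates."""
    n = len(sample)
    estimates = [statistic(salganik_bootstrap_sample(sample, n, rng)) for _ in range(reps)]
    mean = sum(estimates) / reps
    return sum((e - mean) ** 2 for e in estimates) / (reps - 1)


# Hypothetical toy sample: two seeds and five recruits, grouped by gender.
sample = [
    {"id": 1, "group": "M", "recruiter_group": None},
    {"id": 2, "group": "F", "recruiter_group": None},
    {"id": 3, "group": "F", "recruiter_group": "M"},
    {"id": 4, "group": "M", "recruiter_group": "M"},
    {"id": 5, "group": "M", "recruiter_group": "F"},
    {"id": 6, "group": "F", "recruiter_group": "F"},
    {"id": 7, "group": "F", "recruiter_group": "M"},
]


def share_female(s):
    return sum(r["group"] == "F" for r in s) / len(s)


rng = random.Random(42)
boot = salganik_bootstrap_sample(sample, len(sample), rng)
variance = bootstrap_variance(sample, share_female, 200, rng)
print(len(boot), variance)
```

Each draw depends only on the group of the previous draw, which is where the first-order Markov assumption on the inference variable enters.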

RDS Inferences: Sampling Variance 3
Bootstrap based on recruitment chains
1. Randomly sample a seed and preserve its entire recruitment chain
2. Continue until the bootstrap sample size equals n
- Can be used for all statistics across all variables
- Does not assume first-order Markov process
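A chain-based bootstrap along these lines might look like the sketch below. The recruiter mapping and outcome values are hypothetical, and because whole chains rarely add up to exactly n, the sketch assumes resampling stops as soon as the bootstrap sample reaches at least n; the presentation does not specify how this is handled.

```python
import random


def build_chains(recruiter_of):
    """Group respondent ids by originating seed (recruiter is None for seeds)."""
    chains = {}
    for rid in recruiter_of:
        root = rid
        while recruiter_of[root] is not None:
            root = recruiter_of[root]
        chains.setdefault(root, []).append(rid)
    return chains


def chain_bootstrap_sample(recruiter_of, rng):
    """Resample seeds with replacement, keeping each seed's whole chain."""
    chains = build_chains(recruiter_of)
    seeds = list(chains)
    n = len(recruiter_of)
    boot = []
    # Assumption: stop once the size reaches at least n.
    while len(boot) < n:
        boot.extend(chains[rng.choice(seeds)])
    return boot


def chain_bootstrap_variance(recruiter_of, statistic, reps, rng):
    """Variance of any statistic of respondent ids across replicates."""
    estimates = [statistic(chain_bootstrap_sample(recruiter_of, rng)) for _ in range(reps)]
    mean = sum(estimates) / reps
    return sum((e - mean) ** 2 for e in estimates) / (reps - 1)


# Hypothetical data: three seeds with chains of length 3, 2, and 1.
recruiter_of = {"s1": None, "s2": None, "s3": None, "a": "s1", "b": "a", "c": "s2"}
y = {"s1": 1, "s2": 0, "s3": 1, "a": 1, "b": 0, "c": 0}


def mean_outcome(ids):
    return sum(y[i] for i in ids) / len(ids)


rng = random.Random(1)
boot = chain_bootstrap_sample(recruiter_of, rng)
variance = chain_bootstrap_variance(recruiter_of, mean_outcome, 200, rng)
print(boot, variance)
```

Because chains are kept intact, whatever dependence exists within a chain is carried into every replicate, so the statistic can be any function of the resampled respondents.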

Application: Health and Life Study of Koreans (HLSK)
Funded by the National Science Foundation (Grant Number SES-1461470)

HLSK
- Targets foreign-born Korean American adults in
  - Los Angeles County
  - State of Michigan
- Web-RDS survey
  - http://sites.lsa.umich.edu/korean-healthlife-study/
  - Unique number required for participation
  - Incentive payment through checks
- Target n=800 (currently ~600)
- Benchmarks from American Community Survey

HLSK Formative Research
- 3 rounds of focus group discussions
  - ~30 participants; 2 rounds in Korean and 1 in English
- Discussion focused on
  - Web surveys: URL, Web site contents, etc.
  - Concept of RDS
  - Coupons: up to 2 coupons; expire in 2 weeks
  - Level of incentives: $20 for main, $5 for follow-up, $0 for recruitment

HLSK Data Collection
- Started with 12 seeds in LA in June 2016
- MI added in November 2016
- LA seeds (initially)
  - Recruited through referral
  - Balanced on gender, age, dominant language
  - In-person introduction about the study
- It became clear the protocols would not work
  - Provide recruitment incentives
  - Add more seeds

HLSK Data Collection Progress
[Figure: data collection progress; two sets of totals shown]
- n=336; 123 seeds; 638 coupons
- n=270; 88 seeds; 519 coupons

HLSK vs. ACS 1
- American Community Survey 2011-2015 data
- HLSK sample estimates
  - Unweighted (UW)
  - RDS-I
  - Weighted: RDS-II
  - Weighted: Post-stratification (PS) by age, sex, educ
  - Weighted: RDS-II + PS
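One plausible way to combine RDS-II weights with post-stratification is sketched below; the presentation does not describe how the HLSK weights were actually built. The sketch assumes a single hypothetical post-stratification variable with two cells and made-up benchmark shares (not ACS figures), starts from inverse-degree weights, and rescales them so each cell's weighted share matches its benchmark.

```python
def rds2_ps_weights(sample, benchmark_share):
    """Inverse-degree (RDS-II) weights post-stratified to benchmark cell shares.

    sample: list of dicts with 'degree' and 'cell'.
    benchmark_share: cell -> population share (shares sum to 1).
    """
    base = [1.0 / r["degree"] for r in sample]
    grand_total = sum(base)
    cell_total = {}
    for r, w in zip(sample, base):
        cell_total[r["cell"]] = cell_total.get(r["cell"], 0.0) + w
    # Scale each weight by (benchmark share) / (current weighted share) of its cell.
    return [
        w * benchmark_share[r["cell"]] / (cell_total[r["cell"]] / grand_total)
        for r, w in zip(sample, base)
    ]


# Hypothetical respondents and benchmark shares, for illustration only.
sample = [
    {"degree": 10, "cell": "18-44"},
    {"degree": 5, "cell": "18-44"},
    {"degree": 20, "cell": "45+"},
    {"degree": 4, "cell": "45+"},
]
benchmark_share = {"18-44": 0.4, "45+": 0.6}
weights = rds2_ps_weights(sample, benchmark_share)
young_share = sum(w for r, w in zip(sample, weights) if r["cell"] == "18-44") / sum(weights)
print(weights, young_share)
```

With several post-stratification variables (such as age, sex, and education), the cells would be the cross-classification of those variables, or the adjustment could be done by raking on the margins.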

HLSK vs. ACS 2

HLSK vs. ACS 3

HLSK vs. ACS 4

HLSK vs. ACS 5
- HLSK sample estimate CI
  - Unweighted (UW), Naïve
  - RDS-I, Naïve
  - RDS-I, Chain-bootstrap (CB)
  - Weighted: RDS-II, Naïve
  - Weighted: RDS-II, CB

HLSK vs. ACS 6
[Figure; label: ACS]

Summary

What did we learn? 1
- Non-cooperation is an issue for generating long chains (memorylessness unlikely)
- Had to improvise to make RDS work
- Sample size (hence, chain length) is a random variable affected by many (mostly unknown) factors
- Inferences unclear and limited

What did we learn? 2
- YET, difficult-to-sample groups can be recruited
  - highly educated young recent immigrants
  - low Korean density areas (e.g., MI UP)

Where should we go?
- Non-cooperation is critical for
  - meeting theoretical assumptions (hence, inferences)
  - study design
  - replications of the same study
- Yet to be addressed in the literature and accounted for in inferences

Thank you
sungheel@umich.edu

References
Heckathorn, D.D. 1997. Respondent-Driven Sampling: A New Approach to the Study of Hidden Populations. Social Problems, 44(2): 174-199.
Heckathorn, D.D. 2002. Respondent-Driven Sampling II: Deriving Valid Population Estimates from Chain-Referral Samples of Hidden Populations. Social Problems, 49(1): 11-34.
Lee, S. 2009. Understanding Respondent Driven Sampling from a Total Survey Error Perspective. Survey Practice, 2(6): 1-6.
Lee, S., Suzer-Gurtekin, Z.T., Wagner, J. and Valliant, R. 2017. Total Survey Error and Respondent Driven Sampling: Focus on Nonresponse and Measurement Errors in the Recruitment Process and the Network Size Reports and Implications for Inferences. Journal of Official Statistics, 33(2): 335-366.
Sirken, M.G. 1972. Stratified Sample Surveys with Multiplicity. Journal of the American Statistical Association, 67: 224-227.
Sirken, M.G. 1975. Network Surveys of Rare and Sensitive Conditions. Advances in Health Survey Research Methods, NCHSR Research Proceedings 31. Hyattsville, MD: National Center for Health Statistics.