Session 2A: The limitations of Simple Random Sampling and the practice of sampling for household surveys
Juan Muñoz, Sistemas Integrales
Delhi, March 18, 2013
Please join Channel 41
Limitations of Simple Random Sampling
Simple Random Sampling (SRS) may be an option in certain cases, but it may not be practical if:
- we need estimates for subgroups of the population, especially if some of the subgroups are small
- we do not have an adequate sample frame
- a Simple Random Sample would be too scattered across the territory
We then resort to other techniques:
- Stratification
- Sampling in stages
Elections in an archipelago
Stratification
- We divide the population into subgroups, called strata
- We take a separate sample in each stratum
Stratification may be needed if:
- We want to reduce the standard error, by gaining control of the composition of the sample
- We want to ensure that certain groups are represented
These two objectives are contradictory in practice
The selection probabilities may differ across strata
- This imposes the use of weights, to ensure the external validity of our impact evaluation
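As a small illustration of why unequal selection probabilities across strata call for weights, the sketch below compares a weighted and an unweighted mean. The strata, sizes and means are made-up numbers for illustration only, and the function names are hypothetical. Each sampled household is weighted by the inverse of its selection probability, N_h / n_h.

```python
# Hypothetical strata (illustrative numbers, not from the session):
# population size N, sample size n, and sample mean of some outcome.
strata = {
    "urban": {"N": 80_000, "n": 500, "mean": 10.0},
    "rural": {"N": 20_000, "n": 500, "mean": 4.0},
}

def weighted_mean(strata):
    """Population mean estimate using design weights w_h = N_h / n_h."""
    total_weight = 0.0
    weighted_sum = 0.0
    for s in strata.values():
        w = s["N"] / s["n"]              # each sampled household stands for w households
        total_weight += w * s["n"]       # adds up to N_h
        weighted_sum += w * s["n"] * s["mean"]
    return weighted_sum / total_weight

def unweighted_mean(strata):
    """Naive mean that pools all sampled households and ignores the design."""
    n_total = sum(s["n"] for s in strata.values())
    return sum(s["n"] * s["mean"] for s in strata.values()) / n_total

print(weighted_mean(strata))    # 8.8
print(unweighted_mean(strata))  # 7.0
```

In this made-up case the rural stratum is oversampled, so the unweighted mean is pulled toward the rural value; the weights restore each stratum's share of the population.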
Electoral survey in two stages
Two-stage sampling
Instead of taking an SRS:
- We divide the territory into small areas, called Primary Sampling Units (PSUs)
- In the first stage, we choose PSUs
- In the second stage, we select households in the chosen PSUs
Two-stage sampling
- Solves the problems of SRS
  - Reduces transportation costs
  - Reduces sample frame problems
- The sample can be made self-weighted if
  - We choose PSUs with Probability Proportional to Size (PPS), and then
  - We take a fixed number of households in each PSU
- The price to pay is the cluster effect
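The self-weighting property can be checked with a little arithmetic. In the sketch below the PSU sizes, k and m are hypothetical values chosen for illustration. It assumes the first-stage probability of a PSU is k × (PSU size) / (total households), and the second-stage probability of a household is m / (PSU size). The PSU size then cancels out, leaving k × m / (total households) for every household.

```python
from fractions import Fraction

# Hypothetical frame: number of households in each PSU (illustrative values only).
psu_sizes = {"A": 60, "B": 120, "C": 200, "D": 90, "E": 130}
k = 2   # PSUs to select with PPS
m = 10  # households to select in each chosen PSU

total_households = sum(psu_sizes.values())

def household_probability(psu):
    """Overall selection probability of any one household in the given PSU."""
    size = psu_sizes[psu]
    p_psu = Fraction(k * size, total_households)  # first stage: PPS
    p_household = Fraction(m, size)               # second stage: fixed take of m
    return p_psu * p_household

probabilities = {psu: household_probability(psu) for psu in psu_sizes}
for psu, p in probabilities.items():
    print(psu, p)  # the same fraction for every PSU
```

Exact fractions are used so that the equality of the probabilities is not blurred by floating-point rounding.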
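Cluster effect
The standard error grows if, instead of taking a Simple Random Sample of n households, we take a two-stage sample with k PSUs and m households per PSU (n = k × m):
\[ e_{TSS}^2 = e_{SRS}^2 \, [\, 1 + \rho \, (m - 1) \,] \]
where \( e_{TSS} \) is the standard error of the two-stage sample, \( e_{SRS} \) is the standard error of the Simple Random Sample, \( \rho \) is the intra-cluster correlation, and the factor \( 1 + \rho (m - 1) \) is the cluster effect.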
Cluster effect
For a total sample size of 12,000 households

Number of PSUs   HHs per PSU   Intra-cluster correlation
                                0.01    0.02    0.05    0.10    0.20
         3,000             4    1.03    1.06    1.15    1.30    1.60
         2,000             6    1.05    1.10    1.25    1.50    2.00
         1,500             8    1.07    1.14    1.35    1.70    2.40
         1,000            12    1.11    1.22    1.55    2.10    3.20
           800            15    1.14    1.28    1.70    2.40    3.80
           600            20    1.19    1.38    1.95    2.90    4.80
           400            30    1.29    1.58    2.45    3.90    6.80
           300            40    1.39    1.78    2.95    4.90    8.80
           200            60    1.59    2.18    3.95    6.90   12.80
           150            80    1.79    2.58    4.95    8.90   16.80
           100           120    2.19    3.38    6.95   12.90   24.80
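The figures for the 12,000-household sample are consistent with a cluster effect of 1 + ρ(m − 1), where ρ is the intra-cluster correlation and m the number of households per PSU. A minimal sketch that recomputes them follows; the function and variable names are illustrative choices.

```python
def cluster_effect(m, rho):
    """Variance ratio of a two-stage sample to an SRS: 1 + rho * (m - 1)."""
    return 1 + rho * (m - 1)

n = 12_000  # total sample size in households
households_per_psu = [4, 6, 8, 12, 15, 20, 30, 40, 60, 80, 120]
rhos = [0.01, 0.02, 0.05, 0.10, 0.20]

for m in households_per_psu:
    k = n // m  # number of PSUs needed to reach n households
    row = "  ".join(f"{cluster_effect(m, rho):5.2f}" for rho in rhos)
    print(f"{k:5d} {m:4d}  {row}")
```

Reading across a row shows how quickly the effect grows with the intra-cluster correlation; reading down a column shows the cost of concentrating the same 12,000 households in fewer PSUs.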
Design effect
- In a two-stage sample: Cluster effect = \( e_{TSS}^2 / e_{SRS}^2 \)
- In a more complex design (with two or more stages, stratification, etc.): Design effect = Deff = \( e_{\text{complex design}}^2 / e_{SRS}^2 \)
- Can be interpreted as an apparent contraction of the sample size, as a result of clustering and stratification
- Can be estimated with special software (e.g., Stata's svy commands)
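One common way to read the "apparent contraction" is through an effective sample size, n / Deff: the size of a Simple Random Sample that would give the same precision. The sketch below assumes that reading; the function names are hypothetical, and the Deff value of 2.40 is just an example (it matches 15 households per PSU with an intra-cluster correlation of 0.10 under the cluster-effect formula).

```python
def effective_sample_size(n, deff):
    """Size of the simple random sample that would give the same precision."""
    return n / deff

def standard_error_inflation(deff):
    """Factor by which the standard error grows relative to an SRS of size n."""
    return deff ** 0.5

n = 12_000
deff = 2.40  # example value only

print(round(effective_sample_size(n, deff)))      # 5000
print(round(standard_error_inflation(deff), 2))   # 1.55
```

Because Deff is a ratio of squared standard errors, the standard error itself grows only by the square root of Deff.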
Household samples: Choosing the PSUs
- PSUs are Census Enumeration Areas (EAs), or groups of EAs
  - PSUs typically have 50 to 200 households
  - The sample frame is a small file that can easily be managed with Excel
- PSUs in the sample are generally selected with Probability Proportional to Size (PPS)
- The selected PSUs must be recognizable in the field
  - This implies collaboration with the National Census Office
- Outsized PSUs may require some work
  - See how to do it in the UN manual for household surveys in developing and transition countries
- Computer files are not enough. We also need maps
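One widely used way of drawing PSUs with PPS is systematic selection along the cumulated household counts. The sketch below is an illustration under stated assumptions, not necessarily the procedure used in any particular survey. The function name and the example frame are hypothetical, and outsized PSUs (larger than the sampling interval) are simply rejected rather than handled.

```python
import random
from itertools import accumulate

def pps_systematic(sizes, k, rng=random):
    """Select k PSUs with probability proportional to size (systematic PPS).

    sizes: household counts, one per PSU, in frame order.
    Returns the indices of the selected PSUs.
    Assumption: no PSU is larger than the sampling interval.
    """
    total = sum(sizes)
    interval = total / k
    if max(sizes) > interval:
        raise ValueError("outsized PSU: split it or treat it separately")
    start = rng.random() * interval          # random start in [0, interval)
    cumulative = list(accumulate(sizes))     # running total of households
    selected = []
    idx = 0
    for j in range(k):
        point = start + j * interval
        # Advance to the PSU whose cumulated range contains the point.
        while idx < len(sizes) - 1 and cumulative[idx] <= point:
            idx += 1
        selected.append(idx)
    return selected

# Hypothetical frame of five PSUs; a seeded generator makes the draw repeatable.
frame = [60, 120, 200, 90, 130]
print(pps_systematic(frame, 2, random.Random(1)))
```

Sorting the frame geographically before the draw spreads the selected PSUs across the territory, which is one reason the systematic variant is popular.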
Household samples: Choosing the households
- The best sample frame is the full list of all households in the selected PSUs
- The household listing operation requires time and money. Relative to the project's overall calendar and budget, these are
  - Marginal, if they are accounted for beforehand
  - Large enough to be a big headache, if they are not
- Information to be recorded on the listing
  - Name and address, as a minimum
  - Additional data required for the selection (e.g., presence of pregnant women, or children)
  - Do not ask for additional information that is not essential
- Households are generally selected from the listing by systematic equal-probability sampling
- Beware of imitations, such as
  - random walks
  - snowballing
  - expert opinion
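Systematic equal-probability selection from a household listing can be sketched as follows. The function name and the example listing are hypothetical. A fractional sampling interval is assumed so that the method also works when the number of listed households is not a multiple of the number to select.

```python
import random

def systematic_sample(listing, m, rng=random):
    """Select m households from a listing by systematic equal-probability sampling."""
    n = len(listing)
    if not 0 < m <= n:
        raise ValueError("m must be between 1 and the number of listed households")
    interval = n / m                    # fractional interval; n need not divide by m
    start = rng.random() * interval     # random start in [0, interval)
    positions = [int(start + j * interval) for j in range(m)]
    # min() guards against a floating-point overshoot on the last position.
    return [listing[min(p, n - 1)] for p in positions]

# Hypothetical listing of 87 households in one PSU, in listing order.
listing = [f"HH{i:03d}" for i in range(1, 88)]
print(systematic_sample(listing, 12, random.Random(42)))
```

Every household on the listing has the same chance, 12 in 87 here, and the selection is fully determined by the listing and the random start, which leaves no discretion to the interviewer in the field.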
The best way of dealing with non-response is to
1. Replace non-respondents by similar households
2. Increase the sample size to compensate for non-response
3. Use correction formulas
4. Use imputation techniques to simulate the answers of non-respondents
5. None of the above
The best way of dealing with non-response is to prevent it
The big problem with non-response is not the reduction of sample size. The problem is bias.
Lohr, Sharon L. Sampling: Design & Analysis (1999)
[Figure: some factors affecting non-response. Labels in the diagram: Interviewer, Respondent, Questionnaire, Motivation (appears twice), Training, Work load, Qualification, Work plan, Availability, Socio-economic, Demographic, Fatigue, Proxy, Biologic testing.]
Source: R. Platek, "Some factors affecting non-response", Survey Methodology 3 (1977), 191-214