WMI 606: Research Methods Course. Tools and data sources, session 5

1 WMI 606: Research Methods Course. Tools and data sources, session 5. Thumbi Mwangi (1. Paul G. Allen School for Global Animal Health, Washington State University; 2. Centre for Global Health Research, Kenya Medical Research Institute; 3. Wangari Maathai Institute, University of Nairobi). February 18th, 2015.

2 Data sources

3 ...data sources... Primary data: data you collect yourself for specific research purposes. Secondary data: information previously collected by others for purposes other than your own research study.

4 ...why bother with secondary data... Review existing knowledge: define the research problem, formulate research questions and hypotheses, and select a research design. Secondary data may be sufficient to answer the research question. Secondary data typically cost only a fraction of what primary data collection costs, saving time and money.

5 ...why bother with secondary data... May yield more accurate results than primary data, e.g. national surveys. Help define the population to be studied, or make comparisons. Provide baseline information or retrospective data. Increase the credibility of primary data when the findings are supported by other studies.

6 ...limitations of secondary data... Access: the format the data is stored in, and permission to use the data. Relevance: finding data relevant to your specific research questions. Definitions: is a data dictionary available? Timescale: when was the data collected, and is it still usable? Reliability: depends on who collected the data, and on their biases and errors. Source bias: reports may be biased depending on who reported and why.

7 Essay assignment: Use of GIS in environmental studies. Suggested reading: The Green Book, Chapter 4.2: Spatial data and geographical information system.

8 Designing your study

9 ...principles... There is no single correct way of designing an experiment, but there are plenty of ways a design can be wrong and lead to invalid conclusions. Key principles: comparison, replication, randomisation, and controlling for variation.

10 ...take the example... Claim: a new maize variety, Mkenya One, is less susceptible to stem borer damage than the commonly used Katumani variety. a) Comparison: plant at least two fields, one with Mkenya One and another with Katumani. Treatment: maize variety. Experimental unit: the field. Results: Mkenya One field 20% damaged, Katumani field 50% damaged. Theory confirmed?

11 ...take the example... b) Replication: increase the number of experimental units (fields planted with each of the two maize varieties) to check the consistency of the results. Mkenya One: 15%, 32%, 20% and 38%. Katumani: 40%, 33%, 45% and 35%. Mkenya One tends to show less damage. Are the results very convincing?
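
A quick way to judge the replicated results is to summarise each variety. The minimal Python sketch below uses only the damage percentages listed on the slide; the function name `summarise` is illustrative and not part of the course material.

```python
from statistics import mean, stdev

# Percentage stem borer damage per field, as listed on the slide
damage = {
    "Mkenya One": [15, 32, 20, 38],
    "Katumani": [40, 33, 45, 35],
}


def summarise(values):
    """Return (mean, sample standard deviation, min, max) of replicate values."""
    return mean(values), stdev(values), min(values), max(values)


for variety, values in damage.items():
    m, s, lo, hi = summarise(values)
    print(f"{variety}: mean {m:.2f}%, SD {s:.2f}%, range {lo}-{hi}%")
```

The means are 26.25% for Mkenya One and 38.25% for Katumani, but the individual values overlap (Mkenya One 15-38%, Katumani 33-45%), so the difference is not clear-cut.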

12 ...take the maize example... Increase precision by growing both varieties in the same field. Blocking: in each field, plant Mkenya One on the left-hand side and Katumani on the right-hand side. Mkenya One: 25%, 42%, 30% and 48%. Katumani: 50%, 43%, 55% and 45%. Are the results very convincing? What if there are consistent differences between left-hand and right-hand plots? c) Randomisation: to avoid bias from systematic differences between experimental units. Toss a coin to allocate Mkenya One to either the left- or right-hand side of each field.
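
The coin toss and the within-field comparison can be sketched in Python as follows. This is an illustration under assumptions: the "coin" is Python's pseudo-random generator with an arbitrary seed (2015), and the helper names `allocate_sides` and `paired_differences` are hypothetical. Comparing the two varieties within each field (one paired difference per block) removes field-to-field variation from the comparison.

```python
import random
from statistics import mean


def allocate_sides(n_fields, rng):
    """Simulate a coin toss per field: heads puts Mkenya One on the left-hand side."""
    plans = []
    for _ in range(n_fields):
        heads = rng.random() < 0.5
        if heads:
            plans.append({"left": "Mkenya One", "right": "Katumani"})
        else:
            plans.append({"left": "Katumani", "right": "Mkenya One"})
    return plans


def paired_differences(first, second):
    """Within-field (within-block) differences: second minus first, field by field."""
    return [b - a for a, b in zip(first, second)]


rng = random.Random(2015)  # arbitrary seed so the allocation can be reproduced
for i, plan in enumerate(allocate_sides(4, rng), start=1):
    print(f"Field {i}: left = {plan['left']}, right = {plan['right']}")

# Blocked results from the slide (% damage), listed field by field
mkenya_one = [25, 42, 30, 48]
katumani = [50, 43, 55, 45]
diffs = paired_differences(mkenya_one, katumani)
print("Katumani minus Mkenya One per field:", diffs, "mean:", mean(diffs))
```

For the slide's figures the per-field differences are 25, 1, 25 and -3 percentage points (mean 12). Because Mkenya One was always on the left, a consistent left/right effect would be mixed up with the variety effect; randomising the side in each field protects against that.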

15 ...when considering treatments...
1. Compare and contrast: make sure you are able to make comparisons between treatments.
2. Controls: provide baseline values to compare with.
3. Multiple-factor comparisons: include other factors that may influence the outcome of interest.
4. Quantitative levels: use several levels of a treatment to detect dose-dependent effects.
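
One way to lay out treatments that meet all four points is a factorial design: every level of one factor crossed with every level of another, with a zero level acting as the control. The sketch below is hypothetical: the second factor (an insecticide applied at doses 0, 1, 2 and 4, in arbitrary units) is invented for illustration and is not part of the maize example.

```python
from itertools import product

# Hypothetical factors for illustration only
varieties = ["Mkenya One", "Katumani"]   # comparison between treatments
doses = [0, 1, 2, 4]                     # hypothetical dose levels; 0 = untreated control

# Full factorial: every variety combined with every dose level
treatments = [
    {"variety": v, "dose": d, "control": d == 0}
    for v, d in product(varieties, doses)
]

for t in treatments:
    label = "control (dose 0)" if t["control"] else f"dose {t['dose']}"
    print(f"{t['variety']:<10}  {label}")
```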

16 ...why use samples... Sample: a subset of the study population. Sample versus whole-population studies: feasibility; cost (required resources); theory (a representative sample is sufficient to provide answers); and the purpose of research, which is to infer or generalise from the sample to a population.

17 ...samples: what to consider...
1. How many should be selected? (sample size)
2. How should they be selected? (sampling procedure)

18 ...sample size... The larger the sample size, the smaller the sampling error. Sample size will depend on: the resources available; your study objectives; practical constraints; and the precision needed (the width of the confidence interval). There are formulae for sample size calculation that depend on the study design and on the nature of the main outcome variable to be measured.
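
As one example of such a formula, the sample size needed to estimate a single proportion p with simple random sampling, to within a margin of error d, is n = z^2 * p * (1 - p) / d^2, where z is about 1.96 for 95% confidence. The Python sketch below applies it; it assumes simple random sampling and ignores any finite population correction, so treat it as a starting point rather than a final calculation. The function name is illustrative.

```python
import math
from statistics import NormalDist


def sample_size_proportion(p, d, confidence=0.95):
    """Sample size to estimate a single proportion p within +/- d (simple random sampling).

    n = z^2 * p * (1 - p) / d^2, rounded up; no finite population correction.
    """
    if not 0 < p < 1 or d <= 0:
        raise ValueError("need 0 < p < 1 and d > 0")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


for d in (0.10, 0.05, 0.025):
    print(f"margin of error +/-{d:.3f}: n = {sample_size_proportion(0.5, d)}")
```

With p = 0.5 (the most conservative choice) and d = 0.05 this gives 385; halving the margin of error to 0.025 roughly quadruples the requirement to 1537, which is the trade-off between precision and resources described above.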

20 ...sampling procedure... After determining the sample size, you will need a sampling strategy that does not introduce bias. Sampling: the selection of study units from a defined study population. The sample should be representative, i.e. have the important characteristics of the population it is drawn from. Two methods of sampling: probability methods and non-probability methods.

21 ...probability methods... Each study unit in the study population has an equal or known chance of being selected into the study.
1. Simple random sampling
2. Systematic sampling
3. Stratified sampling
4. Cluster sampling
5. Multi-stage sampling

22 ...sampling... Simple random sampling: every subject has an equal chance of being selected for the study. All eligible subjects are given a number; the complete list of numbers is the sampling frame. A random sample is picked from the sampling frame. Sampling may be with replacement or without replacement.
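
A minimal sketch of simple random sampling from a numbered sampling frame, using Python's standard `random` module: `Random.sample` draws without replacement and `Random.choices` draws with replacement. The frame of 1000 numbered subjects and the function name are hypothetical.

```python
import random


def simple_random_sample(frame, n, replace=False, seed=None):
    """Draw n units from the sampling frame, with or without replacement."""
    rng = random.Random(seed)
    if replace:
        return rng.choices(frame, k=n)   # a unit can be drawn more than once
    return rng.sample(frame, n)          # each unit drawn at most once


# Hypothetical sampling frame: 1000 eligible subjects numbered 1 to 1000
frame = list(range(1, 1001))
print(simple_random_sample(frame, 10, seed=1))
print(simple_random_sample(frame, 10, replace=True, seed=1))
```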

23 ...sampling... Systematic sampling: select subjects from an ordered sampling frame by taking every kth subject. The sampling interval k is the size of the sampling frame divided by the desired sample size, e.g. 1000/200 households = 5, so select every 5th household, with the first household selected at random from the first 5. Takes less time and is easier to do than simple random sampling. Risk of bias if the interval coincides with a systematic (periodic) pattern in the sampling frame.
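
The household example can be sketched as follows, assuming a hypothetical ordered frame of 1000 households numbered 1 to 1000. The interval is k = 1000 // 200 = 5 and the random start is drawn from the first interval; the function name is illustrative.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Take every kth unit from an ordered frame, starting at a random point in the first interval."""
    k = len(frame) // n                 # sampling interval, e.g. 1000 // 200 = 5
    if k < 1:
        raise ValueError("sample size larger than sampling frame")
    start = random.Random(seed).randrange(k)   # random start: index 0 .. k-1
    return frame[start::k][:n]


households = list(range(1, 1001))   # hypothetical ordered list of 1000 households
chosen = systematic_sample(households, 200, seed=7)
print(chosen[:5], "...", len(chosen), "households")
```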

24 ...sampling... Stratified sampling: the population is divided into groups (strata) according to specific attributes, e.g. age, education or marital status, and random sampling is done within each stratum. The strata should not overlap (they should be mutually exclusive). Account for the design effect in the sample size calculation. Allows relatively larger samples to be drawn from smaller strata. Risk of unequal sampling fractions across strata.
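
A sketch of stratified sampling with proportional allocation, in which each stratum contributes in proportion to its size so the sampling fractions are equal. The population of 1000 people split into three age groups is invented for illustration, and the function name is hypothetical. If small strata are deliberately oversampled instead, the sampling fractions become unequal and estimates for the whole population then need weighting.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n, seed=None):
    """Proportional stratified sampling: a simple random sample within each stratum.

    Rounding means the stratum sizes may not add up exactly to n.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)   # each unit belongs to exactly one stratum
    sample = {}
    for name, members in strata.items():
        n_h = round(n * len(members) / len(units))       # proportional allocation
        sample[name] = rng.sample(members, min(n_h, len(members)))
    return sample


# Hypothetical population of 1000 people: (id, age group)
population = (
    [(i, "18-34") for i in range(500)]
    + [(i, "35-54") for i in range(500, 800)]
    + [(i, "55+") for i in range(800, 1000)]
)
result = stratified_sample(population, lambda person: person[1], 100, seed=3)
for group, members in result.items():
    print(group, len(members))
```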

25 ...sampling... Cluster sampling: the population is divided into clusters, a random sample of clusters is selected, and the units within the selected clusters are studied. Clusters are based on spatial attributes, e.g. households, counties, sub-counties. Clusters should ideally be similar to one another (homogeneous). Allows relatively large samples to be collected at lower cost. Risk of unequal sampling fractions; it is important to consider this before generalising findings.
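
A sketch of one-stage cluster sampling: clusters, not individuals, are chosen at random, and every unit in a chosen cluster is included. The 10 sub-counties of 20 households each, their names, and the function name are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sampling: choose whole clusters at random and keep all their units."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


# Hypothetical frame: 10 sub-counties, each listing 20 household IDs
clusters = {
    f"SC{c:02d}": [f"SC{c:02d}-HH{h:02d}" for h in range(1, 21)]
    for c in range(1, 11)
}
selected = cluster_sample(clusters, 3, seed=11)
for name, households in selected.items():
    print(name, len(households), "households")
```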

26 ...sampling... Multi-stage sampling: uses a mix of sampling methods at different levels, sampling sub-clusters or sub-strata within bigger sampling units such as clusters, e.g. sampling households within sub-locations within agro-ecological zones. Allows a large and diverse population to be studied. Greater risk of non-representativeness; the design effect needs to be adjusted for.
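
For clusters of equal size m, the design effect is often approximated as DEFF = 1 + (m - 1) * ICC, where ICC is the intra-cluster correlation of the outcome (Kish's approximation); the sample size from a simple-random-sampling formula is then multiplied by DEFF. The sketch below combines that adjustment with a two-stage selection (sub-locations, then households), leaving out the agro-ecological zone level for brevity. The ICC of 0.05, the 10 households per sub-location, the frame of 100 sub-locations and the function names are all assumptions for illustration.

```python
import math
import random


def design_effect(m, icc):
    """Approximate design effect for clusters of equal size m: 1 + (m - 1) * ICC."""
    return 1 + (m - 1) * icc


def two_stage_sample(frame, n_primary, m, seed=None):
    """Stage 1: random sample of primary units; stage 2: random sample of m units within each."""
    rng = random.Random(seed)
    primaries = rng.sample(sorted(frame), n_primary)
    return {p: rng.sample(frame[p], m) for p in primaries}


# Assumed inputs (hypothetical): SRS sample size, households per sub-location, ICC
n_srs, m, icc = 385, 10, 0.05
deff = design_effect(m, icc)
n_required = math.ceil(n_srs * deff)
n_clusters = math.ceil(n_required / m)
print(f"DEFF = {deff:.2f}; households needed = {n_required}; sub-locations = {n_clusters}")

# Hypothetical frame: 100 sub-locations with 50 households each
frame = {
    f"SL{s:03d}": [f"SL{s:03d}-HH{h:02d}" for h in range(1, 51)]
    for s in range(1, 101)
}
sample = two_stage_sample(frame, n_clusters, m, seed=5)
print(len(sample), "sub-locations,", sum(len(v) for v in sample.values()), "households")
```

With these assumptions DEFF = 1.45, so 385 becomes 559 households, which is met by 56 sub-locations of 10 households each (560 in all).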
