Methodological Review of Research with Large Businesses. Paper 5: Data Preparation and Analysis Issues for Quantitative Surveys


Methodological Review of Research with Large Businesses

Paper 5: Data Preparation and Analysis Issues for Quantitative Surveys

Lead author: Susan Purdon (NatCen)

Prepared for HM Revenue & Customs, December 2008

Her Majesty's Revenue and Customs Research Report 60

Disclaimer

Crown Copyright 2008. Published by Her Majesty's Revenue & Customs. The views in this report are the authors' own and do not necessarily reflect those of HM Revenue & Customs.

Contents

Abstract
Glossary of terms
Acknowledgements
Executive Summary
1 Introduction
2 Data editing and cleaning
2.1 Recommendations
3 Weighting
3.1 Weighting for differential selection probabilities
3.1.1 Coverage problems
3.1.2 Dealing with stratum jumpers
3.2 Weighting to adjust for perceived non-response bias
3.2.1 The control totals to use
3.3 Weighting to adjust for non-coverage problems
4 Imputation
4.1 Methods of imputation
4.2 Recommendations
5 Analysis of data from large business surveys
6 Summary
7 Recommendations
8 References

Abstract

Data preparation and analysis for business surveys is known to be complex. This paper discusses the main issues that are likely to arise for surveys of, or including, large businesses, covering the editing of data; weighting, both for unequal selection probabilities and for non-response; and imputation. A short discussion of some of the key design issues that affect analysis is also included, in particular the issue of how to analyse small samples drawn from small populations. The paper concludes with some recommendations for HM Revenue & Customs.

Glossary of terms

Control total: A population distribution that a survey is weighted against to ensure a match.

Editing: Cleaning of a dataset to remove errors made by respondents or interviewers.

Finite population correction: A standard error adjustment that accounts for sampling fractions. Generally only applied when sampling fractions are high (above 5%).

Imputation: The process of replacing missing data with statistically generated estimates.

Item non-response: Missing data within an otherwise complete questionnaire caused, for instance, by respondents refusing or being unable to answer some questions.

Non-response weighting: Adjustment of the data to account for perceived non-response biases.

Outlier: A survey response that is considerably smaller or larger than the average.

Selection weighting: Adjustment of the data to account for the use of unequal selection probabilities.

Standard error: A standard measure of precision for surveys: the larger the standard error, the lower the precision. In technical terms, the standard error is the standard deviation of the differences between the survey value and the true population value over repeated surveys of the same size and design. For most statistics, the 95% confidence interval is calculated as +/-1.96 times the estimated standard error.

Stratum jumper: A business recorded as being in one stratum on the sampling frame but found to be from another stratum during the survey.

Winsorization: A statistical method for trimming extreme outliers.
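For reference, the confidence interval rule and the finite population correction described in the glossary correspond to the usual textbook forms (our notation, not the report's):

    \mathrm{CI}_{95} = \hat{\theta} \pm 1.96 \times \widehat{se}(\hat{\theta}),
    \qquad
    \widehat{se}_{\mathrm{fpc}}(\hat{\theta}) = \sqrt{1 - n/N} \times \widehat{se}(\hat{\theta})

where \hat{\theta} is the survey estimate, n the sample size and N the population size, so n/N is the sampling fraction.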

Acknowledgements

The papers in this series benefited from the input of a large number of people. In particular we would like to thank Kate Fox, Ellen Springall, and Anna Taylor of HM Revenue & Customs, the members of the project steering group, and all those we consulted as part of the review.

Executive Summary

Background

In 2007 HM Revenue & Customs commissioned a methodological review of research with large businesses. Their motivation was to examine best practice in research with this population, with the aim of identifying strategies for minimising the burden of HM Revenue & Customs research whilst maintaining reasonably high rates of participation among businesses. The review was carried out by a team from the National Centre for Social Research (NatCen) and the National Institute of Economic and Social Research (NIESR).

Methodology

The exercise incorporated an extensive review of the methodological literature on business surveys. It also involved consultations with a number of people who have relevant experience of businesses and business-focused research, as well as drawing upon experience within the project team at NatCen and NIESR.

Recommendations

This paper looks at issues around data preparation and analysis for quantitative surveys of large businesses, with a particular focus on data editing, data weighting, and imputation. The main recommendations for HM Revenue & Customs made in the paper are:

- There is a case for adopting a standardised weighting procedure across surveys of large businesses.

- Editing of data is an important stage of business surveys, especially so for large business surveys. Sufficient time should be allowed for this.

- Development of a default approach to the identification and handling of outliers would be useful.

- The case for and against imputation could be formally assessed so that decisions on when (and when not) to adopt it are simple and consistent. A standard line on which imputation methods to adopt would be a useful development.

- The complexities of the analysis of business surveys should be acknowledged within HM Revenue & Customs. To generate accurate (i.e. low sampling error and low bias) statistics from business surveys takes time and effort.

1 Introduction

This fifth paper in the series looks at issues around data preparation and analysis for quantitative surveys of large businesses. The paper focuses on the preparation steps needed between gaining a raw dataset and analysis, with the aim of generating a complete and clean dataset that can be treated as representative of the population of interest. These steps can be divided into three main stages:

(a) data editing, that is, cleaning of the data;

(b) data weighting, to adjust for over-sampling of some businesses relative to others in the starting sample and to adjust for perceived non-response biases;

(c) imputation, to fill in any major gaps in the answers given by respondents during the survey.

The section of the paper that deals with weighting is more detailed than the other sections, on the grounds that weighting is one area where there would be benefits in HM Revenue & Customs applying a uniform approach across their large business surveys. In Section 5 of the paper we turn briefly to some analysis issues and cover, in particular, an analysis issue that frequently arises in surveys of large businesses: namely the problem of analysing small sample sizes from small populations.

2 Data editing and cleaning

The first stage of data preparation for surveys will usually be the validation and editing stage. The aim of this stage is to identify errors in the survey responses given by interviewees during a survey, and then where possible to correct them or, if they cannot be corrected, to recode them as missing values. The Office for National Statistics (ONS), in their report Evaluation and Criteria for Statistical Editing and Imputation (NSMS No. 28), divide editing into two types:

- logical editing, where the data values have to obey pre-defined rules and those found to fail the rules are assumed incorrect (such as values that do not add to a given total);

- statistical editing, where statistical methods (such as range checks and internal consistency checks) are used to identify values that might be wrong.

For paper-based postal surveys most validation checks will need to be carried out after the questionnaire has been returned, although it is possible to include some internal logic checks within the questionnaire itself to try to ensure the respondent carries out the checks him/herself wherever possible.[1] However, for office-identified errors the only option may be to contact the respondent to try to clarify whether the possible error is a genuine error.

[1] Researchers we spoke to within ONS take the view that including too many internal validation checks on self-completion questionnaires is off-putting for respondents. The implication is that in a voluntary survey they might actually reduce response rates.
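To make the distinction concrete, the following minimal sketch applies a pre-defined logical rule and a crude statistical range check to a set of returns. The field names, rules and figures are hypothetical, not taken from any HM Revenue & Customs survey:

    # Minimal sketch of the two edit types described above; all field names,
    # rules and figures are hypothetical.

    def logical_edits(record):
        # Pre-defined rules: values that fail them are assumed incorrect.
        failures = []
        if record["ft_employees"] + record["pt_employees"] != record["total_employees"]:
            failures.append("employee components do not add to the stated total")
        if record["turnover"] < 0:
            failures.append("turnover cannot be negative")
        return failures

    def statistical_edits(record, dataset):
        # A crude range check: flag values far outside the range of responses
        # given by others (note this needs the full dataset, as discussed below).
        others = sorted(r["turnover"] for r in dataset if r is not record)
        low, high = others[0] / 10.0, others[-1] * 10.0
        if not low <= record["turnover"] <= high:
            return ["turnover out of range relative to other responses"]
        return []

    dataset = [
        {"ft_employees": 300, "pt_employees": 200, "total_employees": 500, "turnover": 40e6},
        {"ft_employees": 120, "pt_employees": 80, "total_employees": 200, "turnover": 15e6},
        {"ft_employees": 50, "pt_employees": 10, "total_employees": 60, "turnover": 4e9},
    ]
    for record in dataset:
        print(logical_edits(record) + statistical_edits(record, dataset))

Here the third return passes the logical edits but is flagged by the range check; in practice such a value would be queried with the respondent rather than automatically corrected.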

For interviewer-administered surveys where the questionnaire is computer-based it is possible to build consistency checks into the program, which should reduce the need to return to respondents or interviewers at the data preparation stage. However, some statistical edit checks, such as outlier detection, may need a full dataset, because errors are only identified as those responses that are out of range relative to the responses given by others. These cannot be pre-programmed and will need to be checked with respondents or interviewers at a later stage.

Arguably the greatest challenge for the efficient editing of business survey data is finding a means of minimising the number of occasions where a survey respondent has to be re-contacted, while at the same time being able to generate a dataset that gives close to unbiased survey estimates. In principle this means identifying influential errors, which ONS describe as errors in the dataset that would lead to significant errors in analysis if they were ignored. In the context of a business survey these are likely to be values that are implausibly small or implausibly large given the size and characteristics of the business. If time-series data is available it may also cover values that have changed considerably between survey occasions. Note that outlier detection is an area of statistical research in its own right and we make no attempt to summarise the large amount of associated literature here.

2.1 Recommendations

There are probably very few generic editing rules that would apply across all HM Revenue & Customs surveys of large businesses. The simplest recommendation is that, when commissioning surveys where errors are likely to bias survey estimates, HM Revenue & Customs should ask that contractors give a clear account of the edit procedures they will put into place and how, in particular, they will identify, and deal with, influential outliers. The alternative would be for HM Revenue & Customs to develop editing procedures in-house, but we are not aware that this is a model any government department other than ONS has adopted.

3 Weighting

Most business surveys based on samples (as opposed to a census) will use some degree of variation in sampling fractions. The standard method is to increase the sampling fraction as the size of the business increases, so that larger businesses are over-sampled relative to smaller ones. Whenever varying sampling fractions are used, the survey data will have to be weighted before the survey statistics can be considered valid estimates for the business population.

The aim of weighting is to generate a survey dataset that represents reasonably well the population from which the sample was selected. To achieve good representation, weights are often calculated with the aim of adjusting for more than just unequal sampling fractions. The two extra standard adjustments are:

(a) weighting to adjust (as far as possible) for bias introduced by non-response;

(b) weighting to adjust (as far as possible) for deficiencies in the sampling frame.
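Although the paper treats these adjustments separately, in practice they are usually combined multiplicatively into a single analysis weight per respondent. A standard textbook formulation (our notation, not taken from this report) is:

    w_i = d_i \times a_i \times g_i

where d_i is the inverse of the selection probability for business i (Section 3.1), a_i is a non-response adjustment factor (Section 3.2), and g_i is any further calibration factor that aligns the weighted sample with population control totals (Sections 3.2.1 and 3.3).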

The three elements of weighting are discussed in turn below.

3.1 Weighting for differential selection probabilities

To deal with unequal probabilities of selection, the standard practice is simply to apply a weight to each responding unit equal to the inverse of the probability of selection. If, for example, businesses with under 500 employees are selected with a 1 in 100 fraction and those with 500+ employees with a 1 in 10 fraction, then respondents in the first group would be given a weight of 100 and those in the latter group a weight of 10. Similarly, if businesses are selected with probability proportional to size, then the probability of an individual business being selected will be n × size/(total size), where n is the total number of businesses selected, so the weight per responding business will be (total size)/(n × size).

These calculations are in principle very straightforward, but in business surveys a number of practical problems tend to emerge. These arise from the fact that sampling frames are never entirely accurate on employment size and in a proportion of cases will be completely wrong. This leads to two types of problem:

(a) Coverage problems: some businesses thought to be in-scope for a survey on the grounds of their employment size will be found, at interview, to be out of scope. Other businesses will be thought to be out of scope (and so not selected for the survey at all) who in fact are in-scope.

(b) Stratum jumping: within the in-scope businesses, some will be found to be of a very different size to that recorded on the sampling frame. If the sampling fractions are very different across strata then this can lead to the situation where, within a respondent-based size band,[2] there are businesses with very different selection weights. Taking the example above, if businesses thought to have under 500 employees are selected with a 1 in 100 fraction and those with 500+ employees with a 1 in 10 fraction, and some of the first group are reallocated to the 500+ group at the analysis stage, then the 500+ group will have some businesses with a weight of 10 and others with a weight of 100. The latter will then dominate the survey estimates for this size group. This variation in weights can lead to considerable inflation of standard errors (and hence considerable widening of confidence intervals).

There is no consensus on how best to deal with these problems, and we have not identified any specific literature on them. It would seem that most researchers take a pragmatic line based on their best judgement. The stated aim will usually be to minimise bias, but not at the expense of excessive inflation of standard errors. The following two sections discuss these issues (coverage problems, and stratum jumping) in turn.

[2] Analysis of business surveys tends to be based on respondent variables rather than on sampling frame variables.
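The two calculations just described can be sketched as follows; the sampling fractions echo the illustrative example above, and all other figures are hypothetical:

    # Minimal sketch: selection weights as the inverse of the selection probability.

    def stratified_weight(sampling_fraction):
        # A 1 in 100 fraction gives selection probability 0.01, hence weight 100.
        return 1.0 / sampling_fraction

    def pps_weight(size, total_size, n_selected):
        # Probability proportional to size: p = n * size / (total size),
        # so the weight is (total size) / (n * size).
        return total_size / (n_selected * size)

    print(stratified_weight(1 / 100))  # 100.0, e.g. the under-500 band
    print(stratified_weight(1 / 10))   # 10.0, e.g. the 500+ band
    print(pps_weight(size=2000, total_size=1000000, n_selected=50))  # 10.0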

3.1.1 Coverage problems

For surveys with a focus on large businesses, a decision will need to be made at some stage on how to deal with instances where businesses selected as large because of their size on the sampling frame are found not to be large during the survey. This tends to happen most often with the smallest size stratum, but it is not unknown for very large businesses on sampling frames to turn out to be much smaller at interview. Clearly there are two possible ways forward:

(a) leave the 'too small' businesses in and use their sampling frame size in the survey analysis; or

(b) screen out those found to be too small.

The problem with (a) is that it can create considerable difficulties at the analysis stage, especially if any of the survey questions are determined by the size of the business. If, say, a total monetary value per employee was to be calculated, would that be calculated per sampling frame employee or per reported employee? So adopting (b) has considerable attractions. However, (b) brings its own problems. Essentially, by screening out those deemed too small, but having no means of identifying and bringing in apparently smaller businesses that are in fact large, adopting (b) leads to a bias. In practice, once the selection weights are applied, the smallest size stratum is likely to be much too small, because it now only represents businesses thought to be large and found to be large. In an ideal world, surveys of large businesses would also include a sample of those thought to be small, so that those found to be large can be included in the analysis. But such an approach is costly and the arguments for the extra investment are unlikely to be very persuasive for most surveys.

In the 2005 Workplace Employee Relations Survey (WERS05) the lowest size threshold was 5 employees, and there was no selection from the sampling frame (ONS's Inter-Departmental Business Register (IDBR)) of employers recorded as having between one and four employees. So the survey did not represent those businesses recorded as having under five employees on the sampling frame that in fact had five or more (perhaps because they had recently grown in size). In this case the solution adopted was to apply weights to the sample to force the survey size distribution to match the published IDBR size distribution (a minimal sketch of this type of adjustment is given below). The assumption made was that the aggregate IDBR size distribution was approximately correct even if some businesses on the IDBR were mis-classified. By weighting the WERS distribution to the IDBR distribution, the hope was that bias in the WERS statistics would be reduced by ensuring that the survey did at least match the whole population in terms of size. Any bias due to the fact that WERS did not include any businesses with five or more employees who were recorded as smaller on the IDBR was simply accepted.

For HM Revenue & Customs studies of large businesses the issue is how to deal with those selected businesses that fall below the size threshold for the particular study. It is not possible to make a definitive recommendation on this, but one option might be to adopt a rule that:

- no businesses are screened out on grounds of size unless the statistics from the study are size-dependent (for example, on studies where the focus is consultation, screening out may not be necessary);

- for studies with size-dependent statistics, those outside the size threshold for the study are screened out.
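The sketch below illustrates the WERS05-style adjustment just described: selection-weighted survey counts per size band are scaled so that they match published population totals. All size bands and figures are hypothetical:

    # Minimal sketch: post-stratify a selection-weighted sample so that its
    # size-band distribution matches published population totals (e.g. the IDBR).
    # All figures are hypothetical.
    from collections import defaultdict

    respondents = [("5-49", 100.0), ("5-49", 100.0), ("50-499", 20.0),
                   ("50-499", 20.0), ("500+", 10.0)]  # (size band, selection weight)
    population_totals = {"5-49": 240.0, "50-499": 50.0, "500+": 12.0}

    # Sum of selection weights per band in the achieved sample.
    weighted_counts = defaultdict(float)
    for band, weight in respondents:
        weighted_counts[band] += weight

    # Scaling factor per band: population total / weighted sample total.
    factors = {band: population_totals[band] / weighted_counts[band]
               for band in population_totals}

    # Final weight = selection weight * post-stratification factor.
    final_weights = [(band, weight * factors[band]) for band, weight in respondents]
    print(final_weights)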

In both instances a case can be made for adjusting the survey size distribution to the best available estimate of the actual population size distribution: see the section on non-response weighting below.

3.1.2 Dealing with stratum jumpers

Stratum jumpers are businesses selected from one stratum but found to belong to another stratum at the time of the survey. The problem that stratum jumpers generate is that they will often have a very different selection weight to others in their final stratum.[3] If their weight is much larger than the average then they will have a disproportionate influence on the statistics for that stratum; if their weight is much smaller than the average then they will have very little influence on the statistics (which is almost akin to wasting an interview). None of this is biasing (in the sense that the weighted data will still represent the population), but it does have a negative impact on standard errors, and hence confidence intervals.

Again, there seem to be no standard ways of dealing with this in the literature, but pragmatic solutions have evolved. On WERS05, for instance, weights per stratum were truncated so that none was more than three times larger or three times smaller than the median for the stratum (a minimal sketch of this rule is given below). And in the 2007 DWP Survey of Employer Pension Provision, large weights were trimmed so that no individual business accounted for more than a fixed percentage of the weighted sum of cases in the analysis stratum. The effect of this capping has not, however, been evaluated to our knowledge. The expectation is that it will lead to a small increase in bias, but that this should be outweighed by the decrease in standard errors.

3.2 Weighting to adjust for perceived non-response bias

The response rate on quantitative surveys of large businesses will never be 100% and is likely to be far below this (see Paper 2). To the extent that those businesses that refuse to take part are different from those that do take part, this non-response can lead to bias in the survey estimates. In principle, to remove this bias, the non-responders would be replaced (by weighting the data) by responding businesses that are very similar on all the survey variables of interest. But of course, if enough information were known about businesses to allow for valid replacement then the survey would not be needed in the first place. Moreover, for very large businesses, it is probably the case that there simply are no valid replacements.

In practice, all that is possible is to weight the responding businesses so that the survey distribution looks as close to the population of interest as possible. One simple way of doing this is to ensure that the survey, after applying selection weights, is additionally weighted to match the population of large businesses in terms of size and Standard Industrial Classification (SIC) group. But for HM Revenue & Customs sponsored surveys, where the sample is drawn from their large business database,[4] it may also be possible to weight the data to ensure a good match between the survey and the population on other variables, such as turnover and amount of Corporation Tax (CT) paid.

[3] Analysis will usually be based on respondent-reported size rather than on sampling frame size.

[4] See Paper 1 for a discussion of sampling frames.
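Returning to the truncation rule described in Section 3.1.2, a minimal sketch (with hypothetical weights) might look as follows. Some implementations also redistribute the trimmed weight across the stratum so that weighted totals are preserved; that step is omitted here:

    # Minimal sketch: truncate weights within a stratum so that none is more than
    # three times larger or smaller than the stratum median (the WERS05-style rule
    # described above). The weights are hypothetical; 100.0 plays the part of a
    # stratum jumper with a much larger weight than the rest of its stratum.
    import statistics

    def truncate_weights(weights, factor=3.0):
        median = statistics.median(weights)
        lower, upper = median / factor, median * factor
        return [min(max(w, lower), upper) for w in weights]

    stratum_weights = [10.0, 12.0, 9.0, 11.0, 100.0]
    print(truncate_weights(stratum_weights))  # the 100.0 is capped at 3 * median = 33.0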

In deciding on the optimal non-response weighting strategy for a particular survey, it is important to try to balance the gains in bias reduction against losses in precision, because non-response weighting adds variance to the weights, which in turn leads to increases in standard errors. On some surveys it might be judged preferable not to incorporate non-response weighting at all, on the grounds that the loss of precision far outweighs any benefits. But, more commonly, non-response weighting to ensure representativeness on at least a few key variables is likely to be judged appropriate. Given that surveys tend to be designed to generate a wide range of statistics, survey statisticians usually adopt non-response weighting strategies based on their judgement of what appears reasonable rather than on strict empirical assessment, the argument being that a strategy that is optimal for one statistic will be sub-optimal for another, so there is no overall optimal approach.

Assuming the case for applying non-response weights has been made, the two main decisions needed for calculation of the weights are:

(a) which variables to adjust for;

(b) how to do the adjustment.

On the first, a reasonable default for HM Revenue & Customs surveys of large businesses might be that employment size and SIC are always adjusted for, but that the case for additional adjustment on other variables be assessed on a survey-by-survey basis. The factors to take into account when making the decision would be the survey sample size (smaller samples meriting less adjustment than larger ones, because the risk of excessive variation in weights is higher with small samples) and the subject matter of the survey (for instance, a survey with a focus on tax planning might benefit from non-response adjustment by amount of CT paid, whereas for a survey about relationships with HM Revenue & Customs such an adjustment might be excessive).

The question of exactly how to calculate the non-response weights is potentially rather tricky to resolve. There are a number of standard methods available, such as cell weighting, rim weighting, CHAID,[5] and binary regression modelling (logistic or probit), all of which have advantages and disadvantages (a minimal sketch of rim weighting is given below). To ensure consistency across HM Revenue & Customs surveys of large businesses, adopting one method as the default is recommended, but this needs to follow a formal appraisal of the alternatives. Note that WERS05 used cell weighting by size and SIC.

3.2.1 The control totals to use

If HM Revenue & Customs were to adopt a first stage of non-response weighting by size and SIC as the default, then this raises the question of which population distribution by size and SIC to weight to.[6] The two main possibilities are the distribution derived from the HM Revenue & Customs database of large businesses and the IDBR distribution. Other possibilities include distributions derived from other sampling frames, such as the Dun & Bradstreet database, although we can see no obvious advantage to this. It is usual to weight to the population based on the sampling frame used for the study, but this isn't strictly necessary. And the key advantage of always adjusting by the same size and SIC distribution, irrespective of sampling frame, is that it ensures consistency across HM Revenue & Customs surveys on at least these two key variables.

[5] Chi-Squared Automatic Interaction Detection.

[6] Note, additional non-response weighting on other variables should not be ruled out, but the need for this would be judged on a survey-by-survey basis.
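As an illustration of one of the methods listed above, here is a minimal sketch of rim weighting (raking, or iterative proportional fitting) to size and SIC control totals. The margins and cases are hypothetical, and a production implementation would add convergence checks and weight trimming:

    # Minimal sketch: rim weighting (raking) to size and SIC margins.
    # All control totals and cases are hypothetical.
    from collections import defaultdict

    def rake(cases, margins, iterations=50):
        # cases: dicts with 'size', 'sic' and a starting 'weight' (selection weight).
        # margins: for each variable, the population totals to match.
        for _ in range(iterations):
            for key, targets in margins.items():
                totals = defaultdict(float)
                for c in cases:
                    totals[c[key]] += c["weight"]
                # Scale each case so the weighted margin matches its target.
                for c in cases:
                    c["weight"] *= targets[c[key]] / totals[c[key]]
        return cases

    cases = [{"size": "500+", "sic": "production", "weight": 10.0},
             {"size": "500+", "sic": "services", "weight": 10.0},
             {"size": "50-499", "sic": "production", "weight": 20.0},
             {"size": "50-499", "sic": "services", "weight": 20.0}]
    margins = {"size": {"500+": 30.0, "50-499": 60.0},
               "sic": {"production": 40.0, "services": 50.0}}
    rake(cases, margins)
    print([round(c["weight"], 2) for c in cases])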

If the decision is between control totals derived from the IDBR and control totals derived from the HM Revenue & Customs database, then a natural first step in deciding between them would be a comparison of the distributions from the two. If they are very similar then the convenience of using the HM Revenue & Customs distribution would make that the natural choice, especially if, as the sampling paper in this series recommends, the HM Revenue & Customs database becomes the default sampling frame.

3.3 Weighting to adjust for non-coverage problems

In the introduction to this section we implied that weighting for non-response and weighting for deficiencies in the sampling frame were separate exercises. In practice, if weighting to fixed control totals is adopted, then this handles both non-response and frame adjustments simultaneously. That is, as well as dealing with differential response by size and SIC, the non-response weighting to fixed population totals should also adjust (as far as is possible) for non-coverage of the sampling frame in particular size and SIC groups. Ideally the two processes can, and should, be accounted for separately, but the impact on survey estimates will be the same either way.

4 Imputation

Whereas weighting is traditionally used to adjust for unit non-response (that is, refusals to take part in the survey itself), imputation is generally used as a means of dealing with item non-response (that is, non-response to particular questions within the survey). Essentially, imputation replaces missing items with valid survey responses.

Imputation of missing data is fairly widespread in mandatory statistical returns from businesses but seems to be far less widespread on ad-hoc business surveys. The reasons for this are unclear but are likely to include the following:

- mandatory statistical returns tend to be repeated over time, so for any missing item, or indeed any missing return, there will often be an historical time-series from the same business. This can make reasonably accurate prediction of missing values relatively straightforward;

- mandatory statistical returns are usually paper-based, so it is relatively easy for businesses to opt not to complete particular items or to give values that subsequently fail edit checks;

- ad-hoc surveys are generally interviewer-administered (either face-to-face or by telephone), so there is more implicit pressure on respondents to answer all questions, and for the answers given to pass validation checks. So the level of item non-response should be lower on interviewer-administered surveys;

- item non-responses are sometimes refusals but are often 'don't knows'. In the latter instance respondents can often be encouraged by an interviewer to give their best estimate. These estimates are likely to be more accurate than any imputation routine could generate, so are to be preferred to imputation;

- the style of analysis from ad-hoc surveys is generally very different from that of statistical returns. Statistical returns tend to be used to generate point estimates one variable at a time, whereas survey analysis tends to look at variables in combination. Imputation that improves point estimates can bias the analysis of relationships, so imputation is treated with more suspicion for ad-hoc surveys.

The one area where imputation is fairly likely to be used even in ad-hoc surveys is for numerical values that contribute to a total. As an illustration, suppose in a household survey that household income was to be collected in four components: income from employment, income from savings, income from benefits and income from other sources, with the intention of using these four values to generate total household income. If a householder indicated that they did have income from savings but didn't know the amount, then this would be recorded as missing in the data. Without imputation of a value, savings would contribute zero to this household's total income, which clearly gives too small a figure. So, in this instance, there would be a strong case for imputing missing components of income so as to avoid generating biased totals. A similar argument would apply to business surveys where components of revenue are being collected as a means of generating a total.

4.1 Methods of imputation

There is a wide range of imputation methods used for replacing missing values in datasets. The ONS taskforce paper on imputation (NSMS No. 3) gives a short and straightforward review of the main ones and their implications for analysis (especially for distributions and relationships).[7] The paper also usefully provides a decision tree that helps to establish which method of imputation will be most appropriate given the type and level of missing data and the extent to which the dataset includes good predictors of missing values.

4.2 Recommendations

The case for imputation needs to be assessed on a survey-by-survey basis and, within surveys, probably on a variable-by-variable basis. However, there are a few general rules that might be applied:

- For surveys used primarily to generate discrete (point) estimates, it will usually be more important to impute for missing numerical values that contribute to derived variables (in particular totals) than to impute missing categorical variables or missing stand-alone numerical variables.

- If survey data is to be analysed using multivariate statistical methods (that is, methods that involve the simultaneous analysis of more than two variables), then imputation may be needed more generally.

- An assessment should be made of the likely biasing effect of missing values. For a business survey that aims to collect financial data this may mean that imputation is only done to replace missing values for the very largest businesses, since the values from these businesses will have most leverage on the overall survey estimates. Alternatively, cruder methods of imputation (such as replacing missing values with the mean) might be used for smaller businesses and more sophisticated methods for larger businesses.

- If the rate of missing values is low, it may be reasonable to use very simple imputation methods, on the grounds that crude imputation on a small scale will not bias the relationships in the data by more than a minimal amount. If the rate of missing values is high, far more care is needed with the imputation.

[7] The methods covered include: deductive methods; look-up tables; imputing the overall mean or mode; imputing a randomly selected case; imputing a class mean; imputing at random from an imputation class; hot-deck imputation; predictive regression imputation; random regression imputation; ratio methods; and time series models.

5 Analysis of data from large business surveys

The analysis of data from large business surveys will generally be technically more complex than that of general population surveys. The reasons include:

- for surveys that collect financial data from businesses, the values collected will often be very skewed. This raises challenges for the estimation of means and totals, especially if there is evidence that a small number of businesses are very influential;

- the data will be weighted, with the weights potentially being very variable if the probability of selection increases with the size of the business. As well as examining the impact of extreme data values, the impact of extreme weights also needs to be assessed.[8] Furthermore, interactions between the two can be particularly important: that is, very large weights associated with very large values;

- calculating standard errors for surveys that are weighted and/or include imputed values is not straightforward and needs specialist statistical software or knowledge;

- potentially small sample sizes within key sub-groups (such as very large businesses operating within a particular sector) can make drawing inferences about these groups very difficult.

There are no easy solutions to any of these problems. It must simply be accepted that the analysis of business survey data will take more time, and will involve more ongoing checking (including sensitivity checking[9]), than more straightforward surveys. The implication is that for complex business surveys, especially any that collect numerical data where the potential for instability because of outliers is high, more time will need to be assigned to the reporting stage than may be the case for more standard surveys. This is true for all business surveys but is likely to be particularly so for large business surveys, where the prevalence and impact of large outlying survey values is likely to be rather greater than average, and where higher-than-average variance on almost all measures relating to size will make survey estimates particularly imprecise. In addition, it may prove necessary to ensure that the analysis team includes a survey statistician with experience of the analysis of complex business surveys. Each of the four issues listed above is discussed briefly in turn below, the aim being to draw out some of the most salient points.

[8] One way to do this is to check the impact on statistics of trimming very large or very small weights.

[9] Sensitivity checks are checks of alternative methods of analysis to see how much impact the method has on the survey statistics. If each method gives very similar statistics then the choice between methods is fairly arbitrary. If the statistics are very different then a decision between methods has to be made based on defensible criteria.

(a) Skew in the underlying distribution and the problem of large outliers

This issue has already been partially addressed in Section 2 on editing. In principle, the editing process should identify any extreme values. If they are accepted as correct, then flagging them in the dataset would help analysts to identify them and to make decisions on how to handle them in the analysis. In some instances there may be a case for trimming extreme values, one well-established method being Winsorization (see, for example, Lee, 1995). Statistical advice should, however, be sought on this before going ahead.

(b) The impact of the survey weights

Extremely large weights in the dataset need to be identified and possibly trimmed, again on the grounds that otherwise the cases with large weights are overly influential in the final estimates and can make those estimates unstable. Large weights can be particularly problematic if associated with outlying values in the data, so combinations of the two should be identified as early as possible in the analysis.

(c) Appropriate statistical software

Dealing with complex survey datasets requires the use of appropriate statistical software. Ideally, software is needed that can deal with stratified samples drawn (in some strata) from small finite populations with a high sampling fraction, and with both selection and non-response weights applied. Packages such as Stata and SAS can deal with most of the complex design features of business surveys, although it is sometimes necessary to assume an approximate fit between the actual design and the design features the software can handle. There is some disagreement amongst statisticians about whether weights need to be used in the modelling of survey data; the main arguments are presented in Purdon and Pickering (2001).

(d) Dealing with small sample sizes

One of the key problems that business surveys often present is how to deal with small sample sizes. The business population is extremely diverse (arguably far more so than the household population) and there will be a wish to divide business surveys into small sub-groups in order to gain a good understanding of the range and diversity of behaviour. The standard sub-group split is inevitably employment size within SIC group, but other divisions may also be needed. Inevitably there are no magic bullets here: if a sample size is too small then no amount of statistical wand-waving will generate precise survey estimates. However, there are a few possibilities to bear in mind:

Firstly, and most obviously, if the sub-group analysis is planned for at the survey design stage then, budget permitting, the sample can be allocated across sub-groups in a way that will maximise the opportunity for sub-group analysis.

Secondly, it is important not to overlook finite population corrections. If within a sub-group the sample is small but has been drawn from a small population (so that the sampling fraction is large), then survey estimates may still be fairly precise. This will be clear if standard errors are adjusted by the finite population factor. A note of caution is needed here, though: if the survey is aiming to make statements about businesses in general, and not just at one point in time, then treating any sub-group population as finite may be inappropriate.[10]

Thirdly, and far more controversially, it may prove possible to statistically model estimates for sub-groups with small sample sizes. This is akin to the synthetic estimation that is sometimes used to generate local area estimates within the UK. Essentially, a predicted estimate for a sub-group would be generated using a statistical model, and this predicted score would then be combined with the survey sub-group estimate to generate the synthetic estimate for the group. As an example, suppose some aspect of tax behaviour was found to be predicted reasonably well by size, SIC and amount of tax paid.[11] Then, based on the prediction model, the behaviour within the sub-group of interest would be predicted. Combining this with the direct, but small sample, estimate of behaviour from the survey would give the final synthetic estimate. Synthetic estimates are controversial because they cannot be shown to be accurate; they may, however, be useful planning tools. (See, for example, Rao and Choudhry, 1995, or Pickering et al., 2005.)

6 Summary

The three main stages of data preparation needed for quantitative surveys are editing, weighting, and imputation. For business surveys none of these stages is straightforward and, arguably, data preparation for business surveys is far more complex than for other types of survey. The main issues are:

- the detection and handling of errors;

- the detection and handling of outliers;

- weighting for unequal selection probabilities in the context of potential sampling frame coverage problems and stratum jumping;

- weighting to reduce non-response bias;

- dealing with missing data within questionnaires.

Even once a clean, and representative, dataset is derived, the analysis of data from business surveys is still complex, one of the main challenges being how to deal with potentially very skewed data. The problems are often compounded by the fact that the practice of over-sampling large businesses relative to small ones generates datasets with very variable weights. Survey statistics can prove very sensitive to these characteristics. There are no simple solutions that can be applied. The main implication is that more time has to be allowed for data preparation and analysis of business surveys than is typically allowed for simpler surveys.

[10] As an illustration, if a survey was attempting to discover how businesses in 2008 had reacted to a policy change in 2007, then a 2008 survey could be treated as drawn from a finite population: namely, the population of businesses in 2008. Another way to think of it is that the view of the business on this matter is fixed, so finite population corrections would apply. But if a survey was attempting to understand the relationship between, say, profit and tax behaviour, then a survey in 2008 would just give a snapshot in time of a dynamic relationship. In this case, the response from each business should be treated as a one-unit sample of all the responses that the business would give if repeatedly sampled, and finite population corrections would not apply.

[11] Based on a model where size and SIC are entered as main effects only.

7 Recommendations

- There is a good case for adopting a standardised weighting procedure across surveys of large businesses. Doing so is our key recommendation.

- Editing of data is an important stage of business surveys, especially so for large business surveys. Sufficient time should be allowed for this, and being able to provide coherent plans for editing should be a criterion for judging between survey contractors.

- Development of a default approach to the identification and handling of outliers would be useful, although we recognise this may prove difficult in practice given that most HM Revenue & Customs surveys are ad-hoc.

- Imputation as a means of dealing with missing values may never prove to be widespread practice for HM Revenue & Customs surveys. But the case for and against imputation could be formally assessed so that decisions on when (and when not) to adopt it are simple and consistent. Similarly, a standard line on which imputation methods to adopt would be a useful development.

- The complexities of the analysis of business surveys should be acknowledged within HM Revenue & Customs. To generate accurate (i.e. low sampling error and low bias) statistics from business surveys takes time and effort, especially for complex financial statistics that are prone to extreme outlier problems.

- The problems of trying to draw inferences from small samples are to a degree insurmountable. However, there are strategies that might be explored. They are complex and controversial, though, so promises to provide robust small-sample estimates should not be accepted blindly.

8 References

Hidiroglou, M., Särndal, C-E. and Binder, D. (1995) 'Weighting and Estimation in Business Surveys', in Business Survey Methods, ed. Cox et al.

Lee, H. (1995) 'Outliers in Business Surveys', in Business Survey Methods, ed. Cox et al.

NSMS 28: Evaluation and criteria for statistical editing and imputation, GSS Methodology Series.

NSMS 3: Report of the task force on imputation, GSS Methodology Series.

Pickering, K., Scholes, S. and Bajekal, M. (2005) Synthetic estimation of healthy lifestyle indicators: Stage 3 report.

Purdon, S. and Pickering, K. (2001) The Use of Sampling Weights in the Analysis of the 1998 Workplace Employee Relations Survey. London: National Institute of Economic and Social Research.

Rao, J.N.K. and Choudhry, G.H. (1995) 'Small Area Estimation: Overview and Empirical Study', in Business Survey Methods, ed. Cox et al.