The impact of using web in the Danish LFS


Background: breaks in time series

The Danish core part of the Labour Force Survey (LFS) consists of approximately [...] interviews each quarter. From January 2016 web interviewing was introduced into the data collection: we went from collecting data only by telephone (CATI) to also collecting it by web. The data collection has been outsourced since 2007 and has to be put out to tender every five years. As a result, a new company took over the data collection in 2016, at the same time as web was introduced. However, this company could not maintain a sufficiently high response rate, and we therefore terminated the contract from January [...]. This means that in 2016 the Danish LFS underwent three changes: a new data supplier, mixed mode and a low response rate, resulting in a general break in the time series.

From 2017 the company that collected data up until 2016 resumed the data collection. This company has managed to reach and maintain our target response rate of 60 percent. In order to do that, this supplier collects a larger share of CATI interviews than the former supplier and a slightly lower share of web interviews. Moreover, in 2017 we started sending out invitation letters digitally instead of by ordinary post. Hence, in 2017 the Danish LFS once again underwent substantial changes: a new supplier, digital invitation letters, a change in the split between web and CATI, and a higher response rate, causing another break in the time series.

Analyzing these breaks is complex because several changes coincide. In this paper I will look exclusively at the effect of web on the measured employment of employees, with the purpose of analyzing whether or not there is a bias in the survey mode. As our new data supplier will collect data in the coming years, and as there are substantial differences between the data collection in 2016 and 2017, my analysis of the impact of using web will focus exclusively on 2017 data.
Table 1 shows the distribution of web and CATI responses for the whole of 2017, showing that the largest part of the questionnaires is answered by phone. These are the unweighted figures I will use in my analysis.

Table 1. Survey mode, age 15-74

          Frequency   Percent
Web       [...]       [..],3
CATI      [...]       [..],7
Total     [...]       [...],0

Web and CATI responses in the Danish LFS

Statistics Denmark sends out invitation letters to the LFS to the respondents' digital mailbox. Almost all citizens are obliged to have such a mailbox in order to receive information from public authorities electronically instead of by ordinary post [1]. This address does not change, which enables us to send invitation letters and reminder letters with direct links to the survey to this mailbox.

At the moment our data supplier reserves the first two days of the two-week interview period for web responses only; 61 percent of the web responses are completed in these two days. On the third day [2] the interviewers start calling respondents who have not yet answered by web. Table 2 shows the distribution of web and CATI interviews along the interview period, and illustrates that CATI responses are completed throughout the interview period, while very few web responses are made in the second week.

[1] Around 7 percent are granted exemption, for example because of disability or high age.
[2] With a few exceptions: a small number of CATI interviews were completed in the first two days.

Table 2. Distribution of web and CATI interviews along the 14-day interview period (percent of persons)

Day      Web       CATI
1        [..],39   0,05
2        15,77     0,05
3        9,13      16,70
4        6,63      13,31
5        3,59      6,86
6        2,30      3,31
7        3,41      7,79
8        3,62      12,80
9        2,99      9,83
10       2,15      5,93
11       1,61      5,05
12       1,06      4,26
13       0,85      5,58
14       0,51      8,49
Total    100,00    100,00

The electronic invitation letter with a direct link and the many web responses in the initial days of the interview period indicate that the web mode is far more voluntary than the CATI mode. This aspect of voluntariness, along with the fact that the two modes might appeal to different people, makes it likely that the groups of web and CATI respondents are quite different from each other. In the first part of the analysis I will look at how the groups differ on some background characteristics, mainly taken from registers. Using register variables ensures a one-way causality between the specific characteristic and the survey mode.

Web respondents are different from CATI respondents

Table 3 shows the division of men and women in each survey mode, revealing that in web around 10 percentage points more of the answers are given by women than by men, whereas in CATI around 4 percentage points more of the answers are given by men than by women.

Table 3. Sex by survey mode (percent)

        Men     Women   Total
Web     45,2    54,9    100,0
CATI    51,8    48,2    100,0

In table 4 the web and CATI respondents are each divided into four age groups, showing that CATI respondents are younger than web respondents. A little more than a third of the CATI respondents are less than thirty years old, whereas this goes for a fifth of the web respondents. Conversely, more than every fourth web respondent has passed 60 years of age, while this goes for around every sixth CATI respondent. The average age of web respondents is 46, and of CATI respondents it is 40.

Table 4. Age by survey mode (percent; four age groups ordered from youngest to oldest)

        Group 1   Group 2   Group 3   Group 4   Total
Web     20,7      21,8      31,4      26,1      100,0
CATI    34,9      22,0      25,7      17,4      100,0
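The second-week pattern described for table 2 can be checked directly from the daily percentages. The sketch below is only an illustration: the variable and function names are my own, and only days 8-14 are summed because the day 1 web figure is not fully legible in table 2.

```python
# Day 8-14 percentages copied from table 2 (decimal commas written as points).
web_week2 = [3.62, 2.99, 2.15, 1.61, 1.06, 0.85, 0.51]
cati_week2 = [12.80, 9.83, 5.93, 5.05, 4.26, 5.58, 8.49]


def second_week_share(daily_percentages):
    """Sum the daily shares (in percent) for days 8-14 of the interview period."""
    return round(sum(daily_percentages), 2)


web_share = second_week_share(web_week2)
cati_share = second_week_share(cati_week2)
print(f"Share of responses completed in week 2: web {web_share}, CATI {cati_share}")
```

Under these figures roughly 13 percent of the web responses, against roughly 52 percent of the CATI responses, fall in the second week.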

In the Danish LFS we primarily get the highest level of education from the Education register. Table 5 shows significant differences in level of education between web and CATI respondents. A third of the CATI respondents have primary or lower secondary education, while this is the case for a fifth of the web respondents. Conversely, 10 percentage points more of the web respondents than of the CATI respondents have tertiary education.

Table 5. Education by survey mode (percent)

        Primary and lower secondary   Upper secondary   Tertiary   Total
Web     19,5                          42,1              38,4       100,0
CATI    33,7                          38,2              28,1       100,0

The web respondents' higher level of education could very well be correlated with their higher age, in the sense that few young people have yet completed a tertiary education. Both education and age are most likely in turn correlated with the fact that web respondents are, to a larger extent than CATI respondents, registered in Statistics Denmark's register-based statistics on employees (RSE) (table 6). This register includes all employees in Danish companies reported by employers to the Central Danish Tax Administration.

Table 6. Registered in RSE by survey mode (percent)

        Registered   Not registered   Total
Web     [..],7       32,3             100,0
CATI    62,0         38,0             100,0

To sum up, the above characteristics are most likely inter-correlated. However, I will not dig further into that. The point of this part of the analysis is to show that two different groups answer in the two modes. Web respondents have a significantly higher average age and level of education than CATI respondents, and they also include a bigger share of women [3].

Bias in survey mode

I will use the knowledge from the previous section about the correlations between background characteristics and survey mode to investigate whether the survey mode has an impact on the measured employment status of employees. I will do that by looking at whether the overlap between employment in the reference week in RSE and in the LFS is significantly different in web compared to CATI.
As RSE includes all employees in Denmark, we consider it to be an objective and correct benchmark of employee employment. As a difference in overlap could very well be due to the fact that web respondents are different from CATI respondents, I will try to eliminate such a potential spurious correlation between survey mode and employment status by making the respondents in the two survey modes more alike. I would expect age to play a big role in whether the respondent knows if he/she actually worked in a specific week, mostly because young people often have a loose attachment to the labour market, for example doing small jobs where they can have trouble judging whether these count as work according to the ILO definition, and they might have irregular working hours or periods without work. Moreover, level of education could influence how well the question about work and the concept of work are understood.

[3] Web employees moreover have a higher income than CATI employees, more web respondents are employed in public administration, education and health, and a larger share of them work as professionals and technicians (tables 7, 8 and 9, appendix 1).

Methodology

Respondents who are registered as employees in RSE in the reference week would be expected to also answer in the LFS that they worked as employees. RSE employees are registered by having been paid a salary by an employer, and it seems unlikely that the respondent is unaware of this [4]. If I limited my analytical population to RSE employees, a mismatch with LFS employment would probably mostly be due to the respondent mistaking the days or week he/she actually worked. Instead, I will limit the analytical population to employees according to the LFS, and then check the overlap with RSE. This widens the range of discrepancies in employment between RSE and the two LFS modes that can be observed. Work-related activities such as a bit of housekeeping, or getting a little extra pocket money for shopping for one's parents, can be reasons for mismatch when the analytical population is limited to LFS employees; such reasons would not become visible if the population were limited to RSE employees. And such errors in the classification of employment may be more common in one survey mode than in the other [5].

Employee overlap with RSE in web and CATI

Table 10 shows that web employees are, to a larger extent than CATI employees, also registered as employees in RSE. To make sure that this correlation is not just spurious, I will make the two quite different groups of respondents much more similar by looking at subgroups.

Table 10. LFS-employees' status in RSE, web and CATI.
Age 15-74 (percent)

        RSE-employee   Non-RSE-employee   Total
Web     [..],6         2,4                100,0
CATI    94,6           5,4                100,0
Chi-square test p-value < 0,0001

Expecting age and education to play significant roles in the ability to understand and correctly answer the questions about work in the reference week, not least because stability in job situation most likely varies with age and education, I will look at different subgroups according to these characteristics. Table 11 shows the RSE employment status in web and CATI of young LFS employees with less than tertiary education. The overlap with RSE is smaller than for the entire group of LFS employees, but the difference in overlap between web and CATI is still around 3 percentage points in favour of web employees. This difference is also maintained when looking at young respondents as a whole (table 12, appendix 1).

[4] Though the respondent could have been fired and still receive a salary. Other similar situations might be reasons for mismatch, but I consider them rare.
[5] Limiting the analytical population to LFS employees also makes undeclared work a potential reason for mismatch between LFS and RSE, and such mismatch is to be expected, as undeclared work counts as work according to the ILO definition. Undeclared work could be unevenly distributed between web and CATI, but as it is not a widespread phenomenon in Denmark, I will not look further into this. Another potential reason for LFS respondents defining themselves as employees without appearing in RSE could be that they are officially self-employed. However, such cases must be very rare, as people are expected to know whether or not they own a company. The respondent's status in RSE is matched by looking at the RSE status on the Wednesday of the reference week. When a respondent starts a job, for example, it might only be registered later, when the salary is paid. This problem of periodicity is more pronounced for respondents with unstable job situations, and this might more frequently be the case for respondents answering in a specific survey mode. Like the uneven distribution of undeclared work, this is not something I can check in this analysis.

Table 11. LFS-employees' status in RSE, web and CATI, age 15-29, non-tertiary education (percent)

        RSE-employee   Non-RSE-employee   Total
Web     93,7           6,3                100,0
CATI    91,0           9,0                100,0
Chi-square test p-value < 0,0001

LFS employees in the age group 34-59, which is very stable on the labour market, also tend to have a bigger overlap with RSE when answering by web, also when controlling for level of education, though a small part of this age group is in general not in RSE (table 13). Though the chi-square test indicates a significant difference between web and CATI in overlap with RSE, the difference is smaller than for the young employees. The picture is the same in the oldest age group, with a difference of around 2 percentage points (table 14, appendix 1). These results indicate that web gives more precise employment figures for employees than CATI in all age and education groups, but especially in the group of young people.

Table 13. LFS-employees' status in RSE, web and CATI, age 34-59, tertiary education (percent)

        RSE-employee   Non-RSE-employee   Total
Web     99,0           1,1                100,0
CATI    97,7           2,3                100,0
Chi-square test p-value < 0,0001

The time aspect

The distance from the end of the reference week to completion of the questionnaire most likely plays a role in how well respondents remember their employment situation in the specific week. This assumption is confirmed in table 15, which shows the overlap with RSE for LFS employees who answered the questionnaire 1-6 days after the end of the reference week and 7-14 days after, even when controlling for age and education.

Table 15. LFS-employees' status in RSE, by distance to reference week; age 15-29, non-tertiary education (percent)

             RSE-employee   Non-RSE-employee   Total
1-6 days     93,4           6,6                100,0
7-14 days    91,2           8,8                100,0
Chi-square test p-value = 0,0362
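The chi-square tests reported under tables 10 to 15 compare the split between RSE employees and non-RSE employees across two groups. A minimal sketch of such a test on a 2x2 table of counts is given below. The paper reports only percentages, so the counts used here are hypothetical (1,000 respondents per row, split to match the percentages in table 15), and the function name is my own; the resulting p-value therefore does not reproduce the reported one.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table of counts.

    Returns (statistic, p_value). With one degree of freedom the p-value
    equals erfc(sqrt(statistic / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# HYPOTHETICAL counts: 1,000 LFS employees per row, split to match the
# percentages in table 15. The real sample sizes are not given in the paper.
hypothetical_counts = [
    [934, 66],  # answered 1-6 days after the reference week: RSE, non-RSE
    [912, 88],  # answered 7-14 days after the reference week: RSE, non-RSE
]
statistic, p_value = chi_square_2x2(hypothetical_counts)
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
```

Because the p-value depends on the number of respondents as well as on the percentages, the same percentage gap can be significant in a large subgroup and insignificant in a small one. If SciPy is available, `scipy.stats.chi2_contingency(table, correction=False)` should give the same statistic.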

As survey mode is also correlated with the interview's distance to the reference week (cf. table 2), this factor is essential to control for. Doing so does, however, remove at least 60 percent of the web responses, as such a big part of these are completed in the first two days, when no CATI interviews are conducted. Limiting the group of respondents to different intervals of distance to the reference week, for example 3-14 days (table 17, appendix 1) or 7-10 days (table 16), does not change the result that web seems to measure employment more precisely than CATI.

In web the respondents are presented with a calendar underneath the question about work in the reference week. In the calendar the reference week is highlighted, as are non-working days. Moreover, the web mode is, far more than the CATI mode, an invitation to check one's own calendar to see whether one actually worked or not. These aspects make web less vulnerable to increased distance between completion of the questionnaire and the reference week, and this is probably why the difference in overlap with RSE between web and CATI does not decrease when controlling for the time aspect.

Table 16. LFS-employees' status in RSE, web and CATI; age 15-29, non-tertiary education, questionnaire answered day 7-10 (percent)

        RSE-employee   Non-RSE-employee   Total
Web     [..],1         5,9                100,0
CATI    90,7           9,3                100,0
Chi-square test p-value = 0,0128

Employed students: a looser attachment to the labour market

In this final part of the analysis I will go into a little more depth with employed students, because they have a looser or more temporary attachment to the labour market. Though web in general seems to be a more precise way to measure employment status than CATI, the difference in overlap with RSE was biggest in the young age group, which has a looser attachment to the labour market than the other age groups.
It must be even more characteristic of students than of young people in general not to work many hours, and as many students probably identify themselves as students (and not as employees), they might to a larger extent doubt whether the specific work they did in the reference week qualifies as employment, or they might forget whether or not they worked for pay in the reference week, especially as this group in particular is expected to have periods of absence due to exams etc. Table 18 shows the employment status in RSE of LFS employees who are registered as receiving a student grant, divided into web and CATI. A significantly larger part of the web employees who study are also RSE employees. This group of students has a bigger difference in overlap with RSE than the group of young employees, and this is the group where CATI employees have the lowest correspondence with RSE.

Table 18. RSE employment status in web and CATI, LFS employees who receive a student grant (percent)

        RSE-employee   Non-RSE-employee   Total
Web     91,8           8,2                100,0
CATI    87,7           12,3               100,0
Chi-square test p-value < 0,0003

Apart from having problems remembering whether they actually worked for pay or not, students may mistakenly think that studying in the reference week equals working for pay. This error might happen more often in CATI, because if the respondent answers yes to having worked for pay in the reference week, the interviewer will most likely just proceed with the next question, whereas in web a warning aimed specifically at young people appears, informing them that studying is not to be considered employment. Such a misclassification seems to be a greater risk in CATI when looking at LFS respondents who receive a student grant and are not registered in RSE. Those respondents are expected to be without employment in the LFS, though a very few could be self-employed, family workers or doing undeclared work. I have excluded the very few self-employed and family workers and looked at the shares of employees and of respondents without employment in web and CATI. Table 19 shows a significantly smaller share of CATI students being without employment than web students. The difference is even bigger when excluding the few with tertiary education and those older than 29 (table 18, appendix 1).

Table 19. Employee status in web and CATI, non-RSE-employees with student grant (percent)

        Employee   Without employment   Total
Web     7,6        92,4                 100,0
CATI    12,7       87,3                 100,0
Chi-square test p-value < 0,001

Conclusion

Introducing mixed-mode data collection in the Danish LFS has created two groups of respondents that differ quite a bit. The web set-up makes participation in the survey far more voluntary. The largest part of the web respondents answer within the first two days; they are older, more educated and to a larger extent women compared to CATI respondents. Even when controlling for these differences, the results from the different subgroups that I have looked at all point in the same direction: web gives more accuracy in measuring employee status than CATI. In other words, the survey mode seems to have an impact on the correctness of respondents' answers to the questions about work in the reference week.
This conclusion is drawn because web employees were significantly better represented in RSE than CATI employees, and as RSE includes information about all employees in Denmark based on employers' payment of salaries, we consider this register to be an objective measure of employment. Even in the very stable age group with tertiary education, the overlap between LFS and RSE was slightly, though significantly, better in web than in CATI. The fact that the web respondent is presented with a calendar highlighting the reference week and official non-working days, and the fact that the web respondent is freer than the CATI respondent to check his own calendar, must make the difference for this stable group of employees, because for the bulk of the interviews it did not seem to be the distance to the reference week that caused the better correspondence with RSE.

The more days that pass from the end of the reference week to completion of the questionnaire, the harder it gets for the CATI respondent to remember his exact working situation, whereas the memory of the web respondent is helped by the calendar regardless of the time of responding. The looser the employee's attachment to the labour market, or the more irregular his working time, the more difficult it gets to judge the exact working situation in a specific week, and the more helpful a calendar might be in jogging the memory. This is probably the main explanation for the smaller overlap in employment between LFS and RSE for these groups of employees, and not least for the larger difference between web and CATI. For the group of young employees, and especially for students, the CATI respondents' overlap with RSE was the smallest and differed the most from the overlap of web respondents.