Session 11: Three ways of using Administrative Data


1 Course on Register-based Statistics, INEGI, Aguascalientes, April 2011
Anders & Britt Wallgren, Statistics Sweden and Örebro University
Session 11: Three ways of using Administrative Data
Our aim is to discuss methods that can be used when a new administrative source is evaluated:
- Should it be included in the production system of the NSI?
- How should it be used within the production system?
The statistical usability is analysed with a method that uses quality indicators.

2 Quality indicators should have a clear purpose:
We want to use one set of indicators to describe the production process. What parts of the process are good/bad? What parts should be improved? That was discussed in section 10 above.
We want to use another set of indicators to analyse the statistical usability of a new source:
- Should it be included in the production system of the NSI?
- How should it be used within the production system?

3 Three ways of using an Admin. Source in the Production system:
[Diagram: Administrative source, Production System, Statistical Products]
Indicators say:
1. OK for Statistical Product as it is (primary register)
2. OK for Statistical Product after preparations: OK if combined with other sources (integration register)
3. OK for improving the Production System

4 Method to measure indicators:
A. Use Information from Administrative Authorities
B. Analysis and Data Editing-like analysis of the Source
C. Integrate the Source with the Base Register
D. Integrate with Register and Sample Surveys with Similar Variables

5 A. Use Information from Admin. Authorities
Chart 3. Indicators of output and input data quality: relevance
Indicator | Quality factor | Description
A1 | Relevance: population | Definition of the administrative object set. Which administrative rules determine which objects are included?
A2 | Relevance: units | Definition of the administrative units. Are they suitable as statistical units?
A3 | Relevant keys | Are there primary keys and foreign keys in the source that are suitable for micro integration?
A4 | Relevance: variables | Definitions of the administrative variables. Are these variables suitable as statistical variables?
A5 | Relevance: reference time | Are reference times suitable for statistical usage? What rules are used for accruing accounting data between months and years?
A6 | Study domains | Are there variables describing domains in the source, or can the units be linked with domain variables in the Business Register?
A7 | Comprehensiveness | Small/large part of an intended population? Few/many interesting variables? Small/large number of existing surveys that benefit from the source?
A8 | Updates | How often and at what time points is the administrative register updated?
A9 | Delivery time | Time for delivery of the register from the register holder to the NSI
A10 | Punctuality | Difference in time between the actual delivery and the agreed delivery time point
A11 | Comparability over time | Extent of changes in the content of the register over time

6 B. Analysis and Data Editing-like analysis of the Source
Chart 4. Indicators of output and input data quality: accuracy
Indicator | Quality factor | Description
B1 | Primary key | Fraction of units with usable identities. The primary key should have correct format and reasonable values.
B2 | Foreign keys | Fraction of units with usable foreign keys. Foreign keys should have correct format and reasonable values.
B3 | Duplicates in the source | Fraction of identities that occur more than once. Fraction of records with different identities but the records are otherwise identical.
B4 | Missing values | Fraction of missing values for the statistically interesting variables.
B5 | Wrong values | Fraction of wrong or unreasonable values for the statistically interesting variables.
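The fractions in Chart 4 can be computed directly from the records of a source. The following is a minimal sketch, not the authors' implementation: the record layout, the ten-digit identity format and the rule that negative wages count as unreasonable are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical source records: "id" is the primary key, "wages" a statistical variable.
records = [
    {"id": "1234567890", "wages": 52000},
    {"id": "1234567890", "wages": 52000},  # same identity occurs twice
    {"id": "12345", "wages": 31000},       # identity with wrong format
    {"id": "2345678901", "wages": None},   # missing value
    {"id": "3456789012", "wages": -400},   # unreasonable value
]

ID_FORMAT = re.compile(r"\d{10}")  # assumed format: exactly ten digits


def fraction(count, total):
    """Return count / total, or 0.0 when the denominator is zero."""
    return count / total if total else 0.0


def accuracy_indicators(records, variable):
    """Compute sketches of indicators B1, B3 (first part), B4 and B5."""
    n = len(records)
    usable = [r for r in records if r["id"] and ID_FORMAT.fullmatch(r["id"])]
    id_counts = Counter(r["id"] for r in usable)
    duplicated = sum(1 for c in id_counts.values() if c > 1)
    missing = sum(1 for r in records if r[variable] is None)
    # Assumption: a negative value is "wrong or unreasonable" for this variable.
    wrong = sum(1 for r in records if r[variable] is not None and r[variable] < 0)
    return {
        "B1_usable_primary_key": fraction(len(usable), n),
        "B3_duplicate_identities": fraction(duplicated, len(id_counts)),
        "B4_missing_values": fraction(missing, n),
        "B5_wrong_values": fraction(wrong, n),
    }


indicators = accuracy_indicators(records, "wages")
```

In practice the format check and the reasonableness rule would be specific to each source and variable; B2 follows the same pattern as B1, applied to the foreign keys.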

7 C. Integrate the Source with the Base Register
Chart 5. Indicators of output and input data quality: accuracy
Indicator | Quality factor | Description
C1 | Undercoverage in BR | Fraction of units: There are enterprises/units that have been active during the reference period but are missing in the BR or are coded as inactive in the BR.
C2 | Undercoverage in the source | Fraction of units: There are enterprises/units that have been active during the reference period according to the BR but are missing in the source.
C3 | Overcoverage in BR | Fraction of units: Enterprises/units are coded as active in the BR and belong to a category that is covered by the source, but they have no reported activity in the source.
C4 | Overcoverage in the source | Fraction of units: There are units in the source that belong to a category, or seem to belong to a category, that is not statistically relevant.
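Once the source has been matched with the Business Register on a common identity, the undercoverage indicators reduce to set differences. The sketch below illustrates C1 and C2 only; the data are hypothetical, and the choice of denominators (units in the source for C1, active BR units for C2) is an assumption, since the chart only says "fraction of units".

```python
# Hypothetical Business Register: identity -> coded as active?
business_register = {"A": True, "B": True, "C": False, "D": True}
# Hypothetical identities with reported activity in the source
# during the reference period.
source_active = {"A", "C", "E"}


def coverage_indicators(br, source_ids):
    """Compute sketches of indicators C1 and C2 from matched identities."""
    br_active = {uid for uid, active in br.items() if active}
    # C1: active according to the source, but missing or coded inactive in the BR.
    under_br = source_ids - br_active
    # C2: active according to the BR, but missing in the source.
    under_source = br_active - source_ids
    return {
        "C1_undercoverage_in_BR": len(under_br) / len(source_ids) if source_ids else 0.0,
        "C2_undercoverage_in_source": len(under_source) / len(br_active) if br_active else 0.0,
    }


coverage = coverage_indicators(business_register, source_active)
```

C3 and C4 would additionally need a category variable, so that only units in categories covered by the source (C3) or not statistically relevant (C4) are counted.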

8 D. Integrate with Register and Sample Surveys with Similar Variables
Indicator | Quality factor | Description
D1 | Relevance of variables | Variables in the sources can now be compared with similar variables in other surveys/registers.
D3 | Relevance of variables | An administrative source may contain information that can improve some parts of the Business Register (BR). Information on Industrial Activity or Sector, even for only a small number of enterprises, can reduce missing values in the BR. Information on how administrative units are related can improve the quality of Enterprise Units in the BR.
D4 | Undercoverage in the BR and SBS | Fraction of population total by industry: Enterprises/units that have been active during the reference period according to the administrative sources but are missing in the BR and the SBS.
D5 D10 | Overcoverage in the BR and SBS | Fraction of population total by industry: Enterprises/units that are coded as active in the BR but have not reported any activity in any administrative source. These enterprises may have been treated as nonresponse in the SBS, with non-zero variable values imputed.

9 1. OK for Statistical Product as it is
Example 1: Gross annual wages based on yearly income verifications (Case 3 discussed earlier)

10 2. OK for statistical product after preparations
Not OK alone, but OK if combined with other sources
Example 2: Integrated register for yearly National Accounts
Overcoverage and undercoverage in five administrative sources:
 | Source 1 | Source 2 | Source 3 | Source 4 | Source 5
Overcoverage | 41% | 0% | 0% | 0% | 0%
Undercoverage | 21% | 74% | 74% | 30% | 9%

11 3. OK for improving the Production System
Improving a Base Register
Example 3: Monthly gross wages
Overcoverage in the Business Register (BR) can be decreased:
If monthly gross wages = 0 during 6 consecutive months => Code Inactive in the BR
Undercoverage in the Business Register (BR) can be decreased:
If monthly gross wages > 0 => Code Active in the BR
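The two rules in Example 3 can be written as a small update function. This is a sketch under stated assumptions: the function name, the "active"/"inactive" code values and the reading that "monthly gross wages > 0" refers to the most recent month are illustrative choices, not taken from the slides.

```python
def update_activity_code(current_code, monthly_wages, inactive_after=6):
    """Return the new BR activity code.

    monthly_wages lists monthly gross wages, oldest month first.
    """
    # Rule against undercoverage: wages reported in the latest month => active.
    if monthly_wages and monthly_wages[-1] > 0:
        return "active"
    # Rule against overcoverage: zero wages in the last six months => inactive.
    recent = monthly_wages[-inactive_after:]
    if len(recent) == inactive_after and all(w == 0 for w in recent):
        return "inactive"
    # Otherwise there is not enough evidence to change the code.
    return current_code


print(update_activity_code("active", [0, 0, 0, 0, 0, 0]))  # inactive
print(update_activity_code("inactive", [0, 0, 1200]))      # active
print(update_activity_code("active", [500, 0, 0, 0]))      # active (unchanged)
```

Keeping the current code when neither rule fires avoids recoding an enterprise on the basis of only a few months of data.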

12 Three ways of using an Admin. Source in the Production system:
[Diagram: Administrative source, Production System, Statistical Products]
Indicators say:
1. OK for Statistical Product as it is (primary register): easiest, new countries
2. OK for Statistical Product after preparations, if combined with other sources (integration register): more difficult, register-based census
3. OK for improving the Production System: new for Sweden