The use of scanner data: a choice between evolution and revolution of official price statistics


1 Ingolf Boettcher, Statistics Austria
7th International Workshop on Survey Methodology, Daejeon, Republic of Korea, 28 September 2016
The use of scanner data: a choice between evolution and revolution of official price statistics

2 Official Statistics production: Where we come from
The universe (entire statistical population)
Official Statistics
[Diagram labels: 30%, 70%]

3 Official Statistics production: Where we come from
The universe (entire statistical population)
The statistical data (sample)
The statistical model / methodology (to approximate the universe)
Official Statistics
[Diagram labels: 30%, 70%]

4 Official Statistics production: with large new data sources (e.g. scanner data)
The universe (entire statistical population)
The statistical data ("big data")
The statistical model / methodology (if necessary?!)
Official Statistics
[Diagram labels: 30%, 70%]

5 Official Statistics production: with large new data sources (e.g. scanner data)
No need for statistical models? No need for theory?
The universe (entire statistical population)
The statistical data ("big data")
Official Statistics
[Diagram labels: 30%, 70%]

6 Official Price Statistics production: Consumer Price Index
Basic formula:
$$\mathrm{CPI} = \frac{\sum_i p_{t1,i}\, q_i}{\sum_i p_{t0,i}\, q_i}$$
$p_{t1,i}$ = price of product i in period t1 (source: price collector)
$p_{t0,i}$ = price of product i in period t0 (source: price collector)
$q_i$ = quantity sold of product i (weight) (source: market research companies, household budget survey, others)
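The basic formula can also be read as a weighted average of price relatives, with each product's share of base-period expenditure as its weight. The following is a minimal sketch of that equivalence; the function names and the two-product basket are hypothetical and are not taken from the slides.

```python
from math import isclose


def fixed_weight_index(p0, p1, q):
    """Cost of the basket at t1 prices divided by its cost at t0 prices."""
    cost_t1 = sum(p1[i] * q[i] for i in q)
    cost_t0 = sum(p0[i] * q[i] for i in q)
    return cost_t1 / cost_t0


def share_weighted_index(p0, p1, q):
    """Same index, written as an expenditure-share-weighted mean of price relatives."""
    base_cost = sum(p0[i] * q[i] for i in q)
    return sum((p0[i] * q[i] / base_cost) * (p1[i] / p0[i]) for i in q)


# Hypothetical basket (illustrative values only, not from the presentation).
p0 = {"bread": 2.00, "milk": 1.00}
p1 = {"bread": 2.20, "milk": 1.00}
q = {"bread": 10, "milk": 20}

index = fixed_weight_index(p0, p1, q)
same_index = share_weighted_index(p0, p1, q)
print(round(index, 3), isclose(index, same_index))
```

In this form the quantities only matter through the expenditure shares, which is why weights from secondary sources such as a household budget survey can stand in for observed quantities.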

7 Official Price Statistics production: Current methodology
Data collection
Lemonade: sold = ??
Cola: price = 1,00, sold = ??

8 Official Price Statistics production: Current methodology
Laspeyres-type index (fixed basket index: use of base period quantities)
$$\mathrm{CPI}_{\mathrm{Laspeyres},j,t1} = \frac{\sum_i p_{t1,i}\, q_{t0,i}}{\sum_i p_{t0,i}\, q_{t0,i}}$$
Simplified example (price data from price collectors; weights from secondary data sources for base period t0):
$$\mathrm{CPI}_{L,\mathrm{softdrinks},t1} = \frac{p_{t1,\mathrm{cola}}\, q_{t0,\mathrm{cola}} + p_{t1,\mathrm{lemonade}}\, q_{t0,\mathrm{lemonade}}}{p_{t0,\mathrm{cola}}\, q_{t0,\mathrm{cola}} + p_{t0,\mathrm{lemonade}}\, q_{t0,\mathrm{lemonade}}} = 1{,}074$$
Numeric substitution (partly lost in transcription): (1,… · … + 0,8 · 30) / (1,… · … + 0,8 · 30) = 1,074
"Prices of soft drinks increase by 7,4% from t0 to t1."
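The numeric substitution on this slide did not survive transcription intact, so the sketch below works the Laspeyres example with assumed inputs. The cola base price of 1,00, the lemonade price of 0,8 and the lemonade base quantity of 30 appear in the slides; the cola price of 1,10 in t1 and the cola base quantity of 70 are assumptions, chosen because they reproduce the reported result of 1,074.

```python
# Prices per unit; the cola price in t1 (1.10) is an assumed value.
prices_t0 = {"cola": 1.00, "lemonade": 0.80}
prices_t1 = {"cola": 1.10, "lemonade": 0.80}

# Base-period quantities; the cola quantity (70) is an assumed value.
quantities_t0 = {"cola": 70, "lemonade": 30}


def laspeyres(p0, p1, q0):
    """Price the fixed base-period basket at t1 and at t0 prices."""
    numerator = sum(p1[i] * q0[i] for i in q0)
    denominator = sum(p0[i] * q0[i] for i in q0)
    return numerator / denominator


cpi_l = laspeyres(prices_t0, prices_t1, quantities_t0)
print(f"{cpi_l:.3f}")  # 1.074
```

Only the cola price changes, so the size of the index movement is driven entirely by how much cola weighs in the base-period basket.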

9 Official Price Statistics production: Current methodology
Principles of consumer price indices
Use of Laspeyres-type indices with base period (period 0) quantities ("fixed basket approach")
Prices refer to individual product offers
Comparability over time
Representativeness of sampled individual product offers
Comparing like with like (matched model method)
Frequent publication (monthly / quarterly)

10 Official Price Statistics production: Opportunities with scanner data
[Figure: many individual product offers (e.g. "Cola XY"), each labelled with a price and a quantity sold. The labels that survive show prices between 0,32 and 1,99 and quantities sold between 40 and 100.]

11 Scanner data: what it looks like
Columns: #, Shop ID, Article code, Article retailer classification, Product description, Quantity sold, Sales in EUR
Example rows (numeric fields not preserved in this transcription):
Soft drinks - cola; Cola, BrandX, 333 ML
Soft drinks - cola; Cola, light, BrandY, … L
Bakery products; Brezel, BrandZ, 500 g
Estimate for Austria (food): … data sets every month = … articles × 4 weeks × 1000 shops × 3 retailers
Before (with manual price collection): … data sets = 100 articles × 1 (monthly collection) × 20 cities × 5 supermarkets
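Note that the record layout carries no explicit price column: an average transaction price has to be derived as sales divided by quantity sold. The sketch below shows one way to model such a record. The field names and all example values are hypothetical, since the numeric cells of the slide's table were not preserved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScannerRecord:
    """One row of retailer scanner data (field names are illustrative)."""
    shop_id: str
    article_code: str
    retailer_class: str
    description: str
    quantity_sold: int
    sales_eur: float

    def unit_value(self) -> float:
        """Average transaction price: turnover divided by quantity sold."""
        if self.quantity_sold <= 0:
            raise ValueError("unit value is undefined without sales")
        return self.sales_eur / self.quantity_sold


# Hypothetical record; identifiers and amounts are made up for illustration.
record = ScannerRecord(
    shop_id="S001",
    article_code="A123",
    retailer_class="Soft drinks - cola",
    description="Cola, BrandX, 333ML",
    quantity_sold=40,
    sales_eur=26.00,
)
print(f"{record.unit_value():.2f}")  # 0.65
```

Because the value is computed from actual turnover, it reflects promotions and discounts that a shelf price observed once a month would miss.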

12 Official Price Statistics: Advantages of scanner data
Better temporal and spatial coverage
Information on all products (not only a sample)
Transaction prices (not only list prices): better coverage of promotions
Lowest level (product) quantity information: better weights, better sampling

13 Official Price Statistics: Disadvantages of scanner data
Dependency on the data provider (retailer)
Very large data sets make high initial investments necessary (hardware, software, staff training)
Secondary data source: more complex data manipulation is necessary

14 Official Price Statistics with scanner data: Where we could go to
Paasche-type index (cost of living index: use of current period quantities)
$$\mathrm{CPI}_{\mathrm{Paasche},j,t1} = \frac{\sum_i p_{t1,i}\, q_{t1,i}}{\sum_i p_{t0,i}\, q_{t1,i}}$$
Simplified example (weights from scanner data in current period t1, thereby including substitution effects caused by the price increase):
$$\mathrm{CPI}_{P,\mathrm{softdrinks},t1} = \frac{p_{t1,\mathrm{cola}}\, q_{t1,\mathrm{cola}} + p_{t1,\mathrm{lemonade}}\, q_{t1,\mathrm{lemonade}}}{p_{t0,\mathrm{cola}}\, q_{t1,\mathrm{cola}} + p_{t0,\mathrm{lemonade}}\, q_{t1,\mathrm{lemonade}}} = 1{,}035$$
Numeric substitution (partly lost in transcription): (1,… · … + 0,8 · 70) / (1,… · … + 0,8 · 70) = 1,035
"Prices of soft drinks increase by 3,5% from t0 to t1."
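As a hedged illustration of the substitution effect, the sketch below recomputes the example with current-period quantities. The lemonade price of 0,8 and the lemonade quantity of 70 appear in the slide; the cola price of 1,10 in t1 and the cola quantity of 30 in t1 are assumptions, chosen because they reproduce the reported 1,035.

```python
# Prices per unit; the cola price in t1 (1.10) is an assumed value.
prices_t0 = {"cola": 1.00, "lemonade": 0.80}
prices_t1 = {"cola": 1.10, "lemonade": 0.80}

# Current-period quantities; the cola quantity (30) is an assumed value.
quantities_t1 = {"cola": 30, "lemonade": 70}


def paasche(p0, p1, q1):
    """Price the current-period basket at t1 and at t0 prices."""
    numerator = sum(p1[i] * q1[i] for i in q1)
    denominator = sum(p0[i] * q1[i] for i in q1)
    return numerator / denominator


cpi_p = paasche(prices_t0, prices_t1, quantities_t1)
print(f"{cpi_p:.3f}")  # 1.035
```

Under these assumptions consumers have shifted from cola to lemonade after the cola price rise, so the product that became dearer carries less weight and the measured increase is smaller than with base-period quantities.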

15 Official Price Statistics with scanner data: Where we could go to
Fisher-type superlative price index
$$\mathrm{CPI}_{\mathrm{Fisher},j,t1} = \sqrt{P_{\mathrm{Laspeyres}} \cdot P_{\mathrm{Paasche}}}$$
Simplified example:
$$\mathrm{CPI}_{F,\mathrm{softdrinks},t1} = \sqrt{\frac{\sum_i p_{t1,i}\, q_{t0,i}}{\sum_i p_{t0,i}\, q_{t0,i}} \cdot \frac{\sum_i p_{t1,i}\, q_{t1,i}}{\sum_i p_{t0,i}\, q_{t1,i}}} = \sqrt{1{,}074 \cdot 1{,}035} = 1{,}054$$
"Prices of soft drinks increase by 5,4% from t0 to t1."
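The Fisher index is the geometric mean of the Laspeyres and Paasche indices, so the slide's result can be checked directly from the two rounded values it reports. A minimal sketch follows; the function name is illustrative.

```python
from math import sqrt


def fisher(laspeyres_index, paasche_index):
    """Geometric mean of the Laspeyres and Paasche indices."""
    return sqrt(laspeyres_index * paasche_index)


# Rounded index values as reported on the slides.
cpi_f = fisher(1.074, 1.035)
print(f"{cpi_f:.3f}")  # 1.054
```

As a geometric mean, the result always lies between the two input indices, which is why it is often treated as a compromise between the fixed-basket and current-basket views.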

16 Official Price Statistics with scanner data: choices between statistical evolution and revolution
Index compilation step: EVOLUTION (traditional method) vs REVOLUTION (new method, possible only with scanner data)
Index type: Laspeyres vs Paasche, Fisher, etc.
Lowest level aggregation of prices: Unweighted vs Weighted
Basket of products: Fixed vs Dynamic
Matching: Individual product offers vs Homogeneous product (cluster)
Publication: Monthly vs Daily, weekly, monthly
Sampling: Best sold product vs All products / cut-off sampling / proportional-to-size sampling / etc.
Price definition: List price vs Unit value
Spatial aggregation: Shop vs Shop, chain, region, city, nation, …
Temporal aggregation: Day vs Day, week, month
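Several of these choices (unit value as price definition, spatial and temporal aggregation, weighted versus unweighted averaging) come down to how records are grouped before sales are divided by quantities. The sketch below illustrates this under stated assumptions: the records, the cluster label and the helper `unit_values_by` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical weekly records: (shop, week, product cluster, quantity sold, sales in EUR)
records = [
    ("shop_a", 1, "cola_0.33l", 50, 30.00),
    ("shop_a", 2, "cola_0.33l", 100, 40.00),
    ("shop_b", 1, "cola_0.33l", 50, 50.00),
]


def unit_values_by(rows, key):
    """Sum quantities and sales per group, then divide sales by quantity."""
    totals = defaultdict(lambda: [0, 0.0])
    for shop, week, cluster, quantity, sales in rows:
        group = key(shop, week, cluster)
        totals[group][0] += quantity
        totals[group][1] += sales
    return {group: sales / quantity for group, (quantity, sales) in totals.items()}


# Aggregate over all shops and weeks: one unit value per product cluster.
per_cluster = unit_values_by(records, key=lambda shop, week, cluster: cluster)

# Aggregate over weeks only: one unit value per shop and cluster.
per_shop = unit_values_by(records, key=lambda shop, week, cluster: (shop, cluster))

# Unweighted alternative: simple mean of the three observed prices.
unweighted_mean = sum(sales / quantity for _, _, _, quantity, sales in records) / len(records)

print(per_cluster, per_shop, round(unweighted_mean, 3))
```

With these made-up numbers the quantity-weighted unit value for the cluster is 0.60, while the unweighted mean of the three prices is about 0.667, because the cheap high-volume promotion week counts for more once quantities are taken into account.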

17 Official Statistics production: with large new data sources (outlook: methodological changes)
Evolutionists' point of view
[Diagram elements: Theory / user need; Methodology; Data collection]
"Don't change the methodology only because of new data sources."

18 Official Statistics production: with large new data sources (outlook: methodological changes)
Revolutionists' point of view
[Diagram elements: Available data; Theory / user need; Methodology]
"Existing methodology is always a result of the available data. New data sources lead to new methodologies."

19 Official Statistics production: with large new data sources (outlook: summary)
Large new data sources such as scanner data can substantially improve the quality of official statistics.
Large new data sources may be integrated into existing methodologies (evolution) or may introduce completely new methodologies (revolution).
The statistical community needs to work on methodologies that take into account the characteristics of the new data sources AND the needs of data users.

20 Official Statistics production: with large new data sources (outlook: methodological changes)
Theory / user need. User: "We need a cost of living index"
Quantities of current period
Change of index type
Available data
Review of methodology

21 Contact
Ingolf Boettcher
Statistics Austria, Consumer Price Index
Guglgasse 13, 1110 Wien
Tel: +43 (1) Fax: +43 (1)