Scanner Data and Spatial Price Comparisons: Current Status and Future Implications for International PPPs


1 Fifty Years of International Comparison Program: Achievements and Moving Forward, Beijing, October 2018
Scanner Data and Spatial Price Comparisons: Current Status and Future Implications for International PPPs
Tiziana Laureti, University of Tuscia, Viterbo, Italy
Member of the Governing Body of the Italian National Statistical System (COMSTAT)

2 OUTLINE OF THE PRESENTATION
- Background and Aims
- Scanner data and CPI computation
- Scanner data and spatial comparisons
  - Current status
  - The Italian experience
- Future Implications for International PPPs

3 Background and Aims
- Over the last decade there has been growing interest in using scanner data for constructing official price indexes, which has increased the availability of this new data source.
- Almost a third of EU countries are currently using scanner data for compiling CPIs, with different methods.
- As yet, few studies have been carried out on using scanner data for compiling spatial price indexes (Heravi, Heston and Silver, 2003; Laureti and Polidoro, 2018; Laureti and Rao, 2018).
- In this context, scanner data may enable countries to construct regional spatial price indexes and improve international spatial comparisons.
The aims of this presentation are to:
- Describe the current status concerning the use of scanner data
- Illustrate the Italian experience
- Envisage future implications for international PPP computations

4 Scanner data and CPI computation
Country | Scanner data sources | Classification/linking methods
Norway 2001 | retail chains, gasoline stations, pharmacies | GTIN + PLU
Switzerland 2008 | the two largest retail store chains (market share of about 60-70%) | In-store item numbers of the retail chain
Netherlands 2010 | 6 supermarket chains (market share of around 50%) | EAN + item description (text mining)
Denmark 2011 | largest supermarket chains (60% of sales of food and beverages) | EAN + product description created by the supermarket chain
Sweden 2011 | 3 major outlet chains in Sweden + 2 food chains for products sold by weight (from 2018) | Automatic coding + GTIN
Belgium 2015 | 3 largest supermarket chains (75-80% of the market) | Store proprietary codes (stock keeping units, SKUs)
Iceland | 3 largest grocery store chains | Barcode (EAN) + item description

5 Scanner data and CPI computation
Country | Scanner data sources | Classification/linking methods
Italy 2018 | largest retail store chains (95% of modern retail trade distribution) | Product key number + GTIN; information provided by Nielsen
Luxemburg 2018 | retail transaction data for food products and non-alcoholic beverages | EAN
New Zealand 2014 for CPI | retail transaction data for consumer electronics products | Information provided by GfK
Australia | retail transaction data (25% of CPI) | Stock keeping unit (SKU)
- Several countries are planning to use scanner data within a few years; their NSOs are still in the research phase (e.g. France, UK, Portugal, Austria, Poland, South Africa).
- Scanner data are a secondary data source: NSOs must reclassify scanner data.
- Eurostat published a practical guide for Processing Supermarket Scanner Data to help NSOs accelerate the adoption of scanner data and to ensure comparability among national HICPs (Eurostat, 2017).

6 Scanner data and spatial comparisons: current status
Scanner data for spatial comparisons: to date, little research has been carried out on this topic.
- Heravi, S., Heston, A., & Silver, M. (2003) used scanner data to provide estimates of intercountry price parities at the level of the basic heading. The application was based on about 1 million transactions for television sets over two months in three countries.
- Feenstra, R. C., Xu, M., & Antoniades, A. (2017) examined the price and variety of products at barcode level in various cities in China and the US and observed that, unlike in the US, product prices tend to be lower in larger Chinese cities.
- To my knowledge, only the Italian Statistical Institute (Istat) has started an official research project within the MPS framework for computing subnational price parities using scanner data (Laureti and Polidoro, 2017, 2018; Laureti et al., 2017; Laureti and Rao, 2018).

7 Spatial price comparison in Italy
- Sub-national PPPs for Italy are required due to the high socio-economic heterogeneity across its macro-areas.
- Regional values of economic indicators should be adjusted for regional price differentials.

8 Spatial price comparison in Italy
Italy is one of the few countries that has carried out official experimental subnational SPI estimations (using CPI data and ad-hoc surveys), referring specifically to household consumption and considering regional capitals:
- In 2008 (with reference to 2006 data): GEKS formula; three expenditure divisions (Food and Beverages, Clothing and Footwear, Furniture)
- In 2010 (with reference to 2009 data): all COICOP expenditure divisions; GEKS formula and CPD model for actual rents
The latest results, in 2010, showed significant differences in the level of consumer prices across the regional capitals (Istat, 2010). Consumer price levels in the Northern cities are generally higher than those in the Centre and especially in Southern Italy.

9 Scanner data and spatial comparison: the Italian experience
- However, systematic attempts to compile regional spatial price indexes on a regular basis have been hindered by laborious calculations and data unavailability. In fact, there are various drawbacks in using traditional sources of price data (CPIs, ad-hoc surveys).
- Using scanner data may allow for the computation of SPIs on an annual basis.
- Since 2014, scanner data have been regularly collected and provided by the market research company ACNielsen (Istat project on scanner data).
- The CPI production process has been significantly improved: since January 2018, Italian CPIs have been produced with scanner data.
- Scanner data currently replace the price data collected in the field for grocery products in supermarkets and hypermarkets (from 2019 onwards, data on electronic goods will be included).
- Use of scanner data for producing SPIs: experimental statistics.

10 FIRST PHASE of the research project (Laureti and Polidoro, 2016; Laureti and Polidoro, 2017)
AIMS:
- To explore the potential advantages of using scanner data for constructing sub-national PPPs (suitability of scanner data for making spatial comparisons)
- To deal with the empirical issues deriving from the use of this new data source
DATA:
- Year: 2015
- Product coverage: food products
- Retailers: selection based on available data; 931 outlets belonging to the 6 most important retail chains (Coop Italia, Conad, Selex, Esselunga, Auchan, Carrefour), covering 57% of the market
- Territorial coverage: 20 regional capitals
- Price and turnover information: 15,433 different products identified by GTIN codes

11 Potential advantages/empirical issues:
[Example scanner data records. Columns: GTIN/EAN, Product description, Brand, Unit sold, Volume, Turnover, BH, City, Chain, Store. Example rows: GAROFALO SEM LUNGA SPAGHETTI N.9 SEM PASTA GR 1 SACCHETTO (brand GAROFALO, 500GR, Turin); DE CECCO SEM LUNGA SPAGHETTI N.12 SEM PASTA GR 1 SACCHETTO (brand DE CECCO, 500GR, Venice)]
1. GTIN/EAN codes provide detailed descriptions of the products. They are the same for each item at national level, so they fulfil the comparability requirement (like-with-like comparison).
2. Turnover and quantities are available for each GTIN, retail chain, outlet and city. How should unit value prices be computed?
- Prices are highly heterogeneous across regional capitals and across chains within a city, which suggests using the finest available classification of items (GTINs).
- We computed unit value prices per item according to retail chain and outlet.
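A unit value price is turnover divided by quantity sold. The sketch below shows that calculation per item, retail chain and outlet. The record layout and the field names (`gtin`, `chain`, `outlet`, `turnover`, `units`) are assumptions made for illustration and are not the layout of the Istat or Nielsen files.

```python
from collections import defaultdict

def unit_value_prices(records):
    """Return turnover / quantity for each (gtin, chain, outlet) key."""
    turnover = defaultdict(float)
    quantity = defaultdict(float)
    for r in records:
        key = (r["gtin"], r["chain"], r["outlet"])
        turnover[key] += r["turnover"]
        quantity[key] += r["units"]
    # Keys with zero quantity are skipped to avoid dividing by zero.
    return {k: turnover[k] / quantity[k] for k in turnover if quantity[k] > 0}

# Hypothetical records: two periods for one outlet, one period for another.
records = [
    {"gtin": "A", "chain": "c1", "outlet": "o1", "turnover": 120.0, "units": 100},
    {"gtin": "A", "chain": "c1", "outlet": "o1", "turnover": 65.0, "units": 50},
    {"gtin": "A", "chain": "c2", "outlet": "o7", "turnover": 90.0, "units": 60},
]
prices = unit_value_prices(records)
```

Because turnover and quantity are summed before dividing, periods with more sales automatically carry more weight in the resulting price.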

12 High variability of product prices across regional capitals and chains
[Figure: annual average price across Italian regional capitals; item description: «de CECCO SEM LUNGA SPAGHETTINI N.11 SEM PASTA 500 GR 1 SACCHETTO»]

13 Potential advantages/empirical issues:
How to use the available information on turnover for each item?
- Turnover makes it possible to assess the representativity and importance of each item, thus improving the quality of SPIs.
- This suggests that all items under a certain BH should be included and weighted according to their turnover.
- A few products may account for a high percentage of turnover (e.g. pasta products). Is it possible to consider a limited number of products? One must make sure that there is a reasonable overlap in the items priced in different regions.
3. Time dimension
- Monthly or annual average prices
- We estimated Time-interaction-Country Product Dummy (TiCPD) models
- SPIs show high variability over time

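14 Cumulative Market Share by GTIN for Pasta products: Largest to Smallest
[Figure: cumulative market share (cumshare) by GTIN for pasta products, one panel per regional capital: AN, AO, AQ, BA, BO, CA, CB, CZ, FI, GE, MI, NA, PA, PG, PZ, RM, TN, TO, TS, VE]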
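A cumulative market share curve of this kind can be obtained by ranking GTINs from largest to smallest turnover and accumulating their shares. The sketch below is illustrative: the function names and the toy turnover figures are hypothetical, and `items_needed` is one possible way to ask how many top-selling items reach a given share of turnover.

```python
def cumulative_shares(turnover_by_gtin):
    """Rank items by turnover (largest first) and return cumulative shares."""
    total = sum(turnover_by_gtin.values())
    ranked = sorted(turnover_by_gtin.items(), key=lambda kv: kv[1], reverse=True)
    out, running = [], 0.0
    for gtin, value in ranked:
        running += value
        out.append((gtin, running / total))
    return out

def items_needed(turnover_by_gtin, cutoff):
    """Smallest number of top-selling items whose cumulative share reaches cutoff."""
    for n, (_, share) in enumerate(cumulative_shares(turnover_by_gtin), start=1):
        if share >= cutoff:
            return n
    return len(turnover_by_gtin)

# Hypothetical turnover figures for four items in one city.
turnover = {"g1": 50.0, "g2": 30.0, "g3": 15.0, "g4": 5.0}
shares = cumulative_shares(turnover)
n_items = items_needed(turnover, 0.6)
```

A steep curve means a short item list already covers most of the turnover, but the overlap of that list across regions still has to be checked separately.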

15 WTiCPD Estimation results: PPPs for regional chief towns (Southern cities) Pasta products and couscous

16 SECOND PHASE of the research project (Laureti and Rao, 2018; Laureti and Polidoro, 2018)
AIMS:
- To explore the feasibility of implementing various aggregation methods at BH level
- To estimate regional SPIs for product aggregates
DATA: this dataset is used for CPI computation
YEAR: 2017
OUTLETS: stratified random sample
- Universe of 9,000 retailers belonging to the 16 most important retail chains (94% of modern retail chain distribution)
- Sample stratified by province, distribution chain and kind of outlet (888 strata)
- Outlets are selected with probabilities proportional to their 2016 turnover
- 1,781 outlets (510 hypermarkets and 1,271 supermarkets)
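Selection with probability proportional to turnover can be illustrated with systematic PPS sampling inside a single stratum. This is only a sketch of one common PPS scheme: the slides do not say which selection algorithm Istat applies, and the outlet identifiers and turnover values below are hypothetical.

```python
import random

def systematic_pps(sizes, n, rng):
    """Select n units with probability proportional to size (systematic PPS)."""
    total = sum(sizes.values())
    step = total / n
    start = rng.uniform(0, step)
    points = [start + k * step for k in range(n)]
    selected, cumulative, idx = [], 0.0, 0
    for unit, size in sizes.items():
        cumulative += size
        # A unit is taken once for every selection point that falls in its interval.
        while idx < n and points[idx] < cumulative:
            selected.append(unit)
            idx += 1
    return selected

# Hypothetical 2016 turnover of five outlets in one stratum.
stratum = {"o1": 500, "o2": 100, "o3": 100, "o4": 200, "o5": 100}
sample = systematic_pps(stratum, n=2, rng=random.Random(0))
```

In this toy stratum the outlet `o1` accounts for half of the turnover, so with two draws it falls in the sample whatever the random start is.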

17 TERRITORIAL COVERAGE: all cities within the 107 territorial areas (provinces and metropolitan towns)
ITEMS:
- 487,094 different products belonging to food, beverages and personal and home care products: five divisions of the ECOICOP (01, 02, 05, 09, 12)
- Scanner data cover 55.4% of the total retail trade for this category of products
- Items were selected with probabilities proportional to the 2016 turnover for each product aggregate (at 60% cut-off line)
- Chain structure in overlapping products
PRICE CONCEPT:
- We compute annual averages of weekly prices (average of prices paid by consumers) for each item and outlet, using turnover as weights
- We compute provincial averages using the sampling weights of each outlet
- Expenditure weights at item level
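The two averaging steps of the price concept can be sketched as follows. The slides do not state the type of mean, so the weighted arithmetic mean used here is an assumption, and the function names and numbers are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean (assumed form of the average)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total

def annual_outlet_price(weekly):
    """weekly: list of (price, turnover) pairs for one item in one outlet."""
    return weighted_mean([p for p, _ in weekly], [t for _, t in weekly])

def provincial_price(outlets):
    """outlets: list of (weekly_observations, sampling_weight) for one item."""
    annual = [annual_outlet_price(weekly) for weekly, _ in outlets]
    return weighted_mean(annual, [sw for _, sw in outlets])

# Hypothetical item observed in two outlets of the same province.
outlet_a = [(1.00, 100.0), (1.20, 300.0)]  # turnover-weighted annual price 1.15
outlet_b = [(1.30, 200.0), (1.30, 200.0)]  # turnover-weighted annual price 1.30
price = provincial_price([(outlet_a, 2.0), (outlet_b, 6.0)])
```

The first step weights weeks by turnover within an outlet; the second step weights outlets by their sampling weights within a province.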

18 We adopted a two-step procedure similar to the one used in the ICP, whereby provinces are grouped into regions:
1. Within-regional SPIs are computed by comparing price and quantity data for products sold in the various provinces within each region. Several methods are used for this purpose at the lowest level of aggregation (groups of similar products):
   A. GEKS based on the Jevons index, using products that are commonly priced in the two areas, j and k
   B. GEKS based on the Fisher binary index, using price and quantity data for commonly priced items
   C. Geary-Khamis index
   D. Regional Product Dummy model (RPD)
   E. Weighted Regional Product Dummy model (WRPD), with expenditure share weights and quantity weights
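Method A can be sketched as follows: a bilateral Jevons index is the geometric mean of price ratios over the items priced in both areas, and GEKS averages all indirect comparisons through third areas to make the results transitive. The function names, province labels and prices are hypothetical; this is a minimal sketch and not the code used in the project.

```python
import math

def jevons(prices_j, prices_k):
    """Bilateral Jevons index of area k relative to area j over common items."""
    common = prices_j.keys() & prices_k.keys()
    if not common:
        raise ValueError("no commonly priced items")
    log_ratio = sum(math.log(prices_k[i] / prices_j[i]) for i in common)
    return math.exp(log_ratio / len(common))

def geks(prices, base):
    """GEKS index of every area relative to `base` (the base equals 1)."""
    areas = list(prices)
    result = {}
    for k in areas:
        # Geometric mean over all link areas l of P(base, l) * P(l, k).
        log_sum = sum(
            math.log(jevons(prices[base], prices[l]))
            + math.log(jevons(prices[l], prices[k]))
            for l in areas
        )
        result[k] = math.exp(log_sum / len(areas))
    return result

# Hypothetical prices: each pair of provinces shares at least one item.
prices = {
    "A": {"g1": 1.0, "g2": 2.0, "g3": 4.0},
    "B": {"g1": 1.1, "g2": 2.0},
    "C": {"g2": 2.4, "g3": 4.4},
}
index_a = geks(prices, "A")
index_b = geks(prices, "B")
```

Transitivity means that the comparison between B and C does not depend on the base: `index_a["C"] / index_a["B"]` equals `index_b["C"]` up to rounding.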

19 2. Between-regional SPIs are computed using deflated expenditures and prices adjusted for differences among the provinces of each region (obtained by dividing provincial prices by the within-regional SPIs). Method: weighted RPD model.
- We checked whether there was a reasonable overlap in the items priced in different regions (and whether the overlaps exhibit a chain structure).
- We excluded two groups of products, Whole Milk and Low-Fat Milk, since there were no reliable overlaps among regions enabling spatial price comparisons.
Moreover, as in the ICP, the compilation of sub-national SPIs (PPPs) is undertaken at two levels:
- Groups of similar products (Basic Heading, BH)
- Product aggregates (in our case, Food and Non-Food products). Aggregation method: GEKS-Fisher (ICP and Eurostat-OECD). We standardized the GEKS-Fisher based PPPs (S-GEKS).
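A weighted RPD model regresses log prices on region and item dummies, ln p(i, r) = alpha(r) + gamma(i) + error, with weights on each observation. The sketch below solves that weighted least squares problem by alternating weighted means, a choice made here to keep the example free of external libraries; it is not the estimation routine used by the authors and it returns no standard errors. All names, weights and prices are hypothetical.

```python
import math

def weighted_rpd(obs, base, tol=1e-12, max_iter=10_000):
    """obs: list of (region, item, price, weight) tuples.

    Returns the price level of each region relative to `base`.
    """
    regions = sorted({r for r, _, _, _ in obs})
    items = sorted({i for _, i, _, _ in obs})
    alpha = {r: 0.0 for r in regions}  # log regional price levels
    gamma = {i: 0.0 for i in items}    # log item effects
    for _ in range(max_iter):
        # Item effects: weighted mean of log prices net of regional levels.
        for i in items:
            rows = [(math.log(p) - alpha[r], w) for r, it, p, w in obs if it == i]
            gamma[i] = sum(v * w for v, w in rows) / sum(w for _, w in rows)
        # Regional levels: weighted mean of log prices net of item effects.
        change = 0.0
        for r in regions:
            rows = [(math.log(p) - gamma[it], w) for rg, it, p, w in obs if rg == r]
            new = sum(v * w for v, w in rows) / sum(w for _, w in rows)
            change = max(change, abs(new - alpha[r]))
            alpha[r] = new
        if change < tol:
            break
    return {r: math.exp(alpha[r] - alpha[base]) for r in regions}

# Hypothetical data with partial overlap: no item list is common to all regions.
true_level = {"R1": 1.0, "R2": 1.1, "R3": 0.9}
item_price = {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 5.0}
priced = {"R1": ["g1", "g2", "g3"], "R2": ["g2", "g3", "g4"], "R3": ["g1", "g3", "g4"]}
share = {"g1": 0.1, "g2": 0.4, "g3": 0.3, "g4": 0.2}
obs = [(r, i, true_level[r] * item_price[i], share[i]) for r in priced for i in priced[r]]
levels = weighted_rpd(obs, base="R1")
```

Since the toy prices are exactly multiplicative, the estimated levels reproduce the assumed regional levels even though the regions price different item sets, which is the situation where a dummy-variable model is convenient.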

20 Results: Regional Spatial Price Indexes
[Figure panels: Food Products (Italy=100); Non-Food Products (Italy=100)]
- Price levels in Southern regions are below the national average for both Food and Non-Food products, with the exception of Abruzzo, Molise and Sardinia.
- On average, Tuscany proved to be the least expensive region for both product aggregates (96.24 and 95.17).

21 Results: Regional Spatial Price Indexes
L=54 groups of products
Table 1: WRPD estimation results for Pasta products and Non-electrical appliances (Italy=100)
Columns, for Pasta Products (BH1) and for Non-electrical appliances (BH2): Coef, std.error, p.value, RPP
Rows, North-Center: PIEMONTE, VALLE D'AOSTA, LIGURIA, LOMBARDIA, TRENTINO, VENETO, FRIULI, EMILIA-ROMAGNA, TOSCANA, UMBRIA, MARCHE, LAZIO
Rows, South and Islands: ABRUZZO, MOLISE, CAMPANIA, PUGLIA, BASILICATA, CALABRIA, SICILIA, SARDEGNA
In some BHs, the usual divide between North and South is not confirmed.

22 Results: Provincial Spatial Price Indexes
SPIs FOOD PRODUCTS (Tuscany=100)
- Higher price levels: Siena (102.9), Livorno (102.2)
- Lower price levels: Prato (98.3), Firenze (98.4)
SPIs NON-FOOD PRODUCTS (Tuscany=100)
- Higher price levels: Livorno (104.0), Siena (103.2), Grosseto (103.1)
- Lower price levels: Prato (97.4), Pistoia (97.5)

23 Results: Provincial Spatial Price Indexes
SPIs FOOD PRODUCTS (Lombardia=100)
- Higher price levels: Brescia (101.2), Pavia (101.1)
- Lower price levels: Mantova (99.0), Bergamo (99.1)
SPIs NON-FOOD PRODUCTS (Lombardia=100)
- Higher price levels: Pavia (101.9), Como (101.5)
- Lower price levels: Bergamo (98.5), Sondrio (97.4)

24 Results: Provincial Spatial Price Indexes
SPIs using different methods: Pasta products and couscous (Milan=100)
Columns: Jevons GEKS, Fisher GEKS, GK, RPD, WE_RPD, WQ_RPD
Rows: Bergamo, Brescia, Como, Cremona, Lecco, Lodi, Monza-Brianza, Milano, Mantova, Pavia, Sondrio, Varese
- Lombardy's low level of heterogeneity in consumer price differences is not confirmed when considering specific food products such as pasta.
- We observe lower price levels for household goods in relatively poorer provinces when we use the Geary-Khamis method.

25 Conclusions
- Scanner data enabled us to compute sub-national SPIs at local level, to be used for adjusting regional economic indicators.
- The feasibility of implementing various aggregation methods has been demonstrated, but the weighted RPD model is preferable when product overlaps exhibit a chain structure.
Further research is underway for:
- Obtaining scanner data from Hard Discount, Consumer Electronics and Furniture retailers (planned for 2019)
- Integrating scanner data with other new data sources (i.e. web scraping) as well as with traditional data collection (traditional retail trade) for clothing and footwear, using electronic devices

26 Future Implications for International PPPs
Scanner data may enhance the accuracy of international PPPs.
IMPROVE THE QUALITY OF PRICING SAMPLES:
- Replacing on-field collected prices (NSOs may use scanner data to identify products and collect prices)
- Increasing the number of products priced (and assessing their representativity using turnover as weights)
- Expanding the number of cities where prices are collected (not only national capitals)
It is easier to compute SAFs and sub-national SPIs (adjusting for rural/urban), thus obtaining average national prices that are more representative of the whole country.
NSOs will be able to adopt probabilistic samples:
- Information on the universe of retailers, turnover and market share for each outlet
- Measures of uncertainty in price statistics

27 Future Implications for International PPPs
OECD-Eurostat Program: Istat is currently carrying out (October-December 2018) price surveys for clothing and footwear and for personal care products.
[Table: final European list for the EU group. Rows: Non-electrical appliances and personal care products; Non-electrical appliances; Articles for personal hygiene and wellness, esoteric products and beauty products. Columns: No of SPDs, No of items, Spec. Brands, Well Known Brands, Total, L, M, H, Not Specified, Brandless, Brand n.r.]
- A detailed specification for each product in the product list is provided by the SPDs.
- NSOs should establish the most important characteristics to search for in the scanner data set.
- The GTINs/EANs that correspond to the product specification should then be determined.
- The product characteristics can then be matched with the itemized information contained in the scanner data (constructing record linkage procedures).
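The matching step can be illustrated with a very simple rule-based linkage: a scanner item is a candidate for a product specification if its description contains all required keywords and its pack size falls in the specified range. The specification fields, the item fields, the brand name and the GTIN values below are all hypothetical; real SPDs and scanner files are richer, and an operational linkage would need manual review of the candidates.

```python
def matches(spec, item):
    """True if the scanner item satisfies every characteristic in the spec."""
    description = item["description"].upper()
    if not all(word.upper() in description for word in spec["keywords"]):
        return False
    low, high = spec["size_range_ml"]
    return low <= item["size_ml"] <= high

def candidate_gtins(spec, scanner_items):
    """GTINs of all scanner items compatible with the specification."""
    return sorted(item["gtin"] for item in scanner_items if matches(spec, item))

# Hypothetical specification for a shampoo and three hypothetical scanner items.
shampoo_spec = {"keywords": ["shampoo", "BRAND X"], "size_range_ml": (200, 300)}
scanner_items = [
    {"gtin": "0000000000011", "description": "BRAND X SHAMPOO NORMAL HAIR 250 ML", "size_ml": 250},
    {"gtin": "0000000000028", "description": "BRAND X SHAMPOO NORMAL HAIR 400 ML", "size_ml": 400},
    {"gtin": "0000000000035", "description": "BRAND X CONDITIONER 250 ML", "size_ml": 250},
]
candidates = candidate_gtins(shampoo_spec, scanner_items)
```

Only the first item passes both checks: the second is outside the size range and the third is not a shampoo.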

28 Future Implications for International PPPs Shampoo: SPDs and scanner data

29 Future Implications for International PPPs
DRAWBACKS:
- NSOs rely heavily on data provided by the retailers.
- More IT resources are required due to the huge amount of data obtained.
- Not all countries may have access to this type of data.
- Scanner data may cover a limited number of product categories.
Various EU countries (e.g. Italy, Norway, the Netherlands) are currently using scanner data to produce international PPPs, following different procedures. They expect EUROSTAT to establish specific guidelines.
Further research is underway for:
- Integrating CPI and PPP computation
- Exploring new methods for PPP computation using scanner data
- Carrying out simulation procedures to identify importance weights

30 Tiziana Laureti Thank you for your kind attention!

31 References
Feenstra, R. C., Xu, M., & Antoniades, A. (2017). What is the Price of Tea in China? Towards the Relative Cost of Living in Chinese and US Cities (No. w23161). National Bureau of Economic Research.
Heravi, S., Heston, A., & Silver, M. (2003). Using scanner data to estimate country price parities: A hedonic regression approach. Review of Income and Wealth, 49(1).
Laureti, T., Ferrante, C., & Dramis, B. (2017). Using scanner and CPI data to estimate Italian sub-national PPPs. Proceedings of the 49th Scientific Meeting of the Italian Statistical Society.
Laureti, T., & Polidoro, F. (2017). Testing the use of scanner data for computing sub-national Purchasing Power Parities in Italy. Proceedings of the 61st ISI World Statistics Congress, Marrakech.
Laureti, T., & Rao, D. P. (2018). Measuring Spatial Price Level Differences within a Country: Current Status and Future Developments. Estudios de economía aplicada, 36(1).
Laureti, T., & Polidoro, F. (2018). Big data and spatial comparisons of consumer prices Testing the use of scanner data for computing sub-national Purchasing Power Parities in Italy. Proceedings of the 49th Scientific Meeting of the Italian Statistical Society, Palermo.

32 High variability of item price across regional capitals within the same retail chain
[Figure: annual average price across Italian regional capitals; retail chain=9; item description: «de CECCO SEM LUNGA SPAGHETTINI N.11 SEM PASTA 500 GR 1 SACCHETTO»]

33 Average price across retail chains within the same city: ROME
[Figure: average price by retail chain; item: «DE CECCO SEM LUNGA SPAGHETTINI N.11 SEM PASTA 500 GR 1 SACCHETTO»]
Significant differences (p<0.05) across retail chains in the annual price of the identical item.

34 Scanner data: % market shares (hypermarket + supermarket), year 2016
Columns (regions by macro-area, then the national total):
NORTH - W: PIEMONTE, VALLE D'AOSTA, LIGURIA, LOMBARDIA
NORTH - E: TRENTINO-ALTO ADIGE, VENETO, FRIULI-VENEZIA GIULIA, EMILIA-ROMAGNA
CENTER: TOSCANA, UMBRIA, MARCHE, LAZIO
SOUTH AND ISLANDS: ABRUZZO, MOLISE, CAMPANIA, PUGLIA, BASILICATA, CALABRIA, SICILIA, SARDEGNA
ITALIA
Rows (retail chains):
COOP ITALIA 18,2 - 42,2 7,9 18,0 9,1 21,3 41,2 51,2 30,8 18,5 14,3 10,0 - 4,4 18,6 6,9 - 6,3 - 18,5
CONAD 4,3 22,3 17,0 3,3 13,8 3,6 7,7 26,5 14,8 29,9 12,6 24,5 29,8 30,9 20,5 9,6 10,3 30,2 19,5 30,6 13,3
ESSELUNGA 12,4-3,9 31,3-1,2-9,9 22, , ,1
SELEX COMMERCIALE 17,9 8,6 4,8 9,9 - 32,3 9,4 6,6 1,1 22,1 18,2 3,4 2,7 23,4 7,6 29,1 6,0 3,3 4,4 12,8 11,1
GRUPPO AUCHAN 7,0 - 0,7 8,2 - 6,3 1,1 1,5 1,9 2,7 25,8 10,7 11,1 - 8,1 17,2 10,4 17,3 20,1 12,6 7,8
GRUPPO CARREFOUR ITALIA SPA 16,4 45,1 8,8 9,9 - 2,1 4,2 1,8 2,8 0,7 0,9 13,3 5,7 1,6 9,2 - 0,9 8,9 1,5 5,6 7,1
FINIPER 1, ,4-1,6 2,9 1, ,1-8, ,3
GRUPPO VEGE - - 1,5 1,1 - 6,2 - 0,2 0,1 0,2 - 0,7 2,6 5,7 20,7 1,2 5,0 4,0 19,8 13,8 3,2
GRUPPO SUN 1,4-3,2 2,6-2,0 1,2 0,3-2,4 9,8 14,4 18,2 27, ,1
AGORA' NETWORK SCARL 2,5-13,5 6,1 34,4 0,4-0,2 0, ,8
GRUPPO PAM 3,7-2,7 0,9 0,6 3,1 8,0 1,8 5,4 3,1-8,5 0,7-0,2 1, ,8 2,7
ASPIAG ,4 12,7 29,9 1, ,7
BENNET SPA 8,7-1,3 5,2-1,2 4,0 1, ,5
SIGMA 0, ,1-2,8 2,6 3,0 0,3 0,3 7,0 0,8 3,2 6,4 2,8 6,9 5,3 1,6 1,1 5,0 1,8
CRAI 1,6 - 0,3 0,2 - 2,6 2,1 0,5 0,0 - 0,4 1,7 0,7 0,9 2,3 0,2 5,4 3,5 7,5 9,7 1,4
DESPAR SERVIZI , , ,8 7,1 17,6 18,4 6,2 4,3 1,2
TOTAL 95,9 76,0 99,8 94,8 99,1 87,0 94,3 98,5 99,9 92,2 97,4 93,2 92,9 96,6 77,5 91,3 67,9 87,2 86,4 98,0 93,7

35 Product overlap across regions: Whole Milk

36 Outlets have been stratified according to provinces, chains and outlet types. Sample size, year 2016
Columns: Region, Number of strata, Number of outlets
Macro-areas: NORTH - W, NORTH - E, CENTER, SOUTH AND ISLANDS
Rows: PIEMONTE; VALLE D'AOSTA (4 strata, 7 outlets); LIGURIA; LOMBARDIA; TRENTINO-ALTO ADIGE; VENETO; FRIULI-VENEZIA GIULIA; EMILIA-ROMAGNA; TOSCANA; UMBRIA; MARCHE; LAZIO; ABRUZZO; MOLISE; CAMPANIA; PUGLIA; BASILICATA (6 strata, 11 outlets); CALABRIA; SICILIA; SARDEGNA; ITALIA

37 Table 2: Gini coefficients by regional chief towns and BHs
Columns (N.Items and Gini for each BH): Mineral or spring water; Personal care products; Household cleaning and maintenance products
Rows, North: Aosta, Torino, Genova, Milano, Trento, Venezia, Trieste, Bologna
Rows, Centre: Firenze, Ancona, Perugia, Roma
Rows, South and Islands: L'Aquila, Campobasso, Napoli, Potenza, Bari, Catanzaro, Palermo, Cagliari
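A Gini coefficient for one town and one BH can be computed from a list of non-negative values, one per item. The slide does not say which variable the coefficients refer to; the sketch assumes item turnover, so that a higher value means sales are more concentrated in a few items. The function name and the figures are hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive value")
    # Rank-weighted sum over the values sorted in ascending order.
    ranked_sum = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * ranked_sum / (n * total) - (n + 1.0) / n

# Hypothetical item turnovers within one BH in one town.
equal_sales = gini([10.0, 10.0, 10.0, 10.0])
concentrated_sales = gini([0.0, 0.0, 0.0, 40.0])
```

With four items, equal turnover gives 0 and turnover concentrated in a single item gives 0.75, the maximum for a sample of that size under this formula.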

38 Laureti and Polidoro: Big data and spatial price comparisons of consumer prices
Scanner data and spatial comparison: the Italian experience
Product overlap across provinces within a region: Sugar in Calabria (RCPD)

39 Product overlap across regions: Pasta products