Chain Drift in a Monthly Chained Superlative Price Index

Paper written for the joint UNECE/ILO workshop on scanner data, Geneva, 10 May

Ragnhild Nygaard
Statistics Norway
ragnhild.nygaard@ssb.no

Abstract

In August 2005, Statistics Norway expanded the use of scanner data in the Consumer Price Index (CPI) to include both price and quantity information at elementary level, calculating a monthly chained Törnqvist price index for food and non-alcoholic beverages. A possible drawback with frequent chaining and superlative price indices is chain drift. To reveal possible chain drift we have compared results from a monthly chained Törnqvist price index with a GEKS price index. Assuming that the GEKS price index is a benchmark index, there are indications of chain drift in the Norwegian price index of food and non-alcoholic beverages based on scanner data. Prices and quantities oscillating due to sale periods, as well as lack of imputation of missing prices and lack of satisfactory treatment of some seasonal items, create a downward bias.

Statistics Norway would like to thank Statistics Netherlands (CBS) for providing us with their SAS calculation program for the alternative GEKS price indices. The author is also very grateful for all the SAS programming assistance given by Ronny Haugan, Statistics Norway.

1. Introduction

There is a growing interest among statistical agencies in the use of scanner data in index compilation and in the possibilities that scanner data provide. Statistics Norway is among very few statistical agencies that have included both price and quantity information received from retail scanner data in the official CPI calculation. The main advantages of scanner data are the enormous increase in the number of products and price observations and the availability of both price and quantity information in real time, in addition to the low response burden 1.

For some years now Statistics Norway has been receiving scanner data in several areas, namely food and non-alcoholic beverages, alcoholic beverages 2, petrol and pharmaceutical products. We have used scanner data price information from retail chains in the CPI since the late 1990s, and in August 2005 we expanded the use of scanner data by exploiting both price and quantity information for all items in our selected retail outlets. Traditionally, statistical agencies do not have access to quantity information at elementary level. Scanner data provides statistical agencies with new opportunities to calculate the price indices, as both price and quantity are available. At the same time, agencies are faced with huge challenges relating to how to aggregate this enormous amount of data in the best way.

Many fundamental and methodological choices were made in the process of expanding the scanner data use in the Norwegian price index of food and non-alcoholic beverages. Some choices were rather obvious while others were more controversial. Perhaps the most controversial choice we made was to introduce a monthly chained superlative price index at elementary level including volatile and seasonal items. The major advantage of monthly chaining is the quick update of all the entries and exits of items in the retail market.
According to the ILO manual (2004), superlative price indices are the best choice and an ideal framework when both detailed price and quantity information are available, but the manual also notes that chained superlative indices can give very biased results if there are large period-to-period fluctuations in prices and quantities. With the combination of superlative price indices, price and quantity bouncing and frequent chaining, chain drift may occur. The more frequent the chaining, the larger the biases can be.

Due to international focus on superlative indices, monthly chaining and chain drift, Statistics Norway started evaluating the scanner data-based price index of food and non-alcoholic beverages. Limited international experience in this field and a relatively high weight 3 on food and non-alcoholic beverages in the CPI were also reasons for carrying out the evaluation project 4.

In this paper we outline some experiences and some challenges encountered in the use of scanner data in estimating superlative price indices. The paper is organised as follows. Chapter 2 presents some properties of scanner data, while in chapter 3 we describe our calculation method and data cleaning process. Chapter 4 reveals biases in a monthly chained superlative price index. We examine chain drift, seasonal items and missing price observations in more detail. In chapter 5 we present some results based on a newly developed method - the rolling year GEKS price index - first presented by Ivancic, Fox and Diewert (2009), and compare these results with results from a recalculation based on our official price index method. Finally, in chapter 6 some concluding remarks are given.

1 Statistics Norway does not use price collectors in the CPI in the same way as most other countries.
There was an increasing pressure from the retail outlets on the retail chains' headquarters to send us scanner data, as the response burden (filling out large questionnaires) for the outlets was rather high. Statistics Norway receives the scanner data free of charge.
2 Not from retail chains, but from the State wine and liquor monopoly.
3 Food and non-alcoholic beverages account for 11.4 per cent of the Norwegian CPI.
4 Statistics Norway also uses scanner data for other product groups such as toiletries, detergents etc. in the official CPI figures, but only the price information on specific representative items is used.

2. Properties of scanner data

The scanner data items are identified by EAN (European Article Number), an international retail product code, and by chain specific barcodes, so-called PLU 5 codes. These are codes scanned in the cash registers of retail outlets when the items are bought and paid for. The data used in the Norwegian CPI is collected from the chains' headquarters and contains information on price, quantity, type of outlet, location, period and description of the item. The monthly price reported is a calculated price which refers to the average price of the midweek 6 of the month. The reported quantity refers to the quantity sold in the same week. Some chains also report whether items have been sold at normal price or been on sale. In those cases where both are included a weighted average is calculated.

There are four 7 major retail chains covering the Norwegian grocery market, and in total Statistics Norway receives over price observations each month covering food and non-alcoholic beverages, which are divided into about different types of items (EANs). Table 1 gives the number of data according to the COICOP classification.

Table 1. Average monthly number of price observations for the period October 2008-September 2009
COICOP group (the values of the Prices column are not recoverable here)
0111: Bread and cereals
Meat
Fish
Milk, cheese and eggs
Oils and fats
Fruit
Vegetables
Sugar, jam, chocolate, confectionery
Food products n.e.c.
Coffee, tea and cocoa
Mineral waters, soft drinks and juices
Total

The attrition rate of scanner data items is very high; in other words, the number of items that continuously appear and disappear in the retail market is considerable. According to Rodriguez and Haraldsen (2005), almost 30 per cent of the price observations do not match after one month, while as many as approximately 60 per cent of the price observations are not common after a 12 month period.
Measured by share of turnover the attrition rate is somewhat lower: nearly 20 per cent mismatch from one month to the next and almost 50 per cent mismatch after a 12 month period. This provides a strong argument for choosing monthly chaining and not a 12 month fixed base period in the calculations.

The identification of items using barcodes has both strengths and weaknesses. These codes contain information on country of production, name of producer and the product itself. The producer decides on the product number. This means that by comparing prices for identical EAN codes we are guaranteed that identical items are compared. On the other hand, very similar items may have different EAN codes and are therefore treated as different items, due to either change of producers or producers changing article numbers. This leads to a very high attrition rate of items and, from a CPI point of view, probably an overstated rate of continuously appearing and disappearing items. The PLU codes prove to be less stable than the EANs; we have experienced situations of PLU codes representing one item one month and another the next month. Yet the extent of this measurement error seems to be very small. The PLU codes also constitute a rather small share of the overall number of price observations.

5 PLU stands for Product Look-Up or Price Look-Up, a 4-digit code for items that do not have an EAN code. Mostly used for fruit and vegetables.
6 Statistics Norway is now working to prolong the data collection period to the two midweeks.
7 Statistics Norway also receives and includes scanner data from one of the largest kiosk chains.

3. The construction of the price index of food and non-alcoholic beverages in the Norwegian CPI

3.1 Calculation method

Statistics Norway receives scanner data from a representative sample of retail outlets. The sample size is about 150 outlets and is dominated by supermarkets and discount stores 8. The sample of retail outlets is stratified by chain and concept/price profile. Analyses show that chain and profile are more important for the price development than the geographical location of the outlets (Rodriguez and Haraldsen, 2006). COICOP-6 9 classification is chosen as the lowest computation level, representing rather homogeneous groups like Flour, Bread, Pizza, Bacon etc. This is a detailed level of product classification which at the same time ensures a satisfactory number of price observations. An item catalogue has been established to match the EAN/PLU codes with our COICOP-6 level.

Most of the elementary price indices in the Norwegian CPI are calculated without the use of explicit expenditure weights, hence calculating an un-weighted Jevons price index:

$$P_J(p^0, p^1) = \prod_{m=1}^{M} \left( \frac{p_m^1}{p_m^0} \right)^{1/M}$$

i.e. the index P for the representative item J (P_J) is a function of only the prices of the item in the base period, p^0, and the current period, p^1. The ratio of the simple geometric average prices is identical to the geometric average of the price ratios or price relatives, p_m^1 / p_m^0.
It was decided that the formula at elementary level for the new price index of food and non-alcoholic beverages should be coherent with theory and previous practice; hence the price index is calculated by a weighted Jevons price index:

$$\ln P_K(p^0, p^1, s^i) = \sum_{m=1}^{M} s_m^i \ln\left( \frac{p_m^1}{p_m^0} \right)$$

i.e. the logarithm of the index P for the items included in the level K (P_K) is a function of the prices p^0 and p^1 and the expenditure shares s in period i, where i = 0 or 1. We calculate a geometric Laspeyres price index (expenditure shares of the base period 0) and a geometric Paasche price index (expenditure shares of the current period 1) and take the geometric average of the two, treating both periods symmetrically, which generates exactly the Törnqvist price index. Equivalently, each log price relative is weighted by the average of its base period and current period expenditure shares. The indices at elementary level (COICOP-6) are monthly chained.

8 Specialised stores like bakeries and fish stores are excluded from the price index. These stores account for less than 10 per cent of the total turnover of all food and non-alcoholic beverages.
9 COICOP-2 to COICOP-6 represent 2- to 6-digit level of COICOP.
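The two elementary formulas of section 3.1 can be illustrated with a minimal Python sketch. It is not the SAS production code used by Statistics Norway; the function names and the dictionary layout (item code mapped to price or quantity) are assumptions made for illustration, and only items present in both periods are compared.

```python
import math

def jevons(p0, p1):
    """Un-weighted Jevons index: geometric mean of the matched price relatives."""
    items = sorted(set(p0) & set(p1))
    log_sum = sum(math.log(p1[m] / p0[m]) for m in items)
    return math.exp(log_sum / len(items))

def shares(p, q, items):
    """Expenditure shares of the matched items in one period."""
    total = sum(p[m] * q[m] for m in items)
    return {m: p[m] * q[m] / total for m in items}

def tornqvist(p0, q0, p1, q1):
    """Geometric mean of the geometric Laspeyres and geometric Paasche indices."""
    items = sorted(set(p0) & set(p1))
    s0 = shares(p0, q0, items)  # base period shares (geometric Laspeyres)
    s1 = shares(p1, q1, items)  # current period shares (geometric Paasche)
    ln_laspeyres = sum(s0[m] * math.log(p1[m] / p0[m]) for m in items)
    ln_paasche = sum(s1[m] * math.log(p1[m] / p0[m]) for m in items)
    return math.exp(0.5 * (ln_laspeyres + ln_paasche))

# Hypothetical two-item example: item "a" doubles in price, "b" is unchanged.
p0, q0 = {"a": 10.0, "b": 20.0}, {"a": 2.0, "b": 1.0}
p1, q1 = {"a": 20.0, "b": 20.0}, {"a": 1.0, "b": 1.0}
print(jevons(p0, p1), tornqvist(p0, q0, p1, q1))
```

In this made-up example both items have an expenditure share of one half in both periods, so the Törnqvist index coincides with the Jevons index; the two differ as soon as the shares are unequal.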

The COICOP-5, -4, -3 and -2 indices are computed by aggregating the elementary indices using the weights from the Household Expenditure Surveys (HES) 10, i.e. the index for COICOP 01, food and non-alcoholic beverages, is given by:

$$I_{01} = \sum_{K=1}^{N} P_K w_K$$

where N represents the number of elementary indices on the COICOP-6 level which are included in the 2-digit consumption classification, while the weights from the HES are represented by w. The Laspeyres price index with annual chaining is used at higher level aggregation.

Our calculation method has both strengths and weaknesses. Scanner data allows us to take into consideration both the changes in prices and the changes in quantities, and makes it possible to publish more detailed information, while analyses back in 2005 showed that the resources demanded were not much higher than if we were to calculate other alternatives. The choice of monthly chaining was a natural choice due to the high attrition rate, but there was a risk of bias in applying monthly chained superlative price indices including volatile and seasonal items. Our pragmatic solutions were based on analyses comparing new and old methods over a period of several years. In 2005, we concluded that some items may be strongly affected by sales and advertising campaigns, but the price and quantity oscillation effects on consumption groups were minor. We chose to tackle these challenges by excluding single price observations with an extremely strong influence on the elementary price index and by excluding strongly seasonal items only available during certain times of the year. We also made the choice of no imputation for missing price observations.

3.2 Data cleaning process

Scanner data provides huge opportunities, but at the same time statistical agencies put themselves in a vulnerable position with only a few providers of data. The Norwegian retail scanner data is however of very high quality; few mistakes occur and it is normally delivered on time.
The scanner data goes through different data cleaning processes 11 before the index calculations. When using a superlative price index where both price and quantity are included, establishing good data cleaning routines is essential. The superlative price index can put a very strong emphasis on single price observations (e.g. in the case of a strong price decrease combined with a very high expenditure share), and whether a price observation is accepted and included in the price index or not can have a significant impact on the results.

First we automatically eliminate month-to-month price ratios (P_t / P_{t-1}) that change by a factor of 3 or more, i.e. price ratios less than or equal to 0.33 or greater than or equal to 3 are eliminated. These price changes are judged to be unrealistic, implausible price changes 12. The amount of these observations is very small - about 100 to 150 price ratios each month.

The next procedure is to calculate each and every price ratio's contribution 13 to the COICOP-6 results. All contributions outside some defined threshold of the COICOP-6 average contribution are flagged as critical observations contributing the most to the elementary level results. The amount of critical observations each month is about price observations or price ratios. The highest contributions are manually controlled, and contributions that are extremely high and deviate from the rest of the data are eliminated 14.

10 As of January 2011, the weights will be taken from the National Account.
11 Statistics Norway uses SAS software for data cleaning and calculations. Data are stored in Oracle databases.
12 These observations may also turn out to be strong price increases or decreases of e.g. fruit and vegetables.
13 The contribution depends both on the price change and the expenditure shares of both the base and current period.
14 Only a very small amount is eliminated each month, somewhere around 20 price ratios.
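The two automatic cleaning steps described in section 3.2 could be sketched in Python as below. The thresholds 0.33 and 3 come from the text; everything else is an assumption made for illustration: the function names, the Törnqvist-style form of the contribution (average expenditure share times log price ratio) and the multiplier `k` used for flagging. The paper does not state the threshold actually applied.

```python
import math

def is_plausible(p_prev, p_curr, lower=0.33, upper=3.0):
    """Keep a month-to-month price ratio only if it lies strictly inside (lower, upper)."""
    ratio = p_curr / p_prev
    return lower < ratio < upper

def contribution(s_prev, s_curr, p_prev, p_curr):
    """Assumed contribution to the log index: average share times log price ratio."""
    return 0.5 * (s_prev + s_curr) * math.log(p_curr / p_prev)

def flag_critical(contributions, k=2.0):
    """Flag items whose absolute contribution exceeds k times the group average.

    k is a hypothetical threshold, not the one used in the official routine.
    """
    mean_abs = sum(abs(c) for c in contributions.values()) / len(contributions)
    return sorted(m for m, c in contributions.items() if abs(c) > k * mean_abs)

# Hypothetical observations: (item, previous price, current price)
observations = [("a", 10.0, 10.5), ("b", 10.0, 30.0), ("c", 10.0, 3.0), ("d", 10.0, 29.0)]
kept = [item for item, p_prev, p_curr in observations if is_plausible(p_prev, p_curr)]
print(kept)
```

Flagged items would then go to manual control rather than being dropped automatically, which mirrors the description above.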

Statistics Norway has chosen to exclude strongly seasonal items only available during a certain time of the year from the price index of food and non-alcoholic beverages. Strongly seasonal items, primarily Christmas and Easter items in addition to some fruit, fish and meat items, are therefore eliminated from the data. The number of items classified as seasonal items varies from month to month, with an increasing amount during November and December compared to the rest of the year. We classify about items as seasonal items each month outside the November-December period. During the Christmas months the amount is multiplied.

4. Biases in a monthly chained superlative price index

4.1 Chain drift

The availability of both price and quantity data places no constraint on the type of index formula to apply, and statistical agencies are faced with new challenges relating to how to aggregate this enormous amount of data in the best (unbiased) way and to find the most appropriate formula to estimate elementary price indices. Recent papers by Ivancic, Fox and Diewert (2009) and de Haan and van der Grient (2009) address some important challenges in the use of scanner data. Frequent chaining in combination with superlative-type price indices can cause chain drift in cases where both prices and quantities oscillate or bounce over time. Chain drift occurs in chained price indices when the index does not return to unity when the prices in the current period return to their levels in the base period (ILO, 2004). The shorter the chain period, the greater the bias.

Chain drift can occur if prices and quantities oscillate when items are put on sale. Norwegian scanner data confirm that consumers respond strongly to sale periods with a great shift in quantity. According to Ivancic, Fox and Diewert (2009), chained superlative price indices tend to show downward drift compared to their direct counterparts when items are put on sale.
When the price of an item returns to pre-sale levels, we expect the price index to show the same. Chain drift, however, may occur if it takes some time before the turnover is normalised. The consumers may stock up during sale periods, resulting in a lower turnover in post-sale periods compared to before the sales. When analysing Norwegian scanner data, the stock up effect seems to be less clear. The turnover does not seem to be systematically lower after a sale period compared to before the sales. This may be explained by the data collection period; we collect scanner data (price and quantity) from only the midweek of the month and not from all weeks of the month, i.e. we do not actually know what happens in the weeks between.

We see that items on sale may return to the regular price gradually and that items may be on sale for more than one month (or for more than one midweek). The length of the sale period is therefore of major importance. For instance, in period 0 item 1 is sold at normal price and the turnover is low. In period 1 the item is on sale and the turnover rises considerably. In period 2 the sale period continues, but now the turnover has dropped again (maybe due to the stock up effect). In period 3 both price and turnover are back at regular levels. This causes asymmetrical weighting in the price index: the price decrease in period 1 receives a high weight, while the price increase in period 3 gets a low weight, thus causing a downward chain drift.

An example from Norwegian scanner data can illustrate this. In Figures 1 and 2, the price and turnover of porridge in three different outlets in the period December 2005 to December 2006 are illustrated. All three outlets have the item on sale in May 2006 with a strong shift in turnover. In all outlets the turnover falls strongly the month after, including for Outlet 1, where the price drops even more strongly the month after, and for Outlet 3, which continues to sell the item at the same discount price.
Such a price and turnover pattern creates asymmetry where the price decrease weighs more than the following price increase, resulting in a downward chain drift.
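The four-period sale pattern described in section 4.1 can be reproduced with a small Python sketch. All prices and quantities below are invented for illustration and are not taken from the Norwegian scanner data; `item2` is a hypothetical second item with constant price and quantity, added so that expenditure shares are meaningful.

```python
import math

def tornqvist(a, b):
    """Matched-item Törnqvist index from period a to period b.

    Each period maps an item code to a (price, quantity) pair.
    """
    items = sorted(set(a) & set(b))
    value_a = sum(a[m][0] * a[m][1] for m in items)
    value_b = sum(b[m][0] * b[m][1] for m in items)
    log_index = 0.0
    for m in items:
        share_a = a[m][0] * a[m][1] / value_a
        share_b = b[m][0] * b[m][1] / value_b
        log_index += 0.5 * (share_a + share_b) * math.log(b[m][0] / a[m][0])
    return math.exp(log_index)

periods = [
    {"item1": (10.0, 10), "item2": (10.0, 100)},   # period 0: normal price, low turnover
    {"item1": (5.0, 200), "item2": (10.0, 100)},   # period 1: on sale, turnover rises
    {"item1": (5.0, 20), "item2": (10.0, 100)},    # period 2: still on sale, turnover drops
    {"item1": (10.0, 10), "item2": (10.0, 100)},   # period 3: back to period 0 levels
]

# Monthly chained index: product of the period-to-period links.
chained = 1.0
for prev, curr in zip(periods, periods[1:]):
    chained *= tornqvist(prev, curr)

# Direct index comparing period 3 with period 0.
direct = tornqvist(periods[0], periods[-1])
print(round(chained, 4), round(direct, 4))
```

Prices and quantities in period 3 equal those in period 0, so the direct index is 1, while the chained index ends at roughly 0.87: the price decrease in period 1 is weighted by a high average expenditure share and the price increase in period 3 by a low one.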

[Figure 1. Price. Porridge. December 2005-December 2006. Series: Outlet 1, Outlet 2, Outlet 3.]

[Figure 2. Turnover. Porridge. December 2005-December 2006. Series: Outlet 1, Outlet 2, Outlet 3.]

The example given in Figures 1 and 2 illustrates some of the complexity of scanner data, with many different price movements and with different items on sale at different times. Scanner data consists of an enormous amount of data, and this complexity makes it difficult to always understand the direction of the price indices. We also see from our scanner data that chain drift does not always go downward. Our experience is nevertheless that the downward bias dominates.

4.2 Seasonal items

Seasonal items are commodities which are either: (a) not available in the marketplace during certain seasons of the year, or (b) available throughout the year, but with regular fluctuations in prices or quantities that are synchronized with the season or the time of the year (ILO, 2004). Using monthly chaining and without imputation methods we fail to register the seasonal item prices in the price index until the month t 15. The price change from the last month they are in-season to the first month re-entering the price index is missed. When the seasonal items first become available, they normally come onto the market at a relatively high price compared to the relatively low price that often prevails when leaving the market, and with monthly chaining and without imputation this price increase is not captured. At the end of the in-season period, the seasonal item prices can fall considerably, sometimes combined with high expenditure shares, before falling out of the price index, resulting in downward biases.

15 This applies to all new items and temporarily missed items entering and re-entering the price index.

To avoid these unfortunate biases, Statistics Norway excludes items not available during certain times of the year from the price index of food and non-alcoholic beverages, i.e. we exclude the most important items with strong seasonal patterns (type a) from the price index. Seasonal items of type b) are included and treated in the same way as any other item. Until 2009, we mainly excluded Christmas and Easter items from the price index, leaving other seasonal items included. In 2009, due to the effects that seasonal items were causing, we extended the removal of seasonal items to the most important ones within fruit, meat and fish.

Statistics Norway has recalculated the price index of food and non-alcoholic beverages back to August 2005 with the removal of the most important seasonal items, and the results show that seasonal items have had a downward effect on the price development in the period since 2005. In certain COICOP-6 groups, this downward effect is clearly demonstrated. Excluding seasonal items that represent a high share of the consumption is obviously not an optimal solution. In 2010, Statistics Norway plans to include and treat the strongly seasonal items using a more satisfactory method.

4.3 Missing price observations

The price of an item may drop out of the scanner data either because the item is missing temporarily or because it has permanently disappeared from the market.
Temporarily missing observations may occur for seasonal items out of season, due to supply shortages, or because the quantity bought in the data collection period, i.e. the midweek of the month, falls to zero. Permanently missing observations may occur for items on their way out of production and because new items have been introduced. We normally impute missing prices in the Norwegian CPI, but for the price index of food and non-alcoholic beverages no data imputation is applied.

In Figures 3 and 4, temporarily missing prices for a certain coffee product in a specific retail outlet are illustrated. From February to March 2008, the price decreased, combined with a strong rise in turnover. The next month the price falls out of the price index, creating a downward effect in the coffee price index. In May 2008, the coffee product re-enters the data, only to fall out again the following month. The item's price change is therefore not registered in the price index until August.

[Figure 3. Price. Coffee. January 2005-November.]

[Figure 4. Turnover. Coffee. January 2005-November.]

Missing price observations can also create an upward bias in the price index, even though the downward bias seems to dominate. In some cases, we fail to register the price decrease, perhaps due to temporary supply shortages. Instead we may register a succeeding price increase. We find some examples of this pattern in certain COICOP-6 groups.

Lack of data imputation creates a downward bias in the Norwegian price index of food and non-alcoholic beverages. Recalculations of the price index with imputation of missing price observations can show us the effects they are causing. The alternative rolling year GEKS price indices presented in chapter 5 are another way of dealing with this problem.

5. Rolling year GEKS price indices

The combination of frequent chaining and superlative price indices can cause chain drift. Ivancic, Fox and Diewert (2009) have recently proposed a new method to overcome these challenges by constructing chained superlative-type price indices free of chain drift, applying a multilateral GEKS method. The method is a version of the multilateral GEKS method that is often used for price comparisons (purchasing power parities) across countries, but adjusted for comparisons over time by treating each time period as an entity. As these indices are direct indices and not chained, they are free of chain drift.

The GEKS approach makes use of all the possible matches in the data during a specific period of time and calculates direct matched-item superlative price indices using each period as a base period, i.e. only price relatives of items that are bought in the two periods that are compared enter the price index. Imputation to deal with missing price observations is therefore not necessary. The GEKS price index is the geometric average of the ratios of all the direct bilateral price indices.

As new data becomes available, the approach requires the figures to be recalculated; a solution that would be unacceptable in an official CPI context. In order to avoid this problem, Ivancic, Fox and Diewert (2009) have proposed a rolling year approach that uses data over a 13-month period to calculate GEKS indices - each month dropping the first period in the 13-month long window, while adding the current period. The advantage of a rolling year GEKS price index is that it allows strongly seasonal items only available during a certain time of the year to be compared. As the GEKS price indices make direct comparisons over a 13 month period, they compare seasonal prices on the way out-of-season with prices coming into season.
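The GEKS and rolling year calculations described above can be sketched in Python. This is a simplified illustration and not the SAS program provided by Statistics Netherlands: the function names, the data layout (a list of periods, each mapping an item code to a price and quantity pair) and the invented figures are all assumptions. The bilateral index is a matched-item Törnqvist index, and the rolling year step chains the latest month-to-month GEKS movement onto the previous index level, following the approach proposed by Ivancic, Fox and Diewert (2009).

```python
import math

def tornqvist(a, b):
    """Matched-item Törnqvist index from period a to period b."""
    items = sorted(set(a) & set(b))
    value_a = sum(a[m][0] * a[m][1] for m in items)
    value_b = sum(b[m][0] * b[m][1] for m in items)
    log_index = 0.0
    for m in items:
        share_a = a[m][0] * a[m][1] / value_a
        share_b = b[m][0] * b[m][1] / value_b
        log_index += 0.5 * (share_a + share_b) * math.log(b[m][0] / a[m][0])
    return math.exp(log_index)

def geks(window, base, current):
    """GEKS index between two periods of a window.

    Geometric mean, over every link period l in the window, of
    P(base, l) * P(l, current).
    """
    log_sum = sum(
        math.log(tornqvist(window[base], link) * tornqvist(link, window[current]))
        for link in window
    )
    return math.exp(log_sum / len(window))

def rolling_year_geks(periods, window=13):
    """Index series: GEKS on the first window, then chained rolling-window links."""
    first = periods[:window]
    index = [geks(first, 0, t) for t in range(len(first))]
    for t in range(window, len(periods)):
        win = periods[t - window + 1 : t + 1]         # drop oldest period, add newest
        link = geks(win, len(win) - 2, len(win) - 1)  # latest month-to-month movement
        index.append(index[-1] * link)
    return index

# Invented sale pattern: item1 goes on sale in periods 1 and 2 and returns to its
# period 0 price and quantity in period 3; item2 never changes.
periods = [
    {"item1": (10.0, 10), "item2": (10.0, 100)},
    {"item1": (5.0, 200), "item2": (10.0, 100)},
    {"item1": (5.0, 20), "item2": (10.0, 100)},
    {"item1": (10.0, 10), "item2": (10.0, 100)},
]
series = rolling_year_geks(periods, window=4)
print([round(x, 4) for x in series])
```

With a window that covers all four periods, the GEKS index returns to 1 in period 3, where a monthly chained Törnqvist index on the same invented data would not. A real implementation would also need to handle windows in which two periods share no items.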
For more detailed information on the GEKS price indices and their theoretical foundation, see Ivancic, Fox and Diewert (2009) and de Haan and van der Grient (2009). This newly developed method has not yet been applied by any statistical agency; the method is still at a very early stage and must be classified as an experimental price index. This alternative approach is now being tested on different countries' scanner data. De Haan and van der Grient (2009) have tested the rolling year GEKS price indices over a period of several years using Dutch scanner data, showing very promising results.

Statistics Norway has compared results from a monthly chained Törnqvist price index of food and non-alcoholic beverages with a rolling year GEKS price index, both at aggregated and detailed level. As far as we know, there have not been many studies on chain drift at aggregated level; most studies have focussed on elementary level and on different product groups, making it difficult to conclude and generalise on possible overall effects on the price index.

5.1 Monthly Chained Törnqvist Price Index vs. GEKS Price Index

Comparisons at aggregated level

In order to reveal possible chain drift in the Norwegian price index of food and non-alcoholic beverages we have recalculated our superlative price index and compared it to the alternative rolling year GEKS price index. Higher level aggregations are performed in the same way for both indices. The Norwegian superlative price index is recalculated by excluding implausible price changes and important seasonal items, but including critical price observations 16. The recalculated price index can therefore deviate from the official CPI figures. We have done so in order to eliminate the manual revision effects. Both the GEKS price index and the recalculated price index are edited in the same way, i.e. we have only controlled the month-to-month price ratios and not all the different matched-item combinations during the 13-month period for the GEKS price index. This may affect the GEKS movements. It is important to underline that the manual revision eliminating price observations with very strong contributions to the COICOP-6 results in the Norwegian official price index in many cases helps to reduce possible chain drift.

16 No manual data cleaning of the price observations contributing the most to elementary results.

As the GEKS method is free of seasonal item problems, the seasonal items are included in the GEKS price index. We have also tested the GEKS price index without the most important seasonal items, to ensure comparability with the official method. It is however difficult to totally isolate the different effects, and it is important to underline that the overall deviation between the monthly chained Törnqvist price index and the GEKS price index is a result of several effects - chain drift caused by price and quantity bouncing, as well as biases caused by missing price observations and unsatisfactory treatment of seasonal items included in the price index. Figure 5 shows the price development of different price indices in the period July 2006 to December.

[Figure 5. Price indices of food and non-alcoholic beverages. July 2006-December. July 2006=100. Series: GEKS, GEKS ex seasonal items, Törnqvist, Jevons.]

Assuming that the GEKS price index is a benchmark price index, Figure 5 indicates biases using a monthly chained Törnqvist price index at aggregated COICOP-2 level in the period July 2006 to December. There are no systematic deviations between the Törnqvist index and the GEKS index until July. From there on, the Törnqvist index lies systematically below. The deviation gradually increases. The amplified gaps in July 2008 are primarily caused by the price development of fruit and the price movements of seasonal items (the ones that are not excluded in the Törnqvist index). After July 2008, the gap increases even more. In February 2009, the increased gap seems primarily to be caused by an increasing number of items first on sale and then disappearing from the market. The difference between the two indices amounts to 3.9 percentage points over the entire period. On an annual basis, the GEKS approach therefore increases the price development of food and non-alcoholic beverages on average by 1.1 percentage points.

Figure 5 also includes a modified unweighted Jevons price index with a cut-off sample (see footnote 17). The aggregated deviation between the Törnqvist price index and the unweighted Jevons price index is less pronounced, with no systematic deviations until August 2008. As of August 2008, the Törnqvist index is below the Jevons index. There are very small deviations between the two GEKS price indices, with and without the most important seasonal items, demonstrating that the GEKS approach deals well with seasonal items.

Comparisons at a more detailed level

At more detailed COICOP levels the price indices develop differently depending on the consumption group. For groups like Milk, cheese and eggs, Oils and fats, Fish, and Food products n.e.c., the deviations between the different price indices are rather small. The COICOP groups covering vegetables and non-alcoholic beverages also show relatively minor differences. The price development for Milk, cheese and eggs is illustrated in Figure 6.

Figure 6. Price indices of milk, cheese and eggs. July 2006-December 2009. July 2006=100. Series shown: GEKS, Törnqvist, Jevons.

The price development of Bread and cereals is illustrated in Figure 7. The COICOP-5 group Cakes and the COICOP-6 group Pizza are the main groups causing the bias in COICOP-4 Bread and cereals. The price development of pizza is illustrated in Figure 8 below. All price indices of pizza show quite a strong price decrease in December 2007. In January 2008 the price observations that were heavily weighted in the December price decrease are missing, and the Törnqvist price index stays clearly below both the GEKS and the Jevons price index. The missing price observations do not affect the GEKS price index. They do affect the Jevons price index, but without explicit weighting their influence is much smaller. The GEKS price index shows the same volatile price movements as the Törnqvist price index, but without signs of bias.

Figure 7. Price indices of bread and cereals. July 2006-December 2009. July 2006=100. Series shown: GEKS, Törnqvist, Jevons.

Footnote 17: We have calculated a modified monthly chained Jevons price index at elementary level without explicit weights and with a cut-off sample. The cut-off method is based on the Dutch way of excluding unimportant items and including important items. Statistics Netherlands introduced a new method for the use of scanner data in the Dutch CPI as of January [year missing in source]. Due to the risk of bias at elementary level, they calculate a Jevons price index with a cut-off sample and with other refinements (including imputation of missing prices). Items with a low expenditure share are eliminated, while items with an expenditure share over a certain threshold are included. Based on this idea, about 12 per cent of the total expenditure was eliminated, while almost 50 per cent of the price observations were excluded in the Norwegian Jevons calculations. This index is aggregated to higher levels in the same way as the official Norwegian method, using Laspeyres price indices with annual chaining.
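As a rough illustration of the cut-off idea described in footnote 17, the Python sketch below computes an unweighted Jevons index over items whose expenditure share exceeds a threshold. The threshold value, the use of the average share over the two months, the restriction to matched items and the function name cutoff_jevons are all assumptions made for this example. The actual Dutch and Norwegian selection rules and the imputation of missing prices are not reproduced here.

```python
import math

def cutoff_jevons(prev, curr, min_share=0.05):
    """Unweighted Jevons index between two months over a cut-off sample.

    prev, curr: dict mapping item -> (price, quantity).
    min_share: hypothetical threshold on the average expenditure share.
    Returns the index and the sorted list of items kept in the sample.
    """
    matched = sorted(set(prev) & set(curr))
    exp_prev = {i: prev[i][0] * prev[i][1] for i in matched}
    exp_curr = {i: curr[i][0] * curr[i][1] for i in matched}
    tot_prev = sum(exp_prev.values())
    tot_curr = sum(exp_curr.values())
    # Keep items whose average expenditure share reaches the threshold.
    kept = [
        i for i in matched
        if 0.5 * (exp_prev[i] / tot_prev + exp_curr[i] / tot_curr) >= min_share
    ]
    if not kept:
        raise ValueError("no items left after applying the cut-off")
    # Jevons: unweighted geometric mean of the price relatives.
    log_relatives = [math.log(curr[i][0] / prev[i][0]) for i in kept]
    return math.exp(sum(log_relatives) / len(log_relatives)), kept

# Invented data: item C has a tiny expenditure share and a large price cut.
prev_month = {"A": (10.0, 100), "B": (20.0, 50), "C": (5.0, 2)}
curr_month = {"A": (11.0, 100), "B": (20.0, 50), "C": (2.5, 2)}

index, kept = cutoff_jevons(prev_month, curr_month, min_share=0.05)
index_all, kept_all = cutoff_jevons(prev_month, curr_month, min_share=0.0)

print(kept, round(index, 4))
print(kept_all, round(index_all, 4))
```

With these invented figures the cut-off removes item C, and the index is about 1.05. Without the cut-off, the halved price of the unimportant item pulls the unweighted index down to about 0.82, which is the kind of distortion the cut-off sample is meant to limit.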

Figure 8. Price indices of pizza. July 2006-December 2009. July 2006=100. Series shown: GEKS, Törnqvist, Jevons.

In Figure 9 the price development of fruit (COICOP-4) is illustrated.

Figure 9. Price indices of fruit. July 2006-December 2009. July 2006=100. Series shown: GEKS, GEKS ex seasonal items, Törnqvist, Jevons.

As Figure 9 illustrates, there are very small differences between the GEKS price index with and without the important seasonal items. Whether or not we include strongly seasonal items in the GEKS price index seems to be of minor importance, as the GEKS price index deals well with strongly seasonal items by comparing the items over a 13-month period. Figure 9 also shows that the Törnqvist price index seems to drift both upwards and downwards relative to the GEKS price index during the period. Even though Figure 9 shows both upward and downward bias, the downward biases in the Törnqvist index prevail. Only a few groups at detailed level show upward bias compared with their GEKS counterparts.

6. Concluding remarks

In this paper, we have described the method for collecting and calculating the Norwegian price index of food and non-alcoholic beverages based on full-scale scanner data, analysed some important properties of the data, and looked for biases in our price index method, which is based on a monthly chained Törnqvist price index. Statistics Norway has received scanner data for several years, and in August 2005 we changed the calculation method for the price index of food and non-alcoholic beverages to include explicit weights at elementary level. Detailed price and quantity information allowed us to calculate a superlative elementary aggregate reflecting the relative importance of the items, and monthly chaining was a natural choice given the high attrition rate of items. The Norwegian price index method has both strengths and weaknesses. The combination of monthly chaining and superlative price indices can cause chain drift.
The ILO manual (2004) notes that when the time period is short, seasonal fluctuations and periodic sales and advertising campaigns can cause prices and quantities to oscillate, and hence it is not appropriate to use chained indices under these circumstances. We also see that the lack of imputation of missing prices and the lack of satisfactory treatment of some seasonal items contribute to downward biases in the price index.

The newly proposed drift-free rolling year GEKS calculation method seems to tackle all the aforementioned issues, while taking account of changes in both prices and quantities. As the rolling year GEKS price indices make direct comparisons between all possible combinations of months during a 13-month period, they are free of chain drift and free of seasonal item problems.

In this paper we have compared our official calculation method, a monthly chained Törnqvist price index, with alternative GEKS calculations. If we consider the GEKS price index as an ideal or benchmark price index, there are strong indications of biases in the Norwegian scanner data price index of food and non-alcoholic beverages in the period July 2006 to December 2009, at both aggregated and detailed level.

The GEKS method provides some very interesting and promising results. It is, however, still an experimental price index, and no statistical agency has yet incorporated this method into its official CPI calculations. Despite strong empirical evidence, Statistics Norway will not change the calculation method and implement the GEKS price indices right away. We need more experience with this method ourselves, more international experience, and an acceptance of this method as international good practice. It is also important to consider the more practical aspects of this method, such as how to explain it to the users of the statistics, how to establish good data cleaning routines, and how to build an IT system that supports the more complicated calculation routines. The GEKS method is nevertheless very convincing, and in the not too distant future it may be established as international good practice.
In the short term, Statistics Norway can make other improvements to reduce the biases that are present in the price index, such as implementing imputation for missing prices and implementing a more satisfactory treatment of the seasonal items.
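The contrast between monthly chaining and the GEKS approach can be made concrete with a small sketch. The Python example below uses invented data with a sale in month 1 followed by depressed sales in month 2, and hypothetical function names. It makes two simplifying assumptions: all items are present in every month, and the GEKS index is computed over one fixed four-month window rather than the rolling 13-month window with splicing discussed in this paper.

```python
import math

def tornqvist(p0, p1):
    """Bilateral Törnqvist index from period p0 to period p1.

    Each period is a dict mapping item -> (price, quantity).
    Assumes the same items are present in both periods.
    """
    exp0 = {i: p * q for i, (p, q) in p0.items()}
    exp1 = {i: p * q for i, (p, q) in p1.items()}
    tot0, tot1 = sum(exp0.values()), sum(exp1.values())
    log_index = sum(
        0.5 * (exp0[i] / tot0 + exp1[i] / tot1) * math.log(p1[i][0] / p0[i][0])
        for i in p0
    )
    return math.exp(log_index)

def chained_index(periods):
    """Product of the month-on-month Törnqvist links."""
    index = 1.0
    for prev, curr in zip(periods, periods[1:]):
        index *= tornqvist(prev, curr)
    return index

def geks_index(periods, t):
    """GEKS index from period 0 to period t over a fixed window.

    Geometric mean, over every link period l in the window,
    of the indirect comparison P(0, l) * P(l, t).
    """
    n = len(periods)
    log_sum = sum(
        math.log(tornqvist(periods[0], periods[l]) * tornqvist(periods[l], periods[t]))
        for l in range(n)
    )
    return math.exp(log_sum / n)

# Invented data: item B is on sale in month 1, consumers stock up,
# buy little in month 2, and return to normal in month 3.
periods = [
    {"A": (10.0, 10), "B": (10.0, 10)},
    {"A": (10.0, 10), "B": (5.0, 50)},
    {"A": (10.0, 10), "B": (10.0, 2)},
    {"A": (10.0, 10), "B": (10.0, 10)},
]

chained = chained_index(periods)
geks = geks_index(periods, t=3)

print(f"chained Törnqvist: {chained:.4f}, GEKS: {geks:.4f}")
```

With these invented figures the chained index ends at about 0.89 even though all prices and quantities are back at their starting values, while the GEKS index returns to 1. The drift arises because the sale price enters the chain with a high weight on the way down and a lower weight on the way back up.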

References

Haan, J. de and Grient, H. van der (2009): Eliminating Chain Drift in Price Indexes Based on Scanner Data, article presented at the Ottawa meeting, Neuchâtel, Switzerland, May.

Haan, J. de and Opperdoes, D. (1997): Estimation of the Coffee Price Index Using Scanner Data: The Choice of the Micro Index, article presented at an international working group for price indices, Voorburg, April.

Henriksen, K. (2006): Utvalgsplan til konsumprisindeksens nye matvareindeks Basert på strekkodedata. Only in Norwegian. Notater, Statistics Norway.

ILO, International Labour Office (2004): Consumer Price Index Manual: Theory and Practice, expanded version of Consumer Price Indices: An ILO manual (1989).

Ivancic, L., Fox, K.J. and Diewert, W.E. (2009): Scanner Data, Time Aggregation and the Construction of Price Indexes, article presented at the Ottawa meeting, Neuchâtel, Switzerland, May.

Johansen, I. and Nygaard, R. (2009): Chain drift in the Norwegian CPI?, not yet published, Statistics Norway.

Rodriguez, J. and Haraldsen, F.: The use of scanner data in the Norwegian CPI: The new index for food and non-alcoholic beverages, Economic Survey 4/2006, Statistics Norway.

Rodriguez, J. and Haraldsen, F.: The use of scanner data in construction elementary aggregates for food and beverages ideas and experiences from Statistics Norway, unpublished report,