Know Your Data (Chapter 2)

1 Let's Get Started!

2 Know Your Data (Chapter 2)
Now we each have a time series whose future values we are interested in forecasting. The next step is to become thoroughly familiar with the construction of the series. This will help you prepare, interpret, and present your forecasts. Among the questions you need to consider:
- What is it designed to measure?
- What does it actually measure?
- How is it computed and by whom?
- What is the data frequency?
- Is it seasonally adjusted? How?
- What are some of its limitations?
- Where can more information be obtained about the series?
- What do the data look like? (Graphs; Summary Statistics)

3 Example: HEPI
The Higher Education Price Index (HEPI) is an annual price index (fiscal years, ) designed to measure changes in the cost of operating colleges and universities. It was originally developed by Dr. Kent Halstead, who apparently continues to maintain it. The HEPI is constructed by:
1) specifying a fixed, typical market basket of goods and services purchased by colleges and universities each year to support teaching and research activities, then
2) adjusting the index from period to period according to the change in the cost of purchasing this basket.
So, for example, if the cost of buying this basket increased by 3% during 2005, then the 2005 HEPI would be computed as 1.03*HEPI(2004). The current base period = 1983.
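The updating rule in the example, new index = (1 + change in basket cost) * old index, can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names and the index level of 200 for 2004 are hypothetical, not actual HEPI values.

```python
def update_index(previous_index, basket_cost_change):
    """Return the new index level, given last period's level and the
    fractional change in the cost of the fixed basket (0.03 means 3%)."""
    return previous_index * (1.0 + basket_cost_change)


def chain_index(base_level, cost_changes):
    """Build a sequence of index levels by applying each period's
    basket cost change to the previous period's level."""
    levels = [base_level]
    for change in cost_changes:
        levels.append(update_index(levels[-1], change))
    return levels


# Hypothetical 2004 level (an assumption for illustration, not real data).
hepi_2004 = 200.0
hepi_2005 = update_index(hepi_2004, 0.03)  # 1.03 * HEPI(2004)
print(round(hepi_2005, 2))

# Several periods in a row, starting from a base-period level of 100.
print([round(v, 2) for v in chain_index(100.0, [0.03, 0.05])])
```

Because the basket is held fixed, each period's index depends only on the previous level and the change in the basket's cost.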

4 The basket includes over 100 items that cover various categories of expenditure:
- salaries/benefits of faculty, administrators, and other professional personnel
- wages/salaries/benefits of clerical and other non-professional staff
- contracted services (data processing, communication, transportation, ...)
- supplies, materials, equipment
- library acquisitions
- utilities
The primary data used to construct the index are obtained from a variety of sources, including salary surveys conducted by various national higher education associations, the National Center for Education Statistics, the American Association of University Professors, and the BLS.

5 The HEPI is published as part of an annual report by Research Associates of Washington, D.C. The full 2004 report, which provides lots of additional information about the HEPI, can be found on the internet at:
Limitations of the HEPI?
The limitations of the HEPI as a measure of the cost of providing higher education are similar to the limitations of the CPI as a measure of the cost of living:
- selection of the market basket and its relevance to particular institutions at particular times
- quality of measurement of basket components

6 What do the data look like?
We've collected our data. We understand who produces the data and for what purpose(s), the frequency of the data, and what the data are (and are not) measuring. The next step in becoming more familiar with the data is to graph the series. Graphing your data is an extremely important preliminary step in the forecasting project, for a number of reasons related to the selection of the forecast method and/or the forecast presentation:
- it gives you a feel for the overall behavior of the series
- you may see unusual features (e.g., outliers) of the series

7 Your textbook provides a very nice example to illustrate the value of graphing your data before jumping into the process of running regressions: Anscombe's quartet.
Anscombe's quartet refers to four data sets that were constructed by a statistician named Anscombe to make a point about the value of graphical analysis as a complement to regression analysis. Anscombe's quartet is made up of four sets of 11 observations of (x,y). The four regressions of y on x look virtually identical: the same estimated intercepts and slopes, the same standard errors and t-statistics, the same R^2 values, the same sums of squared residuals. However, the four scatterplots look quite different from one another. That is, the actual relationships between y and x look quite different across the data sets and suggest that the conditional expectation of y given x is likely to be well estimated by the sample linear regression in only one of the four cases.
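To see the point numerically, the sketch below fits a least-squares line to each of the four data sets and prints the intercept, slope, and R^2. The data are Anscombe's published values; the helper name ols_fit and the hand-rolled OLS formulas are choices made for this illustration, not the textbook's code.

```python
def ols_fit(x, y):
    """Simple least-squares regression of y on x.
    Returns (intercept, slope, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by sets I, II, III
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {name: ols_fit(x, y) for name, (x, y) in quartet.items()}
for name, (intercept, slope, r2) in results.items():
    print(f"Set {name}: intercept={intercept:.2f} slope={slope:.3f} R^2={r2:.2f}")
```

The fitted lines agree to about two decimal places, so only a scatterplot of each set reveals how different the four relationships are.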

8 So, let's graph our data. Since we're working at this point with a single time series, the natural graph is a time series plot, in which we measure time periods along the horizontal axis and the value of the series along the vertical axis.
Constructing a time series plot of your data from EViews
1. Open your workfile.
2. Right click on your series and select open, which should produce a window with your data in table form.
3. From that window, select view, then line graph.

9 Insert the hepi time series plot

10 A brief digression on logarithmic transformations
When working with macroeconomic and financial time series data, it is very common to transform the original time series (say, y_t) by taking its natural logarithm (i.e., log(y_t)). The reason is that experience suggests that the assumption that the conditional expectation of log(y) is linear seems to work better than the assumption that the conditional expectation of y itself is linear. A related reason, especially relevant to trending data, is that for many of these series the magnitudes of the changes in the level of y (i.e., y_t - y_{t-1}) increase over time, while the percentage changes or growth rates in y (i.e., log(y_t) - log(y_{t-1})) are relatively stable over time.

11 So, in other words, the linear regression methods that provide the basis for many of the forecasting methods we use often seem to work better if we use the log form of the series.
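A small numerical sketch may help make the argument concrete. It builds an artificial series that grows at a constant 5% per period (the starting level of 100 and the growth rate are assumptions chosen for illustration) and compares the changes in the level with the changes in the log.

```python
import math

# Artificial series with constant 5% growth per period (illustrative only).
growth_rate = 0.05
levels = [100.0 * (1.0 + growth_rate) ** t for t in range(6)]

# Changes in the level: y_t - y_{t-1}
level_changes = [levels[t] - levels[t - 1] for t in range(1, len(levels))]

# Changes in the log: log(y_t) - log(y_{t-1})
log_changes = [math.log(levels[t]) - math.log(levels[t - 1])
               for t in range(1, len(levels))]

print("level changes:", [round(c, 3) for c in level_changes])
print("log changes:  ", [round(c, 4) for c in log_changes])
```

The level changes get larger every period, while the log changes are all equal to log(1.05), which is why a straight line tends to describe the log series better than the level series.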

12 Transforming your time series into its log form using EViews
Assume that the name of your series is myseries and you want to create and save a new series called logseries whose observations are equal to the natural logarithms of myseries.
1. Open your workfile.
2. Click on the Genr button, which will open the Generate Series by Equation window.
3. In the Enter equation box, enter: logseries = log(myseries). Then click OK.
4. The new series, logseries, should appear in your workfile (along with the original series, myseries). Save the workfile.

13 Insert the log(hepi) plot

14 Another view of the behavior of the HEPI is found by looking at its annual percentage change, i.e., the HEPI inflation rate. The percentage change from period t-1 to period t can be calculated (approximately, and expressed as a decimal fraction) according to:
% change in y between periods t-1 and t ≈ log(y_t) - log(y_{t-1}) = inf_t
The approximation is close when the change is small.
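How close the log difference is to the exact percentage change can be checked directly. The sketch below compares the two measures for a small change and a large change; the function names and the numbers are hypothetical, chosen only to illustrate the approximation.

```python
import math


def exact_change(previous, current):
    """Exact proportional change: (y_t - y_{t-1}) / y_{t-1}."""
    return (current - previous) / previous


def log_change(previous, current):
    """Log-difference approximation: log(y_t) - log(y_{t-1})."""
    return math.log(current) - math.log(previous)


# Hypothetical index values: a 3% rise and a 20% rise from 100.
for current in (103.0, 120.0):
    exact = exact_change(100.0, current)
    approx = log_change(100.0, current)
    print(f"exact={exact:.4f}  log difference={approx:.4f}  "
          f"gap={exact - approx:.4f}")
```

For changes of a few percent, which is typical of annual inflation rates, the two measures differ only in the third or fourth decimal place; the gap widens as the change gets larger.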

15 To create inf_t in EViews
Suppose that we have hepi and loghepi in our workfile, where loghepi was generated from hepi according to loghepi = log(hepi). There are two ways to create the series inf_t = log(hepi_t) - log(hepi_{t-1}). From the workfile, click the Genr button to open the Generate Series by Equation window. Then, in the Enter equation box, enter either
inf = dlog(hepi)
or
inf = d(loghepi)
Note that d(x) creates a new variable by taking the first differences of x, i.e., x_t - x_{t-1}, while dlog(x) creates a new variable by taking the first differences of log(x), i.e., log(x_t) - log(x_{t-1}). Put another way, d(x) creates a time series of the changes in x, while dlog(x) creates a time series of the (approximate) percentage changes in x.
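For readers who want to check the two routes outside EViews, here is a minimal Python sketch of what d(x) and dlog(x) compute. The functions below are stand-ins written for this illustration, not EViews code, and the hepi values are made up. The first observation has no predecessor, so it is marked as missing (None) here.

```python
import math


def d(x):
    """First differences x_t - x_{t-1}; the first entry is missing (None)."""
    return [None] + [x[t] - x[t - 1] for t in range(1, len(x))]


def dlog(x):
    """First differences of log(x): log(x_t) - log(x_{t-1})."""
    return d([math.log(v) for v in x])


# Made-up index values, for illustration only.
hepi = [100.0, 103.0, 107.0, 112.0]
loghepi = [math.log(v) for v in hepi]

inf_a = dlog(hepi)   # corresponds to inf = dlog(hepi)
inf_b = d(loghepi)   # corresponds to inf = d(loghepi)
print(inf_a)
print(inf_a == inf_b)
```

Both routes give the same series because dlog(x) is simply d applied to log(x).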

16 Insert the hepi inflation plot

17 Insert the hepi vs cpi plots

18 The Sample Distribution and Summary Statistics
Another useful way to summarize the data is to look at the sample distribution and the summary statistics that relate to that distribution. The sample distribution is the number or proportion of times that the series takes on certain values or lies within certain intervals. A common graphical representation of this distribution is the histogram. The sample distribution can also be summarized by sample statistics, including the sample mean, median, variance, and standard deviation.

19 These are easily computed in EViews as follows:
1. From the workfile, right click on the series name, then click open, which will open a Series window.
2. In the Series window, click on View, then Descriptive Statistics, then Histogram and Stats.
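The same summary measures can be sketched in Python using only the standard library. The data values, the bin edges, and the helper names below are hypothetical, chosen for illustration; the counts play the role of the histogram bars.

```python
import statistics


def summarize(values):
    """Sample mean, median, variance, and standard deviation
    (variance and standard deviation use the n-1 divisor)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),
        "std_dev": statistics.stdev(values),
    }


def histogram_counts(values, bin_edges):
    """Count observations in each interval [edge_i, edge_{i+1});
    the last interval also includes its right endpoint."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            low, high = bin_edges[i], bin_edges[i + 1]
            is_last = i == len(counts) - 1
            if low <= v < high or (is_last and v == high):
                counts[i] += 1
                break
    return counts


# Made-up annual inflation rates in percent (illustrative only).
inflation_pct = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = summarize(inflation_pct)
counts = histogram_counts(inflation_pct, [0.0, 3.0, 6.0, 9.0])

for name, value in stats.items():
    print(f"{name}: {value:.3f}")
print("histogram counts:", counts)
```

Dividing each count by the number of observations gives the proportions rather than the counts.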