Background for Case Study: Clifton Park Residential Real Estate

Techniques for Engaging Business Students in the Statistics Classroom
Jane E. Oppenlander
Example Assignments and Class Exercises

Background for Case Study: Clifton Park Residential Real Estate

Data on the selling price, features, and taxes on 25 single family homes for sale in Clifton Park, NY have been collected from the website [...]. The file CliftonPark_RealEstate.jmp contains the data. The 25 homes were randomly selected from approximately 300 for sale. The available variables are defined as follows:

Variable Name     Definition
Selling Price     The asking price for the home in thousands of dollars.
Square Footage    The square footage of the home.
Bedrooms          Number of bedrooms in the home.
Bathrooms         Number of bathrooms in the home. A half bath is defined as a bathroom that has a sink and a toilet (it does not have a bath tub or a shower).
Lot Size          The size of the lot in acres.
Age               Age of the house in years.
Fireplaces        The number of fireplaces in the home.
A/C               Indicates whether the house is equipped with central air conditioning.
Pool              Indicates whether there is a pool, either in-ground or above ground, on the property.
Taxes             The total annual property taxes.
Assessed Value    The total assessment (buildings and land) as determined in 2010 in thousands of dollars.
Garage Size       The number of stalls in the garage.

We will use this data throughout the term. You can find information on property taxes and assessment at the New York State Department of Taxation and Finance website: [...]. A four-minute video entitled About Property Taxes and Assessment can be found at: [...]

Assignment: Data Description Memo

The objective of this assignment is to become familiar with Clifton Park real estate by describing the available data. The expected product of your analysis is a paper, not to exceed two pages, in the form of a memo including the following:

- A statistical summary of the Clifton Park real estate data. (Hint: Experiment with the JMP equivalent of Excel pivot tables, which can be found under Tables > Tabulate. This will allow you to create tables of statistics for both continuous and nominal variables.)
- A discussion of any patterns or unusual observations present in the data.

Multiple Regression Case Study: Clifton Park Real Estate

Develop a multiple regression model that will predict the selling price of Clifton Park single family homes using the available data (CliftonPark_RealEstate.jmp). Determine those features of the house that significantly predict selling price. For this model, do not use taxes or assessed value as independent variables.

- Which house features are most important in determining selling price?
- Discuss the quality of the model and the justification for excluding any outliers, if you choose to do so.
- Find a home that was recently listed in Clifton Park via a newspaper or website that is similar to one of those given in the data set. What does your model predict as the selling price for this home? How does it compare to the actual selling price? Find and interpret the associated prediction interval.
- Discuss the scope of applicability of the model. Do you think your model is sufficiently precise to be useful for realtors or home buyers?

Summarize your analysis and model in the form of a technical report not to exceed three pages.
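For reference when interpreting the prediction interval, the standard textbook form is sketched below. The notation is an assumption introduced here and does not appear in the handout: p is the number of predictors, n the number of homes, X the design matrix (including a column of ones), x_0 the feature vector of the new home (with a leading 1), and s the residual standard error. The interval is only as trustworthy as the usual regression assumptions (linearity, independent errors with constant variance, approximate normality).

```latex
\hat{y}_0 = b_0 + b_1 x_{01} + \dots + b_p x_{0p}

s^2 = \frac{\mathrm{SSE}}{n - p - 1}

\hat{y}_0 \;\pm\; t_{\alpha/2,\; n-p-1}\; s\,
  \sqrt{\,1 + \mathbf{x}_0^{\top}\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{x}_0\,}
```

The leading 1 under the square root is what separates a prediction interval for a single home from the narrower confidence interval for the mean price of all homes with the same features.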

Class Exercise: Introduction to Model Building: Sales and Advertising Time

A company must decide how many minutes of television advertising to purchase each day. They have 15- and 30-second commercials available to air.

1) Sketch a possible relationship between the number of seconds of daily advertising and sales (in number of units).

   [Blank plot: Sales (1000 units) vs. Ad Time (sec)]

   Give a brief description explaining the relationship.

2) Sketch another possible relationship between the number of seconds of daily advertising and sales.

   [Blank plot: Sales (1000 units) vs. Ad Time (sec)]

   Give a brief description explaining this relationship.

3) The plot below shows data from the past five TV advertising campaigns.

   [Plot: Sales (1000 units) vs. Ad Time (sec)]

   Sketch a model that you feel adequately represents the data.
   a) Use your model to quantify the relationship between advertising time and sales.
   b) Use your model to predict the number of units sold for 180 seconds.
   c) What would you expect sales to be for 300 seconds of advertising time? Do you have any reservations about this prediction?
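One way to check a hand-sketched model is to fit a least-squares line and compare. The sketch below is illustrative only: the five (ad time, sales) pairs are hypothetical values invented for this example, since the plotted data are not reproduced here, and the helper names fit_line and predict are likewise assumptions. It flags predictions that fall outside the observed ad times, which is the reservation part c) is asking about.

```python
# HYPOTHETICAL data: (ad time in seconds, sales in 1000 units).
# These values are NOT taken from the handout's plot.
ad_time = [60, 90, 120, 150, 210]
sales = [22.0, 27.5, 35.0, 39.0, 52.5]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new, x_observed):
    """Return the fitted value and whether x_new lies outside the observed range."""
    extrapolating = not (min(x_observed) <= x_new <= max(x_observed))
    return intercept + slope * x_new, extrapolating


intercept, slope = fit_line(ad_time, sales)
print(f"Each extra second of ad time adds about {slope:.3f} thousand units")
for seconds in (180, 300):
    y_hat, extrapolating = predict(intercept, slope, seconds, ad_time)
    note = " (extrapolation beyond the observed ad times)" if extrapolating else ""
    print(f"{seconds} s -> {y_hat:.1f} thousand units{note}")
```

A straight line is only one candidate model; a curve that levels off at high ad times would give a very different answer at 300 seconds while agreeing closely at 180.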

Class Exercises: Business Applications of Statistical Models

Class Exercise: Simulation

The managing partner of a firm that produces customized accounting software for small businesses is preparing plans for the next fiscal year. A software development project is composed of four phases: requirements gathering, design, coding, and acceptance testing. Use simulation to create the distribution of total project time.

1) The managing partner estimates the following range of completion times for each of the phases of software development.

   Phase                     Completion Time
   Requirements gathering    5-20 days
   Design                    [...] days
   Coding                    [...] days
   Acceptance testing        5-15 days

   a) Create a JMP data sheet with 5 columns, one for each phase and the fifth column for total project time. Create a formula for total project time that is the sum of the times for each phase. Add 5000 rows to the data sheet.
      i) Assume a uniform distribution for the completion times for each of the phases. Each row will be one simulated project where the time for each phase is randomly selected from the appropriate uniform distribution. For each column create a formula and select Random Uniform from the Random Functions group. This function will randomly generate numbers from a uniform distribution on the interval [0,1]. To obtain a uniform distribution on the interval [a, b], multiply the Random Uniform function by (b-a) and add a. Do this for each phase of the software project.
      ii) The result will be 5000 simulated phase and project times. For each phase use the Distribution platform to analyze the simulated times. Does the histogram look reasonably uniform?
      iii) Use the Distribution platform to analyze the simulated total project time. Does the histogram look uniform?
      iv) Use the simulated distribution of total project times to estimate the probability that a project will last more than 100 days.
      v) Find the 90th percentile of the simulated distribution of total project times. 10% of the projects will be expected to exceed that project time.
   b) Now create 5 additional columns and assume that the phases follow a Normal distribution. Use (a+b)/2 for the mean and (b-a)/6 for the standard deviation. To obtain a standard normal distribution select Random Normal from the Random Functions group. Multiply this function by the standard deviation and add the mean.
      i) Do the histograms for the phases look reasonably bell-shaped? Does the histogram of the total project time look reasonably bell-shaped?
      ii) Find the probability that the total project time exceeds 100 days. Find the 90th percentile of this distribution. How do these estimates compare with those obtained assuming the phases are uniformly distributed?

Class Exercise: Elasticity of Demand

A supermarket is considering lowering the price of a dozen store-baked gourmet cookies for the months of September and October. The file cookie_demand.txt contains a randomly selected sample of price and quantity demanded over the last two years.

1) Estimate the price elasticity of the cookies using a simple regression of log(quantity) on log(price).
2) Find the 95% confidence interval on elasticity (the slope coefficient) and use it to determine whether the demand for the gourmet cookies is elastic, unitary, or inelastic.
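The simulation exercise can also be sketched outside JMP. The Python below is a minimal illustration, not the handout's method: it mirrors the two recipes in the exercise (a + (b-a) times a uniform [0,1] draw, and mean plus standard deviation times a standard normal draw). The requirements gathering and acceptance testing ranges come from the exercise; the design and coding ranges are hypothetical placeholders, because those values are not available in this copy. All function names are assumptions made for the example.

```python
import random
import statistics

# (a, b) ranges in days. The "design" and "coding" ranges are HYPOTHETICAL
# placeholders; substitute the values from the assignment.
PHASES = {
    "requirements": (5, 20),
    "design": (20, 40),   # assumed, not from the handout
    "coding": (30, 60),   # assumed, not from the handout
    "acceptance": (5, 15),
}
N_PROJECTS = 5000


def uniform_time(rng, a, b):
    """Rescale a uniform [0, 1) draw to the interval [a, b)."""
    return a + (b - a) * rng.random()


def normal_time(rng, a, b):
    """Standard normal draw scaled to mean (a+b)/2 and sd (b-a)/6."""
    return (a + b) / 2 + (b - a) / 6 * rng.gauss(0, 1)


def simulate_totals(draw, rng, n=N_PROJECTS):
    """Each simulated project is the sum of one random draw per phase."""
    return [sum(draw(rng, a, b) for a, b in PHASES.values()) for _ in range(n)]


def summarize(totals, threshold=100):
    """Estimate P(total > threshold) and the 90th percentile."""
    p_exceed = sum(t > threshold for t in totals) / len(totals)
    q90 = statistics.quantiles(totals, n=10)[-1]  # last decile cut point
    return p_exceed, q90


rng = random.Random(2024)  # fixed seed so the run is repeatable
for label, draw in (("uniform", uniform_time), ("normal", normal_time)):
    p_exceed, q90 = summarize(simulate_totals(draw, rng))
    print(f"{label}: P(total > 100) ~ {p_exceed:.3f}, 90th percentile ~ {q90:.1f} days")
```

Note that the normal version can produce phase times outside [a, b], since a standard deviation of (b-a)/6 places a and b three standard deviations from the mean rather than treating them as hard limits.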
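For the elasticity exercise, the log-log slope and its confidence interval can be sketched as follows. This is an illustration under stated assumptions: the prices and quantities are hypothetical (the contents of cookie_demand.txt are not shown here), the function names are invented for the example, and the t critical value is supplied by hand from a t table rather than computed. The classification compares the whole interval with unit elasticity (a slope of -1 for a downward-sloping demand curve).

```python
import math


def loglog_slope(prices, quantities):
    """Fit ln(quantity) = b0 + b1 * ln(price); return b1 and its standard error."""
    x = [math.log(p) for p in prices]
    y = [math.log(q) for q in quantities]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    return b1, se_b1


def classify(lower, upper):
    """Compare a confidence interval for elasticity with magnitude 1."""
    if upper < -1 or lower > 1:
        return "elastic"
    if lower > -1 and upper < 1:
        return "inelastic"
    return "cannot rule out unitary elasticity"


# HYPOTHETICAL sample, not the contents of cookie_demand.txt.
prices = [3.49, 3.99, 4.49, 4.99, 5.49, 5.99, 6.49, 6.99]
quantities = [310, 262, 221, 205, 170, 158, 139, 128]
T_CRIT = 2.447  # 0.975 quantile of t with n - 2 = 6 df, read from a t table

b1, se_b1 = loglog_slope(prices, quantities)
lower, upper = b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1
print(f"elasticity ~ {b1:.2f}, 95% CI ({lower:.2f}, {upper:.2f}): {classify(lower, upper)}")
```

With a different sample size the critical value changes, so T_CRIT must be looked up again for n - 2 degrees of freedom.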

Class Exercise: Portfolio Mix

In this class exercise, we will explore how variance and covariance can be used to determine the optimal mix of a portfolio containing two stocks. When probability distributions are not available for computing expected returns, variances, and covariances, they can be estimated from data. Consider the daily adjusted closing returns from the Coca-Cola Company (symbol KO) and Reynolds American (symbol RAI) for three months. The dates, adjusted closing prices, and normalized closing prices are contained in the file KO_RAI_PortfolioMix.xls.

1) How do the risks, as measured by the variance (or standard deviation), of the two stocks compare?

2) Use the sample data and JMP to estimate the quantities in the table below.

                             Coca-Cola    Reynolds American
   E(Y)                      ____         ____
   Var(Y)                    ____         ____
   Standard Deviation (Y)    ____         ____
   Cov(Y1, Y2)               ____
   Corr(Y1, Y2)              ____

3) Find the total variance and standard deviation for the different portfolio mixes, where a is the proportion of Coca-Cola. This is most easily done in Excel.

   a       Total Variance    Standard Deviation
   ____    ____              ____

4) Find the optimal weighting for this portfolio. Compare this to the table you created in step 3 above. Discuss this weighting in light of the risks of the individual stocks.
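The table in step 3 and the weighting in step 4 can be sketched in a few lines of code. The sketch below assumes that "optimal" means minimum total variance, which the exercise implies but does not state outright. The variances and covariance are hypothetical inputs, not estimates from KO_RAI_PortfolioMix.xls, and the function names are invented for the example.

```python
def portfolio_variance(a, var1, var2, cov12):
    """Variance of a*Y1 + (1 - a)*Y2."""
    return a**2 * var1 + (1 - a)**2 * var2 + 2 * a * (1 - a) * cov12


def min_variance_weight(var1, var2, cov12):
    """Proportion a in stock 1 that minimizes the portfolio variance.

    The result can fall outside [0, 1] when the covariance exceeds the
    smaller variance; that case would need separate handling.
    """
    return (var2 - cov12) / (var1 + var2 - 2 * cov12)


# HYPOTHETICAL inputs; replace with the estimates from step 2.
var_ko, var_rai, cov_ko_rai = 0.8, 1.4, 0.3

print("a     Total Variance   Standard Deviation")
for i in range(11):
    a = i / 10
    v = portfolio_variance(a, var_ko, var_rai, cov_ko_rai)
    print(f"{a:.1f}   {v:.4f}           {v ** 0.5:.4f}")

a_star = min_variance_weight(var_ko, var_rai, cov_ko_rai)
v_star = portfolio_variance(a_star, var_ko, var_rai, cov_ko_rai)
print(f"minimum-variance weight on Coca-Cola: {a_star:.4f} (variance {v_star:.4f})")
```

Setting the covariance argument to 0 in both functions gives the calculation under an assumption of independence.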

5) Suppose that you incorrectly assumed that there was no correlation between the two stocks (i.e., they are independent). Complete the tables below using the assumption of independence. Compute the optimal weighting under the assumption of independence.

                             Coca-Cola    Reynolds American
   E(Y)                      ____         ____
   Var(Y)                    ____         ____
   Standard Deviation (Y)    ____         ____
   Cov(Y1, Y2)               0
   Corr(Y1, Y2)              0

   a       Total Variance    Standard Deviation
   ____    ____              ____

6) Compare the two sets of calculations (with and without independence). What are your conclusions?
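As a check on the hand calculations, the minimum-variance weighting can be derived in closed form. The derivation below assumes that "optimal" means the value of a that minimizes total variance; the symbol a* is introduced here for that minimizer and is not notation from the handout.

```latex
\operatorname{Var}\bigl(aY_1 + (1-a)Y_2\bigr)
  = a^2 \operatorname{Var}(Y_1) + (1-a)^2 \operatorname{Var}(Y_2)
  + 2a(1-a)\operatorname{Cov}(Y_1, Y_2)

\frac{d}{da}:\quad
  2a\operatorname{Var}(Y_1) - 2(1-a)\operatorname{Var}(Y_2)
  + 2(1-2a)\operatorname{Cov}(Y_1, Y_2) = 0

a^{*} = \frac{\operatorname{Var}(Y_2) - \operatorname{Cov}(Y_1, Y_2)}
             {\operatorname{Var}(Y_1) + \operatorname{Var}(Y_2) - 2\operatorname{Cov}(Y_1, Y_2)}

\text{Under independence } \bigl(\operatorname{Cov}(Y_1, Y_2) = 0\bigr):\quad
a^{*} = \frac{\operatorname{Var}(Y_2)}{\operatorname{Var}(Y_1) + \operatorname{Var}(Y_2)}
```

The gap between the two values of a* shows how much the independence assumption shifts the recommended mix.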