LESSON NO. 5. Sales-Based Modeling


Assigned Reading

1. UBC Real Estate Division. BUSI 444 Course Workbook. Vancouver, BC: UBC Real Estate Division.
2. UBC Real Estate Division. Advanced Computer Assisted Mass Appraisal. Vancouver, BC: UBC Real Estate Division.
   Chapter 8: Specifying Sales Comparison Models
   Chapter 9: Important Tools for Model Calibration
3. City of Calgary Assessment Department. "Regression Modeling in Calgary - A Practical Approach". Assessment Journal. Vol. 5, Num. 4, August. International Association of Assessing Officers.

Recommended Reading

1. UBC Real Estate Division. BUSI 344 Course Workbook. Vancouver, BC: UBC Real Estate Division.
   Lesson No. 8: Comprehensive Model Building - Data Screening and Testing.
2. UBC Real Estate Division. Advanced Computer Assisted Mass Appraisal. Vancouver, BC: UBC Real Estate Division.
   Chapter 11: Sales Analysis and Mass Appraisal Performance Evaluation
   Chapter 12: Statistical Procedures and Performance Evaluation
3. Gloudemans, R. J. "Comparison of Three Residential Regression Models: Additive, Multiplicative and Nonlinear". Assessment Journal. Vol. 9, Num. 4, July/August. A good discussion of regression analysis principles and various methods of including location variation in a model.
4. Todora, J. and Whiterell, D. "Automating the Sales Comparison Approach". Assessment Journal. Vol. 9, Num. 1, January/February.
5. Wayne Moore. "Performance Comparison of Automated Valuation Models". Journal of Property Tax Assessment and Administration. Volume 3, Issue 1.

Learning Objectives

After completing this lesson, the student should be able to:

1. complete in-depth data screening and exploratory data analysis, building on the preliminary screening techniques demonstrated in Lesson 2, including transforming variables as necessary;
2. apply the multi-step process for regression modeling to specify, calibrate, and test a sales-based valuation model;
3. test the performance of the valuation model using a variety of parametric and non-parametric approaches, and make any necessary refinements;

4. discuss the importance of testing the influence of individual variables not included in the final specified model; and
5. test the model using a hold-out sample (sales data not used to develop the model) to determine its suitability for general valuation.

Instructor's Comments

Let's take stock of our progress through BUSI 444 so far. In Lessons 1 and 2, we reviewed data screening and data revelation techniques using property inventory and sales data. In Lesson 3, we investigated several methods to estimate land values, and in Lesson 4 we estimated market values under the cost approach. By now, you should feel comfortable using PASW/SPSS for data screening and analysis functions. We are now ready to progress to the challenge of developing a sales-based additive regression model.

In this lesson, we will revisit the Midsize sales database from Lesson 2 to illustrate the steps in building an additive model. The model will be based on direct sales comparison, using a database of market sales. However, the Midsize database was originally developed for use in a cost-based model, so it has a much higher number of inventory-related variables than a typical sales-based model. As a result, we will label the model type as a "cost-specified market approach". This model type was discussed by Richard Borst, a well-respected CAMA expert, at a GIS/CAMA conference. Borst describes the model as a "Transportable Cost Specified Market Approach". It is "transportable" because it can be used almost universally and is applicable in a wide variety of contexts. It is "cost-specified" because it uses primarily the same land and improvement variables that would be used in a cost model. However, it is a "market approach" in that it is calibrated using multiple regression and market sales data. In his 30 years of modeling experience, Borst has concluded this to be an effective model for use in many situations. It is easily understood by appraisers and the general public, as it contains most of the property features that are considered to affect value.

In this lesson we will use the final version of this database from Lesson 2, Midsize700.sav. The advantage of using this data is that it has already been refined through preliminary data screening, data exploration, and transformation in Lesson 2. We have confirmed the data does not contain any fundamental flaws or issues and is suitable for development of an additive linear regression model.

Database for this lesson: we will be continuing with the "Midsize700" database from Lesson 2. If you wish to work with a fresh database, you can download the "Midsize700" database from the course webpage. However, this is not required. Since this is the same database we saved at the end of Lesson 2, you can simply continue with the version you saved.

Steps in the Model Building Process

Building an additive regression model requires a multi-step process. In each step, the analyst must apply consistent methods and be prepared to support all assumptions and decisions. As we will see later in this lesson, there are significant model performance risks associated with certain assumptions and decisions. The key for the analyst is to find the right tools to identify, quantify, and minimize these issues.

Model building can be described as three general activities: model specification, calibration, and testing. We will cover each of these activities in some depth, following the 9-step process below.

1. Describe an appropriate general model to use and state this model using standard mathematical symbols.
2. Review the variables in the database and identify those which are suitable for this type of model.
3. Examine the potential independent variables for relationships with each other and with the dependent variable using graphic analysis, cross tabs, and correlation analysis.
4. (a) Create any transformations necessary to make variables suitable for the chosen model structure.
   (b) Create any additional transformations required to remove problems of multicollinearity identified in Step 3.
5. Repeat Step 3 with new variables.
6. List a final group of potential variables for calibration.
7. Calibrate the model using an appropriate method.
8. Test and evaluate the model for use.
9. State your conclusions as to model quality.

Steps 1 to 6 are model specification: selecting the variables to be considered and defining their relationships to value and to each other. Step 7 is calibration, where you determine the regression equation to value the properties. Steps 8 and 9 are testing and reporting values.

One caution before we begin: model building is not simply a mathematical exercise. The modeler must apply appraisal judgement throughout the process of specifying the initial model, calibration, and testing. This adds a degree of subjectivity to the process, which means you cannot specify ironclad decision rules that are universally applicable. Just as in single-property appraisal, there is an element of art to temper the apparent hard science in statistical modeling.

Step 1: Choose a Valuation Model

The first step in the regression model development process is to specify the model in a general mathematical form.
The general model form we will use is as follows:

$$MV = \pi GQ \times \left[ (\pi BQ \times \sum BA) + (\pi LQ \times \sum LA) + \sum OA \right]$$

where
$MV$ is the estimated market value;
$\pi GQ$ is the product of the general qualitative components;
$\pi BQ$ is the product of the building qualitative components;
$\pi LQ$ is the product of the land qualitative components;
$\sum BA$ is the sum of the building additive components;
$\sum LA$ is the sum of the land additive components; and
$\sum OA$ is the sum of the other additions additive components.

Since we have additive components for land and improvements, we have essentially specified a cost approach regression model. This must be restated in a format consistent with an additive model as follows:

$$MV = \sum GQ_B + \sum BQ_B + (BQ_M \times BA) + LQ_B + (LQ_M \times LA) + (OQ_M \times OA)$$

where
$\sum GQ_B$ is the sum of the general qualitative factors in binary form;
$\sum BQ_B$ is the sum of the building qualitative factors in binary form;
$BQ_M$ is the building qualitative factors in multiplicative form;
$BA$ is the building additive factors;
$LQ_B$ is the land qualitative factors in binary form;
$LQ_M$ is the land qualitative factors in multiplicative form;
$LA$ is the land additive factors;
$OQ_M$ is the other building qualitative factors in multiplicative form; and
$OA$ is the other building additive factors.

Step 2: Variable Review

In order to use this form of model, we need to review the available independent variables and sort them into appropriate categories. We will catalogue the pool of variables in the Midsize700 database into the following six major variable types.

General Qualitative

General qualitative variables are intended to capture general locational influences, including neighbourhood. Neighbourhood variables reflect a range of influences associated with a broad geographic area. Location variables reflect more specific positive and negative influences. Typical positive location factors might include parks, greenbelts, and schools. Negative locational influences for single family residential uses include high traffic, noise, or proximity to commercial or industrial uses. Nbhd is the only general qualitative variable in our database, accounting for locational influences.

Building Quality

Building quality variables include construction quality, depreciation or age factor, and construction type such as ranch style, split level, and two storey. Where construction quality has been scaled, it becomes a multiplicative variable. Similarly, overall depreciation or simply physical depreciation will be multiplicative if calculated as the percentage of remaining life. The Midsize700 variables which fall into this category are as follows:

ManualClass: This is a nominal variable. In Lesson 2 it was transformed to a scaled variable. It can also be expressed as a binary variable.

Lin_mancls: This is the scaled version of ManualClass. This variable is multiplicative and must be combined with related additive variables, such as floor areas, to be used in an additive model.

EffectiveYear: Effective age is the age of the property based on the observed amount of depreciation from all sources, including physical deterioration and obsolescence. Effective age may not be equivalent to chronological age. For example, renovation and restoration will reduce the effective age of a property. This numerical variable can be used directly in the model, converted to age, or used to develop a multiplicative depreciation variable.

Building Additive

Building additive variables include finished floor areas, bathrooms, bedrooms, fireplaces, or any other building feature that is developed by measuring or counting. The following are the additive variables in the original database (in Lesson 2 a number of new variables were created based on these variables):

Foundation, FinishedArea, Stories, FullBath, ThreeQtrBath, HalfBath, Bedrooms, MultiCarGarage, SingleCarGarage, Carport, Pool, OutBuildings, Fireplcs, BasementTotalArea, BasementFinishedArea, DeckAreaCovered, DeckAreaUncovered

Land Qualitative

Land qualitative variables represent factors such as view, waterfront, topographic features, and level of services. These variables are often binary, indicating the presence or absence of the feature. Other qualitative modifiers for residential land are multiplicative, such as size adjustment factors. Residential land is generally affected by the economic principle of diminishing returns. In other words, as the size of a residential lot increases beyond a certain point, the unit value or price per square foot or front foot tends to decrease. Binary land quality variables in the database include:

Cornerlot, PrimeView, GoodView, FairView

Land Additive

The only land additive variable is LotSizeSqft.

Other Building Qualitative and Additive

Other qualitative and additive building factors are represented by the same types of feature as for the main building. These include:

MultiCarGarage, SingleCarGarage, Carport, Pool, OutBuildings

Steps 3, 4, and 5: Examine Potential Independent Variables and Transform as Necessary

We already completed much of the exploratory data analysis and data transformation required for an additive regression model in Lesson 2. In this section, we investigate the need for additional transformations for land and building qualitative factors.

The only general quality variable in the midsize700.sav database is Nbhd. General practice in an additive model is to avoid including this factor, since Nbhd captures many of the features that would normally be included individually in the model, such as age, quality, and land characteristics. In other words, there is a risk of double counting if variables with overlapping influences are included in a model. To address this issue, common modeling practice is to first calibrate the model with no location variable, and then make subsequent location adjustments using the Nbhd variable factors or response surface analysis. [1] Near the end of this lesson, we will demonstrate how variables excluded from the model, such as Nbhd, can be examined for potential influences and the model adjusted for these.

Building quality can be represented in several different ways, as follows:

- Lin_mancls, as a quality factor (multiplicative);
- EffectiveYear, as a factor representing depreciation; or
- a new variable, EffectiveAge, created by transformation of EffectiveYear.

A more complete analysis is needed to determine which approach will result in the best variable. However, EffectiveAge is usually preferable since this factor accounts for the different size and quality of buildings rather than a constant dollar amount based only on age. Another approach is to develop a physical depreciation variable. However, in this case we do not have sufficient data to create depreciation relationships for each manual class.

We now turn our attention to the land variables.
As noted earlier, the unit value of residential land typically follows a non-linear relationship with increasing size. In other words, each square foot of a very large parcel of land is often worth less than each square foot of a smaller parcel. This is because the usefulness of additional units of land varies depending on how much land is already included in a parcel. A very large parcel would benefit less from an additional unit of land than would a very small parcel. To account for this relationship, a size adjustment is required.

In Lesson 3, we investigated several methods for accounting for this non-linear size relationship. All of the methods depended on the creation of a land residual, because a separate land value is a necessary part of a cost model. For our sales-based regression model, a separate land value is not needed, so the following formula will be used (this method is also used later in the nonlinear regression lesson).

[1] Response Surface Methodology (RSM) is a collection of statistical and mathematical techniques useful for developing, improving, and optimizing processes. RSM is particularly useful where several variables potentially impact one performance measure or quality characteristic, known as the response. For example, Lin_mancls and EffectiveYear both influence building value.

We will follow the three steps below to create a size adjustment:

Step 1: Create a size factor by dividing the mean property size of 7,388.2 square feet by each LotSizeSqft value.

Step 2: Take the square root of the outcome from Step 1. Remember that raising a variable to the power of 0.5 is the same as taking the square root.

Step 3: Raise the result from Step 2 to the exponent E = 1.2. This value was found by trial and error, trying different exponents in a plot of sizefact against LotSizeSqft until we found the expected or desired curve relationship between value and lot size. This process will not be shown here.

These steps create a scaled adjustment factor that can be applied to properties with greater or lesser square foot areas than the average lot. We will create a new variable, adjlotsize, with the following syntax commands:

    COMPUTE sizefact = (7388.2/LotSizeSqft)**(1.2*0.5).
    COMPUTE adjlotsize = sizefact*lotsizesqft.

These results are shown in the graphs of sizefact and adjlotsize against LotSizeSqft. The graph of sizefact and LotSizeSqft shows an excellent curve shape, but the curve of adjlotsize against LotSizeSqft does not level off at the higher values as is usually expected. This may result in over-valuation of the larger lots.

We created a variable for finished upper floor area in Lesson 2. In a cost-specified model, it is generally assumed that not all finished area above the basement will have the same unit value; for example, first floor finished area normally has a higher square foot value than second floor finished area. Therefore, if the first floor area is not valued separately in the model, the first floor area will be under-valued while the second floor area will be over-valued. Our hypothesis is that it is better to attempt to differentiate between the value of the first floor area and the second floor area in the model than to model only the total finished floor area for all floors. We will apply the following transformations to create an area variable for each floor:

    COMPUTE flrarea1 = upperfinish / Stories.
    COMPUTE flrarea2 = upperfinish - flrarea1.

These transformations assume that 1.5 storey homes have a second floor area exactly 50% of the first floor and 2 storey homes have a second floor area equal to the first. We can test the final value after modeling is completed using the storey binary variables.

Floor area can now be represented in the model by either the total finished floor area or the two separate floor areas. Bathrooms can be represented by the three separate variables in the original data or the new totalbath variable created in Lesson 2.

In order to include the building quality factor Lin_mancls in the model, we must transform this variable by multiplying it with the floor area variables:

    COMPUTE lin1area = Lin_mancls * flrarea1.
    COMPUTE lin2area = Lin_mancls * flrarea2.

Some of the nominal variables, such as Foundation, Pool, and Stories, were transformed in Lesson 2 to a format that can be used in the model. However, OutBuildings was not transformed. A simple recoding transforms the OutBuildings values from a Y or N "string" (text) to numeric binaries. The syntax for this transformation is provided below:

    RECODE OutBuildings ("Y"=1) (else=0) into Outbldgbin.

The next transformation creates the effective age variable, which will serve as a proxy for physical and functional depreciation. Since the valuation base is 2006 and no properties were built later than 2005, we can create an effective age variable, effage, as follows:

    COMPUTE effage = 2006 - EffectiveYear.

Later in the lesson, we will illustrate how an additive depreciation variable can be created using effective age and economic life relationships.
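The transformations in this section can also be expressed outside PASW/SPSS. The following Python sketch mirrors the COMPUTE statements for sizefact, adjlotsize, flrarea1, flrarea2, the quality-weighted floor areas, and effage. The function names are hypothetical and chosen only for illustration; the constants (the 7,388.2 square foot mean lot size, E = 1.2, and the 2006 valuation base) are taken from the lesson.

```python
# Illustrative sketch only: function names are hypothetical, not part of the course material.
MEAN_LOT_SQFT = 7388.2   # mean lot size quoted in the lesson
SIZE_EXPONENT = 1.2      # the E value the lesson found by trial and error
VALUATION_YEAR = 2006    # valuation base year used in the lesson


def size_factor(lot_sqft):
    """Mirror of: COMPUTE sizefact = (7388.2/LotSizeSqft)**(1.2*0.5)."""
    return (MEAN_LOT_SQFT / lot_sqft) ** (SIZE_EXPONENT * 0.5)


def adjusted_lot_size(lot_sqft):
    """Mirror of: COMPUTE adjlotsize = sizefact*lotsizesqft."""
    return size_factor(lot_sqft) * lot_sqft


def split_floor_area(upper_finish, stories):
    """Mirror of the flrarea1 / flrarea2 transformations.

    Assumes, as the lesson does, that area is spread evenly per storey count,
    so a 1.5 storey home gets a second floor half the size of the first.
    """
    flrarea1 = upper_finish / stories
    flrarea2 = upper_finish - flrarea1
    return flrarea1, flrarea2


def quality_weighted_area(lin_mancls, floor_area):
    """Mirror of: COMPUTE lin1area = Lin_mancls * flrarea1 (same form for lin2area)."""
    return lin_mancls * floor_area


def effective_age(effective_year):
    """Mirror of: COMPUTE effage = 2006 - EffectiveYear."""
    return VALUATION_YEAR - effective_year
```

For a hypothetical 1.5 storey home with 1,800 square feet of finished upper area, split_floor_area(1800, 1.5) returns (1200.0, 600.0), which matches the 50% assumption stated above. A lot exactly at the mean size has a size factor of 1.0, larger lots fall below 1.0, and smaller lots rise above it.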
Step 6: List Variables for Calibration

Based on the results of the first five steps in the model building process, the following variables are considered for calibration:

Bedrooms, MultiCarGarage, SingleCarGarage, Carport, OutBldgbin, Cornerlot, adjlotsize, Fireplcs, DeckAreaCovered, DeckAreaUncovered, poolyes, effage, totalbath, story15, story20, linfinarea, lin1area, lin2area, linbsmtfin, FullBath, ThreeQtrBath, HalfBath, crawl, partbsmt, slab

The three view variables were tested using Crosstabs and only FairView had any observations with a "Y" coded; there were only two sales with a fair view. This is too few for the variable to be of any use. The minimum required is five sales with the feature or five without it.

We will use multiple regression analysis to test these variables and find the combination that best explains the variation in sale prices. In order for the regression process to work effectively, obvious multicollinearity must be avoided. For example, the separate floor area variables cannot be used with total finished area, nor can the separate bath variables be used with total baths. In addition, when a group of binary variables represents all potential values of the original variable, as with the foundation types, one of the group must be omitted to act as a reference or control variable. Full basement will be omitted for this reason.

We will combine the variables into four groups, using different floor area and bathroom combinations, and determine which combination is optimal for further analysis and model calibration. Each group will contain the following 17 common variables: adjlotsize, slab, poolyes, partbsmt, SingleCarGarage, Carport, DeckAreaCovered, Fireplcs, crawl, DeckAreaUncovered, Bedrooms, MultiCarGarage, effage, Outbldgbin, story15, story20, and corner. The groups will be "customized" with the following variables:

Group 1
  Bath variables: HalfBath, ThreeQtrBath, FullBath
  Floor area variables: linfinarea

Group 2
  Bath variables: totalbath
  Floor area variables: linfinarea

Group 3
  Bath variables: totalbath
  Floor area variables: lin1area, lin2area, linbsmtfin

Group 4
  Bath variables: HalfBath, ThreeQtrBath, FullBath
  Floor area variables: lin1area, lin2area, linbsmtfin

These variable groups meet the criteria specified in the model specification process and follow sound appraisal judgement.
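Two of the screening rules above lend themselves to a short illustration: omitting one binary from a complete set (the reference category) and checking that a binary variable has enough observations to be useful. The Python sketch below is a minimal illustration under stated assumptions. The function names and the raw foundation labels ("full", "crawl", "partbsmt", "slab") are hypothetical, and the minimum-count check reads the rule as requiring at least five sales in both the "with" and "without" groups, which is one interpretation of the guideline.

```python
# Illustrative sketch only: names and category labels are assumptions.

def dummy_code(values, reference):
    """Build 0/1 indicator lists for every category except the reference.

    Dropping the reference category avoids perfect multicollinearity,
    because the omitted category is implied when all indicators are 0.
    """
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


def has_enough_observations(binary_values, minimum=5):
    """Check that both the 1s and the 0s occur at least `minimum` times."""
    ones = sum(binary_values)
    zeros = len(binary_values) - ones
    return min(ones, zeros) >= minimum


# Hypothetical foundation codes for five sales; full basement is the reference.
foundation = ["full", "crawl", "slab", "full", "partbsmt"]
foundation_dummies = dummy_code(foundation, reference="full")

# A view variable with only two "Y" sales out of 700, as described for FairView.
fair_view = [1] * 2 + [0] * 698
fair_view_usable = has_enough_observations(fair_view)
```

In this sketch, a sale with a full basement is represented by zeros in all three foundation indicators, and the FairView-style variable fails the minimum-count check.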
We will now test each group to select the best complete set of variables for model calibration. Keep in mind that we may need to retain one or more variables to aid the model's explainability from an appraisal perspective, even though regression diagnostics indicate they provide little or no statistical benefit.

We will evaluate the performance for each group with the following statistics: R², adjusted R², SEE, and F. The best model will be the one with the highest adjusted R² and lowest SEE. However, since multicollinearity is a concern, we also need to pay attention to the VIF statistic.

All four groups were tested using the "Enter" method for multiple regression. To follow along in PASW/SPSS, complete the following commands:

- Select Analyze > Regression > Linear...
- Select Adj_Price as the dependent variable.
- Select all common variables and the unique variables for Groups 1-4 as the independent variables.

- Method should be "Enter".
- Click Statistics... and select Estimates, Model fit, Descriptives, and Collinearity diagnostics.
- Click Continue > OK to run the regression.

Only Group 1 results are shown below, with the other groups summarized in the following table.

Group 1

Model Summary
[Table columns: Model, R, R Square, Adjusted R Square, Std. Error of the Estimate; values not preserved in this transcription]
a. Predictors: (Constant), DeckAreaUncovered, poolyes, partbsmt, corner, SingleCarGarage, HalfBath, slab, story15, ourbldgbin, adjlotsize, Carport, Bedrooms, ThreeQtrBath, Fireplcs, crawl, DeckAreaCovered, story20, MultiCarGarage, linfinarea, effage, FullBath

ANOVA (b)
[Table columns: Model, Sum of Squares, df, Mean Square, F, Sig.; only the leading digits of the sums of squares are preserved]
Model 1   Sum of Squares
Regression   4.143E
Residual     1.059E
Total        5.202E
a. Predictors: (Constant), DeckAreaUncovered, poolyes, partbsmt, corner, SingleCarGarage, HalfBath, slab, story15, ourbldgbin, adjlotsize, Carport, Bedrooms, ThreeQtrBath, Fireplcs, crawl, DeckAreaCovered, story20, MultiCarGarage, linfinarea, effage, FullBath
b. Dependent Variable: Adj_Price

Coefficients (a)
[Table columns: Unstandardized Coefficients (B, Std. Error), Standardized Coefficients (Beta), t, Sig., Collinearity Statistics (Tolerance, VIF); values not preserved in this transcription]
Rows, Model 1: (Constant), adjlotsize, slab, poolyes, partbsmt, SingleCarGarage, Carport, DeckAreaCovered, Fireplcs, crawl, DeckAreaUncovered, Bedrooms, MultiCarGarage, effage, Outbldgbin, story15, story20, corner, HalfBath, ThreeQtrBath, FullBath, linfinarea

The following table summarizes the general regression statistics for all groups.

[Table columns: Group No., Adj R², SEE (Model Summary), F, Sig. (ANOVA); values for Groups 1 to 4 not preserved in this transcription]

In comparing models, the most important statistics are the adjusted R² and SEE. At this stage of the analysis, the best model is the one with the highest adjusted R² and lowest SEE. All four models are very similar for these statistics.

The F statistic measures performance of the overall model when compared to the result that would be obtained by estimating the sale price by simply using the mean sale price. With the large number of sales used here, and the relatively small number of variables in the model, the F value is about what would be expected. The Sig. should be less than .05. Here, the F statistics for all groups are well above 4.0, giving us confidence that the model is significant in predicting sale price at the 95% confidence level.

Digging a little deeper, the VIF statistics indicate some issues with multicollinearity. Most of the VIF values are below the threshold, but some variables have very high VIFs, indicating extreme multicollinearity. A review of the statistical output for extreme VIF values reveals story20 and lin2area in Groups 3 and 4. In addition, there are problems with the bath, main floor area, and effage variables in all groups.

Multicollinearity

Many statistical software packages will produce statistics which help identify multicollinearity at the time of running a regression process. These statistics are the Tolerance and the Variance Inflation Factor (VIF). As VIF = 1 / Tolerance, only one needs to be examined. A Tolerance (VIF) statistic is calculated for each independent variable included in the model; for a given independent variable, the Tolerance is (1 - R²), where R² measures how much of the variation in the given variable is explained by the rest of the independent variables. If R² is zero (that is, no correlation is present between the given independent variable and the remaining independent variables), then the Tolerance is 1 (its maximum value). As R² ranges between 0 and 1, the minimum Tolerance would be zero. A Tolerance value of 0.3 or less (VIF of 3.333 or greater) can indicate that multicollinearity exists in the model. In that case, the independent variables in the model should be examined and new models should be tried, removing one or more variables to eliminate the problem.

Our next steps will involve removing variables to see if the multicollinearity can be reduced or eliminated. Appraisal judgement tells us that effective age and floor area are both very important valuation variables, so we will try removing the bath variables first. This leaves two variable groups to examine:

Group 5: all the common variables, plus lin1area, lin2area, and linbsmtfin
Group 6: all the common variables, plus linfinarea
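The relationship between Tolerance and VIF described in the Multicollinearity note can be sketched in a few lines of Python. This is a minimal illustration, not PASW/SPSS output: the function names are hypothetical, and the R² passed in is assumed to come from regressing one independent variable on all the others. The 0.3 Tolerance cutoff is the guideline quoted in this lesson.

```python
# Illustrative sketch only: function names are hypothetical.

def tolerance(r_squared):
    """Tolerance = 1 - R², where R² comes from regressing one independent
    variable on the remaining independent variables."""
    return 1.0 - r_squared


def vif(r_squared):
    """Variance Inflation Factor = 1 / Tolerance."""
    tol = tolerance(r_squared)
    if tol <= 0.0:
        return float("inf")  # perfect collinearity: the variable is fully explained
    return 1.0 / tol


def flags_multicollinearity(r_squared, tolerance_cutoff=0.3):
    """Apply the lesson's guideline: Tolerance of 0.3 or less is a warning sign."""
    return tolerance(r_squared) <= tolerance_cutoff
```

For instance, a variable with an R² of 0.5 against the other predictors has a Tolerance of 0.5 and a VIF of 2.0, which is inside the guideline, while an R² of 0.8 gives a Tolerance of roughly 0.2 and a VIF of roughly 5, which would be flagged.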

Group 5

Model Summary
[Table columns: Model, R, R Square, Adjusted R Square, Std. Error of the Estimate; values not preserved in this transcription]
a. Predictors: (Constant), lin2area, poolyes, DeckAreaUncovered, corner, story15, slab, partbsmt, adjlotsize, ourbldgbin, SingleCarGarage, Carport, linbsmtfin, Fireplcs, DeckAreaCovered, crawl, lin1area, Bedrooms, MultiCarGarage, effage, story20

ANOVA (b)
[Table columns: Model, Sum of Squares, df, Mean Square, F, Sig.; only the leading digits of the sums of squares are preserved]
Model 1   Sum of Squares
Regression   4.159E
Residual     1.044E
Total        5.202E
a. Predictors: (Constant), lin2area, poolyes, DeckAreaUncovered, corner, story15, slab, partbsmt, adjlotsize, ourbldgbin, SingleCarGarage, Carport, linbsmtfin, Fireplcs, DeckAreaCovered, crawl, lin1area, Bedrooms, MultiCarGarage, effage, story20
b. Dependent Variable: Adj_Price

Coefficients (a)
[Table columns: Unstandardized Coefficients (B, Std. Error), Standardized Coefficients (Beta), t, Sig., Collinearity Statistics (Tolerance, VIF); values not preserved in this transcription]
Rows, Model 1: (Constant), Bedrooms, Fireplcs, crawl, partbsmt, slab, story15, story20, poolyes, adjlotsize, effage, ourbldgbin, MultiCarGarage, SingleCarGarage, Carport, corner, DeckAreaCovered, DeckAreaUncovered, linbsmtfin, lin1area, lin2area
a. Dependent Variable: Adj_Price

Group 6

Model Summary
[Table columns: Model, R, R Square, Adjusted R Square, Std. Error of the Estimate; values not preserved in this transcription]
a. Predictors: (Constant), linfinarea, slab, poolyes, adjlotsize, corner, partbsmt, story15, ourbldgbin, SingleCarGarage, Carport, DeckAreaCovered, Fireplcs, crawl, DeckAreaUncovered, Bedrooms, story20, MultiCarGarage, effage

ANOVA (b)
[Table columns: Model, Sum of Squares, df, Mean Square, F, Sig.; only the leading digits of the sums of squares are preserved]
Model 1   Sum of Squares
Regression   4.130E
Residual     1.073E
Total        5.202E
a. Predictors: (Constant), linfinarea, slab, poolyes, adjlotsize, corner, partbsmt, story15, ourbldgbin, SingleCarGarage, Carport, DeckAreaCovered, Fireplcs, crawl, DeckAreaUncovered, Bedrooms, story20, MultiCarGarage, effage
b. Dependent Variable: Adj_Price

Coefficients (a)
[Table columns: Unstandardized Coefficients (B, Std. Error), Standardized Coefficients (Beta), t, Sig., Collinearity Statistics (Tolerance, VIF); values not preserved in this transcription]
Rows, Model 1: (Constant), Bedrooms, Fireplcs, crawl, partbsmt, slab, story15, story20, poolyes, adjlotsize, effage, ourbldgbin, MultiCarGarage, SingleCarGarage, Carport, corner, DeckAreaCovered, DeckAreaUncovered, linfinarea
a. Dependent Variable: Adj_Price

The results are summarized in the table below:

[Table columns: Group No., Adj R², SEE (Model Summary), F, Sig. (ANOVA); values for Groups 5 and 6 not preserved in this transcription]
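The statistics used to compare groups (R², adjusted R², SEE, and F) all derive from the ANOVA sums of squares, the number of sales, and the number of predictors. The Python sketch below shows the standard formulas. The function name and the input figures are hypothetical and chosen for easy arithmetic; they are not the Midsize700 results.

```python
# Illustrative sketch only: hypothetical inputs, not values from the lesson's output.

def fit_statistics(ss_regression, ss_residual, n_sales, n_predictors):
    """Compute the summary statistics reported in a regression ANOVA table."""
    ss_total = ss_regression + ss_residual
    df_regression = n_predictors
    df_residual = n_sales - n_predictors - 1  # minus 1 for the constant

    r_squared = ss_regression / ss_total
    # Adjusted R² penalizes the model for each additional predictor.
    adj_r_squared = 1.0 - (1.0 - r_squared) * (n_sales - 1) / df_residual
    # SEE (standard error of the estimate) is the root of the residual mean square.
    see = (ss_residual / df_residual) ** 0.5
    # F compares the regression mean square with the residual mean square.
    f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)

    return {
        "r_squared": r_squared,
        "adj_r_squared": adj_r_squared,
        "see": see,
        "f": f_stat,
    }


# Hypothetical example: 101 sales, 4 predictors.
example = fit_statistics(ss_regression=800.0, ss_residual=200.0,
                         n_sales=101, n_predictors=4)
```

Because adjusted R² and SEE both depend on the residual degrees of freedom, adding a variable that explains almost nothing can lower adjusted R² and raise SEE, which is why these two statistics are preferred over raw R² when comparing groups with different numbers of variables.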

The R² and SEE statistics for Group 5 are very similar to Group 6, meaning the models for both groups appear to be reasonable predictors of sale price. Moving to the individual variable statistics, there is still considerable multicollinearity. Group 5 has extreme results with story20, effage, lin1area, and lin2area. Group 6 has high multicollinearity with effage and linfinarea, but not as extreme as in Group 5. We will continue our testing with Group 6.

In Group 6 we see that linfinarea and effage have VIFs above the critical value, but only marginally. Both variables have Sig. values of .000, indicating a high likelihood they are important to the model. Because they are not extreme VIFs and our appraisal judgement indicates they are important, we will leave these variables in for now, and revisit their possible multicollinearity later in the testing process.

The next issue we need to tackle is a series of unexpected (and in some cases negative) coefficients for garages and decks. For example, our appraisal sense tells us that a single car garage should add value rather than detract. We need to draw upon appraisal knowledge to solve the dilemma of the illogical garage and carport coefficients.

Let's assume that past experience indicates that multi-car garages are worth approximately 1.75 times a single car garage and carports are worth approximately 0.3 times a single car garage. [2] Similarly, experience shows uncovered decks are worth approximately 0.75 times as much as covered decks. Transformations will be run to reflect these relationships and see if the model result improves.

    COMPUTE GARAGES = (1.75*MultiCarGarage+SingleCarGarage+.3*CARPORT).
    COMPUTE DECKS = (DeckAreaCovered+.75*DeckAreaUncovered).

We replace the MultiCarGarage, SingleCarGarage, CARPORT, DeckAreaCovered, and DeckAreaUncovered variables with GARAGES and DECKS. This will be Group 7, with the following regression results.

Model Summary
[Table columns: Model, R, R Square, Adjusted R Square, Std. Error of the Estimate; values not preserved in this transcription]
a. Predictors: (Constant), decks, poolyes, corner, partbsmt, adjlotsize, slab, garages, story15, ourbldgbin, Bedrooms, Fireplcs, crawl, story20, effage, linfinarea

ANOVA (b)
[Table columns: Model, Sum of Squares, df, Mean Square, F, Sig.; only the leading digits of the sums of squares are preserved]
Model 1   Sum of Squares
Regression   4.115E
Residual     1.087E
Total        5.202E
a. Predictors: (Constant), decks, poolyes, corner, partbsmt, adjlotsize, slab, garages, story15, ourbldgbin, Bedrooms, Fireplcs, crawl, story20, effage, linfinarea
b. Dependent Variable: Adj_Price

[2] Model calibration assumptions should be based on consistent valuation research. The explainability and credibility of a model will be weakened if the analyst cannot provide empirical evidence for key assumptions.

Coefficients (a)
[Table columns: Unstandardized Coefficients (B, Std. Error), Standardized Coefficients (Beta), t, Sig., Collinearity Statistics (Tolerance, VIF); values not preserved in this transcription]
Rows, Model 1: (Constant), Bedrooms, Fireplcs, crawl, partbsmt, slab, story15, story20, poolyes, adjlotsize, effage, ourbldgbin, corner, linfinarea, garages, decks
a. Dependent Variable: Adj_Price

In this model we find the following:

- a slight decline in R² and increase in SEE;
- Bedrooms has an unexpected negative coefficient of low magnitude;
- except for linfinarea, the relative magnitude of the other coefficients meets our expectations;
- DECKS has a t-statistic of 0.243 and a Sig. value of 0.808, indicating a high probability that decks are not significant in the model;
- Outbldgbin also has a very low t-statistic of 0.073 and a Sig. value of 0.942, indicating a high probability that outbuildings are not significant in the model;
- the highest Beta value is for linfinarea, which is a combination of quality and total floor areas. This result is expected, since our appraisal sense tells us that floor area should contribute most to value and hence be most significant; and
- the lowest Beta values are those for DECKS and Outbldgbin. These also have high Sig. values, meaning they contribute little to the predictive ability of the model.

The corner and partbsmt variables also have marginal Sig. results, at 0.255 and 0.393, respectively. However, we will leave these variables in the model for now, for testing in stepwise regression.

Before we proceed to Step 7, Model Calibration, we must first separate our data into model and test databases. The test database will be used later in Step 8, Model Testing. The general practice when constructing a model with a large number of sales is to hold out a small proportion of the sales so that there are some sales to act as an unbiased test of the model quality. The goal is to test the performance of a model by comparing the estimates it produces to actual sales observations that were not used in creating the model.
The test database is often called a "holdout sample". The model should only be tested in this manner if there are sufficient sales remaining in the model database to calibrate the model. As few as 30 sales may be sufficient to calibrate a simple model. However, typical practice is to ensure at least 5 sales for each variable in the model for statistically reliable outcomes.

The following heuristics or "rules of thumb" are generally applied in model calibration:
- at least 30 sales will generally be required to calibrate a simple model; and
- there should be at least 5 sales for each variable in the model.
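The two rules of thumb and the holdout idea can be sketched in a few lines of Python, outside PASW/SPSS. This is a minimal illustration, not the lesson's procedure: the function names, the 700 sale identifiers, the 200-sale holdout, and the 15-variable model are all assumptions chosen for the example.

```python
import random


def min_calibration_sales(n_variables, per_variable=5, floor=30):
    """Rule-of-thumb minimum number of sales needed to calibrate a model."""
    return max(floor, per_variable * n_variables)


def split_holdout(sales, n_holdout, seed=0):
    """Randomly split sales into (model, holdout) lists without changing the input."""
    if not 0 < n_holdout < len(sales):
        raise ValueError("n_holdout must leave at least one sale on each side")
    shuffled = list(sales)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Hypothetical example: 700 sale identifiers and a 15-variable model.
sale_ids = list(range(1, 701))
model_db, holdout_db = split_holdout(sale_ids, n_holdout=200)
required = min_calibration_sales(15)
can_hold_out = len(model_db) >= required
```

Fixing the seed keeps the split reproducible, which plays the same role as sorting on a previously stored random variable.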

Since the midsize700.sav database contained 54 original variables, the desired minimum number of sales for calibration is 270 (54 × 5). We will hold out 200 sales for the test database and leave 500 sales for calibration (the model building database). Complete the following steps.

- Open the midsize700.sav database and sort the data by RANDOM. Note: the random variable has been previously created in PASW/SPSS to ensure the data is in random order. Select Data → Sort Cases → choose Random from the variable list → choose Ascending → OK.
- Save the current database, then use your mouse to highlight rows 501 to 700. Delete these and save the remainder as midsizemodel.sav using File → Save As. Caution: remember to use the Save As function rather than Save, to avoid erasing the midsize700 database. This model database will be used to calibrate the model and for initial testing.
- Reopen the midsize700.sav database and highlight rows 1 to 500, delete these, and save the remaining 200 sales as midsizetest.sav. Caution: remember to use the Save As function. This test database will be used for model testing, using sales not used in calibrating the model.

Step 7 Model Calibration

We will now begin to calibrate the model using stepwise regression on the model database. The mechanics of stepwise regression are discussed in Chapter 9 of the Advanced Computer Assisted Mass Appraisal text. Our goal is to progressively eliminate any model variable that is not significant; in other words, to remove variables that do not add to the model's predictive power.

- Open the midsizemodel database.
- Select Analyze → Regression → Linear.
- Under Method, select Stepwise.
- Select the final group of variables from Step 6 above: Dependent variable Adj_Price; Independent variables adjlotsz, slab, poolyes, partbsmt, Fireplcs, crawl, Bedrooms, effage, outbldgbin, story15, story20, corner, linfinarea, DECKS, GARAGES.
- Click Options.
- Under Stepping Method Criteria, Use Probability of F, set Entry to 0.30 and Removal to 0.35. This sets the entry and removal thresholds higher than the PASW/SPSS default limits. This is a less restrictive setting, meaning it allows more variables into the model. The entry value prevents a variable from entering the model when its Sig. value is greater than 0.30. Once a variable is in the model, its Sig. value can change as other variables enter; if the Sig. value of a variable in the model rises beyond the removal value of 0.35, that variable will be removed from the model.
- Click Statistics. Select Estimates, Model fit, and Collinearity diagnostics.
- Run the regression.

We will reproduce only the model summary and the final steps for ANOVA and Coefficients.
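The entry and removal thresholds can be read as a simple decision rule applied at each step. The sketch below is a simplified illustration in Python, not the PASW/SPSS implementation: the function name, the order in which removal and entry are checked, and the 0.20 Sig. value for partbsmt are assumptions; the 0.335 and 0.808 values echo figures reported elsewhere in this lesson for corner and decks.

```python
def next_stepwise_action(sig_in_model, sig_candidates, p_enter=0.30, p_remove=0.35):
    """Decide the next stepwise move from the current Sig. values.

    sig_in_model: Sig. values of variables already in the model.
    sig_candidates: Sig. values each excluded variable would have if entered.
    Returns ("remove", name), ("enter", name) or ("stop", None).
    """
    # A variable already in the model is dropped once its Sig. reaches the removal value.
    if sig_in_model:
        weakest = max(sig_in_model, key=sig_in_model.get)
        if sig_in_model[weakest] >= p_remove:
            return ("remove", weakest)
    # Otherwise the strongest excluded variable enters if its Sig. is low enough.
    if sig_candidates:
        strongest = min(sig_candidates, key=sig_candidates.get)
        if sig_candidates[strongest] <= p_enter:
            return ("enter", strongest)
    return ("stop", None)


# Illustrative values only: no excluded variable clears the 0.30 entry value.
action = next_stepwise_action({"partbsmt": 0.20}, {"corner": 0.335, "decks": 0.808})
```

Keeping the removal value above the entry value avoids a variable being entered and removed repeatedly.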

Model Summary
Model   R   R Square   Adjusted R Square   Std. Error of the Estimate
1   .797(a)
a Predictors: (Constant), linfinarea
b Predictors: (Constant), linfinarea, adjlotsize
c Predictors: (Constant), linfinarea, adjlotsize, effage
d Predictors: (Constant), linfinarea, adjlotsize, effage, story20
e Predictors: (Constant), linfinarea, adjlotsize, effage, story20, crawl
f Predictors: (Constant), linfinarea, adjlotsize, effage, story20, crawl, slab
g Predictors: (Constant), linfinarea, adjlotsize, effage, story20, crawl, slab, poolyes
h Predictors: (Constant), linfinarea, adjlotsize, effage, story20, crawl, slab, poolyes, Fireplcs
i Predictors: (Constant), linfinarea, adjlotsize, effage, story20, crawl, slab, poolyes, Fireplcs, story15
j Predictors: (Constant), linfinarea, adjlotsize, effage, story20, crawl, slab, poolyes, Fireplcs, story15, Bedrooms
k Predictors: (Constant), linfinarea, adjlotsize, effage, story20, crawl, slab, poolyes, Fireplcs, story15, Bedrooms, GARAGES
l Predictors: (Constant), linfinarea, adjlotsize, effage, story20, crawl, slab, poolyes, Fireplcs, story15, Bedrooms, GARAGES, partbsmt

In the Model Summary, notice that R² and adjusted R² increase and the SEE decreases as the steps progress. This shows that all variables in the model tend to improve the result, although the benefit of additional variables drops off significantly after step 9, as Bedrooms, GARAGES, and partbsmt are added.

ANOVA (m)
Model   Sum of Squares   df   Mean Square   F   Sig.
12   Regression (l), Residual, Total
l Predictors: (Constant), linfinarea, ADJLOTSZ, effage, story20, crawl, slab, poolyes, Fireplcs, story15, Bedrooms, GARAGES, partbsmt
m Dependent Variable: Adj_Price
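Adjusted R² penalizes R² for each added predictor, which is why its gains flatten as weaker variables enter. A minimal Python sketch of the standard formula follows; the function name is an assumption, and the example assumes the one-variable model (R = .797) was fitted on all 500 sales in the model database.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / dof


# Step 1 of the stepwise run: R = .797 with one predictor (linfinarea).
r_squared_step1 = 0.797 ** 2
adj_step1 = adjusted_r_squared(r_squared_step1, n_obs=500, n_predictors=1)
```

With 500 sales the penalty per variable is small, so R² and adjusted R² stay close; the gap widens as the number of sales shrinks relative to the number of variables.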

Coefficients (a)
Columns: Unstandardized Coefficients (B, Std. Error); Standardized Coefficients (Beta); t; Sig.; Collinearity Statistics (Tolerance, VIF)
Rows: (Constant), linfinarea, adjlotsize, effage, story20, crawl, slab, poolyes, Fireplcs, story15, Bedrooms, GARAGES, partbsmt
a Dependent Variable: Adj_Price

Additional details revealed in the Coefficients table above are:
- the DECKS and Outbldgbin variables were excluded, as expected, but corner was also excluded at this level. As the Beta value for corner is low and its Sig. value is 0.335, we will not attempt to force it into the model;
- Sig. values have improved for partbsmt;
- no variable fell short of the desired 90% confidence level (that is, none had a Sig. of 0.10 or greater);
- effage still shows a VIF above the desired maximum (a tolerance below 30%), but its Beta value indicates that roughly 26% of the value is explained by this variable, far too high for exclusion; and
- linfinarea, similar to effage, has a high VIF score and a high Beta.

In the above test we found one statistic for effage which suggested removal from the model and another which indicated the variable should be retained. This is a common problem faced by modelers. It is important to make a conscious decision as to which statistic should be emphasized when the results point to conflicting conclusions.

The final step in model calibration is to identify outliers, or sales which have high residual values (the residual is the difference between the predicted value and the actual sale price of each record in the sales database). Our threshold for outliers will be any sale whose standardized residual lies outside ±3 standard deviations. Our strategy will be to "prune", or remove, these sales from the model. There is no generally accepted threshold for what is considered an outlier and which sales should be eliminated; this is a decision of the modeler, depending on the circumstances.
For example, with a large database, outliers may have little impact on model predictability and can possibly be ignored.

To identify outliers we re-run our model using the variables identified by stepwise regression, but with the Method set to Enter and with the Casewise Diagnostics report selected:
- Select Analyze → Regression → Linear.
- Change Method to Enter.
- Remove Outbldgbin, decks, and corner from the list of independent variables.
- Select Statistics → Casewise Diagnostics and set Outliers outside 3 standard deviations → Continue.

- Click Save and select Standardized under the Residuals heading. Continue → OK.

The model summary and other reports will be the same as above since no variables have changed. Our interest is the Casewise Diagnostics report, which shows 5 sales with high residual values.

Casewise Diagnostics (a)
Columns: Case Number, Std. Residual, Adj_Price, Predicted Value, Residual
a. Dependent Variable: Adj_Price

The next step is to set a filter to eliminate these sales from the calibration process.
- Data → Select Cases.
- Select "If condition is satisfied" → If...
- Set the filter to ABS(ZRE_1) < 3.
- Continue → OK, then re-run the regression.

Model Summary (b)
Model   R   R Square   Adjusted R Square   Std. Error of the Estimate
a. Predictors: (Constant), garages, slab, poolyes, partbsmt, adjlotsize, story15, Bedrooms, Fireplcs, crawl, story20, effage, linfinarea
b. Dependent Variable: Adj_Price

ANOVA (b)
Model   Sum of Squares   df   Mean Square   F   Sig.
1 Regression 2.987E E a
Residual 6.696E E9
Total 3.656E
a. Predictors: (Constant), garages, slab, poolyes, partbsmt, adjlotsize, story15, Bedrooms, Fireplcs, crawl, story20, effage, linfinarea
b. Dependent Variable: Adj_Price
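The pruning filter can be mimicked outside PASW/SPSS. The Python sketch below assumes that the saved standardized residual (ZRE_1) is the raw residual divided by the standard error of the estimate; the function names and the 30 made-up sales are illustrative only.

```python
import math


def standardized_residuals(actual, predicted, n_predictors):
    """Residual (actual - predicted) divided by the standard error of the estimate."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    dof = len(residuals) - n_predictors - 1
    if dof <= 0:
        raise ValueError("not enough observations for the number of predictors")
    see = math.sqrt(sum(r * r for r in residuals) / dof)
    return [r / see for r in residuals]


def rows_within_limit(actual, predicted, n_predictors, limit=3.0):
    """Indices of sales kept by a filter equivalent to ABS(ZRE_1) < limit."""
    z_scores = standardized_residuals(actual, predicted, n_predictors)
    return [i for i, z in enumerate(z_scores) if abs(z) < limit]


# Made-up data: 29 sales close to the prediction and one far above it.
actual = [100 + (1 if i % 2 == 0 else -1) for i in range(29)] + [150]
predicted = [100] * 30
kept = rows_within_limit(actual, predicted, n_predictors=1)
```

Because removing outliers lowers the standard error of the estimate, a second pass can flag sales that passed the first; whether to prune again is the modeler's decision.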

Coefficients (a)
Columns: Unstandardized Coefficients (B, Std. Error); Standardized Coefficients (Beta); t; Sig.; Collinearity Statistics (Tolerance, VIF)
Rows for Model 1: (Constant), adjlotsize, slab, partbsmt, crawl, poolyes, Fireplcs, Bedrooms, effage, story15 and story20, linfinarea, garages
a. Dependent Variable: Adj_Price

Residuals Statistics (a)
Columns: Minimum, Maximum, Mean, Std. Deviation, N
Rows: Predicted Value, Residual, Std. Predicted Value, Std. Residual
a. Dependent Variable: Adj_Price

The Model Summary shows that R² and the SEE are improved with the removal of the five outliers. The residual statistics show that the standardized residuals now fall within our objective of ±3 standard deviations, with a maximum of 2.913. This completes the calibration process.

In order to test the model, predicted values must be calculated. PASW/SPSS provides a feature to save the values generated by the model, but this only provides values for the sales used in the final calibration. As outliers in the regression may not be outliers in the valuation process, it is normal to apply the model to all sales in the model database. We will calculate values with the transformation below. This will also be useful in applying the model to the other sale databases.

COMPUTE AMRAVAL = * poolyes * adjlotsize * slab * partbsmt * garages * story * Bedrooms * Fireplcs * crawl * story * effage * linfinarea.
COMPUTE amraasr = amraval / adj_price.

Note: you can double check your transformation for AMRAVAL using Descriptive Statistics. The mean of AMRAVAL should be very close to the mean of Adj_Price (the dependent variable). Completing these transformations allows you to proceed with the initial testing of the model.

Remove the filter: Data → Select Cases → All Cases. Save your syntax file as "midsize.sps", as you will need it later in the lesson.
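The transformation is the usual additive model: the constant plus each coefficient multiplied by its variable, followed by the ratio of the estimate to the adjusted sale price. A small Python sketch of the same arithmetic follows. The intercept, the three coefficients, and both sales are invented for illustration and are not the calibrated values from this lesson.

```python
def predict_value(intercept, coefficients, sale):
    """Additive model: intercept plus the sum of coefficient * variable value."""
    return intercept + sum(coef * sale[name] for name, coef in coefficients.items())


# Hypothetical coefficients, NOT the values produced by the regression above.
intercept = 50000.0
coefficients = {"linfinarea": 80.0, "effage": -500.0, "poolyes": 10000.0}

# Two made-up sales with the variables the hypothetical model needs.
sales = [
    {"linfinarea": 1500, "effage": 10, "poolyes": 1, "adj_price": 170000.0},
    {"linfinarea": 1000, "effage": 20, "poolyes": 0, "adj_price": 125000.0},
]

amraval = [predict_value(intercept, coefficients, s) for s in sales]
amraasr = [est / s["adj_price"] for est, s in zip(amraval, sales)]

# Sanity check suggested in the lesson: the two means should be close.
mean_estimate = sum(amraval) / len(amraval)
mean_price = sum(s["adj_price"] for s in sales) / len(sales)
```

Holding the coefficients in a dictionary keyed by variable name makes it easy to apply the same model to another sales file, provided the variable names match.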

Step 8 Model Testing

We will do our initial testing using the model database, then later test results in the holdout sample. Our first tests are the overall valuation level and dispersion. Remember to change the filter back to Select All Cases. We will use Ratio Statistics with AMRAVAL as the Numerator and Adj_Price as the Denominator. Under Statistics, select mean, median, confidence intervals (95%), minimum, maximum, and COD, and uncheck any others.

Ratio Statistics for AMRAVAL / Adj_Price
Mean
95% Confidence Interval for Mean: Lower Bound .998, Upper Bound
Median
95% Confidence Interval for Median: Lower Bound .988, Upper Bound
Actual Coverage 95.6%
Minimum .704
Maximum
Coefficient of Dispersion .064
The confidence interval for the median is constructed without any distribution assumptions. The actual coverage level may be greater than the specified level. Other confidence intervals are constructed by assuming a Normal distribution for the ratios.

The Ratio Statistics show that the mean and median are very near 1.00 and that the confidence intervals for both statistics include the target of 1.00. The COD is within the IAAO standard of 5 to 10% for homogeneous areas. While this outcome is acceptable, we will continue testing the impact of individual variables not included in the model.

Neighbourhood Adjustments

As indicated earlier, Nbhd was not part of the model process, so it will be necessary to test whether all neighbourhoods are valued at the same level. In Lesson 2, we learned that the Kruskal-Wallis (K-W) test will tell us whether the property groups, in this case neighbourhoods, have the same level of assessment. We are looking for two outcomes of this test to confirm that the level of assessment is similar in all neighbourhoods:

1. The expected value of the mean ranks for each neighbourhood should be in the centre of the distribution. The test assigns a rank to each observation, in order from the lowest value to the highest.
The sum of the ranks is calculated for each group and the mean of the ranks determined. The mean ranks for each neighbourhood should be approximately equal to half the number of sales (500 in the model database).

2. The chi-square statistic can be used to approximate the significance of the K-W test, given the confidence level and degrees of freedom. In this case, our target is a 95% confidence level and the degrees of freedom statistic is 2. The Asymp. Sig. is the probability associated with the calculated chi-square statistic. If the Sig. is above the 5% threshold, we can accept the null hypothesis that the level of assessment is similar for all neighbourhoods.

To run a K-W test, complete the following PASW/SPSS commands:
- Analyze → Nonparametric Tests → K Independent Samples.
- Select AMRAASR as the Test Variable and NBHD as the Grouping Variable.
- Click Define Range and enter 36 and 46 as the Minimum and Maximum. Continue.
- Ensure Kruskal-Wallis H is checked → OK to run.
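Both Step 8 checks, the COD and the K-W test, reduce to short calculations. The Python sketch below is a simplified illustration: the function names, the three neighbourhood codes, and the twelve ratios are assumptions, the H statistic omits the tie correction, and the closed-form significance applies only when there are 2 degrees of freedom (three groups).

```python
import math
from statistics import median


def coefficient_of_dispersion(ratios):
    """Average absolute deviation from the median ratio, as a percentage of the median."""
    med = median(ratios)
    return 100.0 * sum(abs(r - med) for r in ratios) / (len(ratios) * med)


def kruskal_wallis_h(groups):
    """Return (H, mean rank per group) for a dict of group code -> list of ratios."""
    pooled = sorted((value, code) for code, values in groups.items() for value in values)
    n = len(pooled)
    ranks = {code: [] for code in groups}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1                      # extend over a run of tied values
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[pooled[k][1]].append(average_rank)
        i = j + 1
    rank_term = sum(sum(r) ** 2 / len(r) for r in ranks.values())
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    return h, {code: sum(r) / len(r) for code, r in ranks.items()}


def asymp_sig_df2(h):
    """Upper-tail chi-square probability, valid only for 2 degrees of freedom."""
    return math.exp(-h / 2.0)


# Made-up ratios for three hypothetical neighbourhood codes.
ratios_by_nbhd = {
    36: [0.98, 1.01, 1.03, 0.95],
    40: [1.00, 0.97, 1.04, 0.99],
    46: [1.02, 0.96, 1.05, 0.94],
}
h_stat, mean_ranks = kruskal_wallis_h(ratios_by_nbhd)
sig = asymp_sig_df2(h_stat)
similar_levels = sig > 0.05
```

In this made-up data the mean ranks sit close to (N + 1) / 2 = 6.5 and the significance is well above 5%, which is the pattern the two outcomes above describe.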

STATISTICS PART Instructor: Dr. Samir Safi Name:

STATISTICS PART Instructor: Dr. Samir Safi Name: STATISTICS PART Instructor: Dr. Samir Safi Name: ID Number: Question #1: (20 Points) For each of the situations described below, state the sample(s) type the statistical technique that you believe is the

More information

Chapter Six- Selecting the Best Innovation Model by Using Multiple Regression

Chapter Six- Selecting the Best Innovation Model by Using Multiple Regression Chapter Six- Selecting the Best Innovation Model by Using Multiple Regression 6.1 Introduction In the previous chapter, the detailed results of FA were presented and discussed. As a result, fourteen factors

More information

The Dummy s Guide to Data Analysis Using SPSS

The Dummy s Guide to Data Analysis Using SPSS The Dummy s Guide to Data Analysis Using SPSS Univariate Statistics Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved Table of Contents PAGE Creating a Data File...3 1. Creating

More information

The SPSS Sample Problem To demonstrate these concepts, we will work the sample problem for logistic regression in SPSS Professional Statistics 7.5, pa

The SPSS Sample Problem To demonstrate these concepts, we will work the sample problem for logistic regression in SPSS Professional Statistics 7.5, pa The SPSS Sample Problem To demonstrate these concepts, we will work the sample problem for logistic regression in SPSS Professional Statistics 7.5, pages 37-64. The description of the problem can be found

More information

Supply Chain Capital Flows Management of Fortune 500 Manufacturing and Retail Companies: A Comparison

Supply Chain Capital Flows Management of Fortune 500 Manufacturing and Retail Companies: A Comparison Supply Chain Capital Flows Management of Fortune 500 Manufacturing and Retail Companies: A Comparison Edward Chu* California State University, Dominguez Hills A company s trade and inventory policies determine

More information

CHAPTER 8 T Tests. A number of t tests are available, including: The One-Sample T Test The Paired-Samples Test The Independent-Samples T Test

CHAPTER 8 T Tests. A number of t tests are available, including: The One-Sample T Test The Paired-Samples Test The Independent-Samples T Test CHAPTER 8 T Tests A number of t tests are available, including: The One-Sample T Test The Paired-Samples Test The Independent-Samples T Test 8.1. One-Sample T Test The One-Sample T Test procedure: Tests

More information

Summary of Wind Turbine Analysis Robert J. Gloudemans December 4, 2013

Summary of Wind Turbine Analysis Robert J. Gloudemans December 4, 2013 ALMY, GLOUDEMANS, JACOBS & DENNE Property Taxation and Assessment Consultants Appendix A 7630 NORTH 10 TH AVENUE PHOENIX, ARIZONA 85021 U.S.A. 1-602-870-9368 FAX: 1-602-861-2114 http://www.agjd.com Summary

More information

SECTION 11 ACUTE TOXICITY DATA ANALYSIS

SECTION 11 ACUTE TOXICITY DATA ANALYSIS SECTION 11 ACUTE TOXICITY DATA ANALYSIS 11.1 INTRODUCTION 11.1.1 The objective of acute toxicity tests with effluents and receiving waters is to identify discharges of toxic effluents in acutely toxic

More information

SPSS 14: quick guide

SPSS 14: quick guide SPSS 14: quick guide Edition 2, November 2007 If you would like this document in an alternative format please ask staff for help. On request we can provide documents with a different size and style of

More information

Background for Case Study: Clifton Park Residential Real Estate

Background for Case Study: Clifton Park Residential Real Estate Techniques for Engaging Business Students in the Statistics Classroom Jane E. Oppenlander Example Assignments and Class Exercises Background for Case Study: Clifton Park Residential Real Estate Data on

More information

CHAPTER 5 RESULTS AND ANALYSIS

CHAPTER 5 RESULTS AND ANALYSIS CHAPTER 5 RESULTS AND ANALYSIS This chapter exhibits an extensive data analysis and the results of the statistical testing. Data analysis is done using factor analysis, regression analysis, reliability

More information

Timing Production Runs

Timing Production Runs Class 7 Categorical Factors with Two or More Levels 189 Timing Production Runs ProdTime.jmp An analysis has shown that the time required in minutes to complete a production run increases with the number

More information

Distinguish between different types of numerical data and different data collection processes.

Distinguish between different types of numerical data and different data collection processes. Level: Diploma in Business Learning Outcomes 1.1 1.3 Distinguish between different types of numerical data and different data collection processes. Introduce the course by defining statistics and explaining

More information

Categorical Predictors, Building Regression Models

Categorical Predictors, Building Regression Models Fall Semester, 2001 Statistics 621 Lecture 9 Robert Stine 1 Categorical Predictors, Building Regression Models Preliminaries Supplemental notes on main Stat 621 web page Steps in building a regression

More information

= = Intro to Statistics for the Social Sciences. Name: Lab Session: Spring, 2015, Dr. Suzanne Delaney

= = Intro to Statistics for the Social Sciences. Name: Lab Session: Spring, 2015, Dr. Suzanne Delaney Name: Intro to Statistics for the Social Sciences Lab Session: Spring, 2015, Dr. Suzanne Delaney CID Number: _ Homework #22 You have been hired as a statistical consultant by Donald who is a used car dealer

More information

Gasoline Consumption Analysis

Gasoline Consumption Analysis Gasoline Consumption Analysis One of the most basic topics in economics is the supply/demand curve. Simply put, the supply offered for sale of a commodity is directly related to its price, while the demand

More information

JMP TIP SHEET FOR BUSINESS STATISTICS CENGAGE LEARNING

JMP TIP SHEET FOR BUSINESS STATISTICS CENGAGE LEARNING JMP TIP SHEET FOR BUSINESS STATISTICS CENGAGE LEARNING INTRODUCTION JMP software provides introductory statistics in a package designed to let students visually explore data in an interactive way with

More information

Opening SPSS 6/18/2013. Lesson: Quantitative Data Analysis part -I. The Four Windows: Data Editor. The Four Windows: Output Viewer

Opening SPSS 6/18/2013. Lesson: Quantitative Data Analysis part -I. The Four Windows: Data Editor. The Four Windows: Output Viewer Lesson: Quantitative Data Analysis part -I Research Methodology - COMC/CMOE/ COMT 41543 The Four Windows: Data Editor Data Editor Spreadsheet-like system for defining, entering, editing, and displaying

More information

= = Name: Lab Session: CID Number: The database can be found on our class website: Donald s used car data

= = Name: Lab Session: CID Number: The database can be found on our class website: Donald s used car data Intro to Statistics for the Social Sciences Fall, 2017, Dr. Suzanne Delaney Extra Credit Assignment Instructions: You have been hired as a statistical consultant by Donald who is a used car dealer to help

More information

CHAPTER 4. STATUS OF E-BUSINESS APPLICATION SYSTEM AND ENABLERS IN SCM OF MSMEs

CHAPTER 4. STATUS OF E-BUSINESS APPLICATION SYSTEM AND ENABLERS IN SCM OF MSMEs 70 CHAPTER 4 STATUS OF E-BUSINESS APPLICATION SYSTEM AND ENABLERS IN SCM OF MSMEs 4.1 PREAMBLE This chapter deals with analysis of data gathered through questionnaire survey to bring out The profile of

More information

ANALYSING QUANTITATIVE DATA

ANALYSING QUANTITATIVE DATA 9 ANALYSING QUANTITATIVE DATA Although, of course, there are other software packages that can be used for quantitative data analysis, including Microsoft Excel, SPSS is perhaps the one most commonly subscribed

More information

13-5 The Kruskal-Wallis Test

13-5 The Kruskal-Wallis Test 13-5 The Kruskal-Wallis Test luman, hapter 13 1 1 13-5 The Kruskal-Wallis Test The NOV uses the F test to compare the means of three or more populations. The assumptions for the NOV test are that the populations

More information

Correlations. Regression. Page 1. Correlations SQUAREFO BEDROOMS BATHS ASKINGPR

Correlations. Regression. Page 1. Correlations SQUAREFO BEDROOMS BATHS ASKINGPR multreg.sav squarefo bedrooms baths askingpr 3632 4 2.5 49 2 4889 6 5.0 399 3 3000 5 3.5 395 4 3669 4 3.5 379 5 2800 4 3.0 359 6 3600 5 3.5 349 7 2800 5 2.5 320 8 2257 3 3.0 299 9 2000 3 3.0 295 0 2455

More information

Revision confidence limits for recent data on trend levels, trend growth rates and seasonally adjusted levels

Revision confidence limits for recent data on trend levels, trend growth rates and seasonally adjusted levels W O R K I N G P A P E R S A N D S T U D I E S ISSN 1725-4825 Revision confidence limits for recent data on trend levels, trend growth rates and seasonally adjusted levels Conference on seasonality, seasonal

More information

1-Sample t Confidence Intervals for Means

1-Sample t Confidence Intervals for Means 1-Sample t Confidence Intervals for Means Requirements for complete responses to free response questions that require 1-sample t confidence intervals for means: 1. Identify the population parameter of

More information

5 CHAPTER: DATA COLLECTION AND ANALYSIS

5 CHAPTER: DATA COLLECTION AND ANALYSIS 5 CHAPTER: DATA COLLECTION AND ANALYSIS 5.1 INTRODUCTION This chapter will have a discussion on the data collection for this study and detail analysis of the collected data from the sample out of target

More information

SPSS Guide Page 1 of 13

SPSS Guide Page 1 of 13 SPSS Guide Page 1 of 13 A Guide to SPSS for Public Affairs Students This is intended as a handy how-to guide for most of what you might want to do in SPSS. First, here is what a typical data set might

More information

AcaStat How To Guide. AcaStat. Software. Copyright 2016, AcaStat Software. All rights Reserved.

AcaStat How To Guide. AcaStat. Software. Copyright 2016, AcaStat Software. All rights Reserved. AcaStat How To Guide AcaStat Software Copyright 2016, AcaStat Software. All rights Reserved. http://www.acastat.com Table of Contents Frequencies... 3 List Variables... 4 Descriptives... 5 Explore Means...

More information

Two Way ANOVA. Turkheimer PSYC 771. Page 1 Two-Way ANOVA

Two Way ANOVA. Turkheimer PSYC 771. Page 1 Two-Way ANOVA Page 1 Two Way ANOVA Two way ANOVA is conceptually like multiple regression, in that we are trying to simulateously assess the effects of more than one X variable on Y. But just as in One Way ANOVA, the

More information

Multiple Regressions for the Financial Analysis of Alabanian Economy

Multiple Regressions for the Financial Analysis of Alabanian Economy Multiple Regressions for the Financial Analysis of Alabanian Economy Assoc.Prof.Dr. Bederiana Shyti Head of Department of Mathematics, Faculty of Natural Sciences, University Aleksander Xhuvani Elbasan,

More information

THE TRUE PRICE FOR YOUR HOUSE METTILDA BENEDICT KAIMATHURUTH OKLAHOMA STATE UNIVERSITY

THE TRUE PRICE FOR YOUR HOUSE METTILDA BENEDICT KAIMATHURUTH OKLAHOMA STATE UNIVERSITY THE TRUE PRICE FOR YOUR HOUSE METTILDA BENEDICT KAIMATHURUTH OKLAHOMA STATE UNIVERSITY Table of Contents ABSTRACT... 2 INTRODUCTION... 2 METHODOLOGY... 3 RESULTS... 4 LIMITATIONS... 6 CONCLUSION... 6 CITATIONS

More information

CHAPTER FIVE CROSSTABS PROCEDURE

CHAPTER FIVE CROSSTABS PROCEDURE CHAPTER FIVE CROSSTABS PROCEDURE 5.0 Introduction This chapter focuses on how to compare groups when the outcome is categorical (nominal or ordinal) by using SPSS. The aim of the series of exercise is

More information

Midterm Exam. Friday the 29th of October, 2010

Midterm Exam. Friday the 29th of October, 2010 Midterm Exam Friday the 29th of October, 2010 Name: General Comments: This exam is closed book. However, you may use two pages, front and back, of notes and formulas. Write your answers on the exam sheets.

More information

Getting Started with OptQuest

Getting Started with OptQuest Getting Started with OptQuest What OptQuest does Futura Apartments model example Portfolio Allocation model example Defining decision variables in Crystal Ball Running OptQuest Specifying decision variable

More information

The Kruskal-Wallis Test with Excel In 3 Simple Steps. Kilem L. Gwet, Ph.D.

The Kruskal-Wallis Test with Excel In 3 Simple Steps. Kilem L. Gwet, Ph.D. The Kruskal-Wallis Test with Excel 2007 In 3 Simple Steps Kilem L. Gwet, Ph.D. Copyright c 2011 by Kilem Li Gwet, Ph.D. All rights reserved. Published by Advanced Analytics, LLC A single copy of this document

More information

3 Ways to Improve Your Targeted Marketing with Analytics

3 Ways to Improve Your Targeted Marketing with Analytics 3 Ways to Improve Your Targeted Marketing with Analytics Introduction Targeted marketing is a simple concept, but a key element in a marketing strategy. The goal is to identify the potential customers

More information

Untangling Correlated Predictors with Principle Components

Untangling Correlated Predictors with Principle Components Untangling Correlated Predictors with Principle Components David R. Roberts, Marriott International, Potomac MD Introduction: Often when building a mathematical model, one can encounter predictor variables

More information

Business Intelligence, 4e (Sharda/Delen/Turban) Chapter 2 Descriptive Analytics I: Nature of Data, Statistical Modeling, and Visualization

Business Intelligence, 4e (Sharda/Delen/Turban) Chapter 2 Descriptive Analytics I: Nature of Data, Statistical Modeling, and Visualization Business Intelligence Analytics and Data Science A Managerial Perspective 4th Edition Sharda TEST BANK Full download at: https://testbankreal.com/download/business-intelligence-analytics-datascience-managerial-perspective-4th-edition-sharda-test-bank/

More information

More Multiple Regression

More Multiple Regression More Multiple Regression Model Building The main difference between multiple and simple regression is that, now that we have so many predictors to deal with, the concept of "model building" must be considered

More information

Business Statistics (BK/IBA) Tutorial 4 Exercises

Business Statistics (BK/IBA) Tutorial 4 Exercises Business Statistics (BK/IBA) Tutorial 4 Exercises Instruction In a tutorial session of 2 hours, we will obviously not be able to discuss all questions. Therefore, the following procedure applies: we expect

More information

BUS105 Statistics. Tutor Marked Assignment. Total Marks: 45; Weightage: 15%

BUS105 Statistics. Tutor Marked Assignment. Total Marks: 45; Weightage: 15% BUS105 Statistics Tutor Marked Assignment Total Marks: 45; Weightage: 15% Objectives a) Reinforcing your learning, at home and in class b) Identifying the topics that you have problems with so that your

More information

How to Use Excel for Regression Analysis MtRoyal Version 2016RevA *

How to Use Excel for Regression Analysis MtRoyal Version 2016RevA * OpenStax-CNX module: m63578 1 How to Use Excel for Regression Analysis MtRoyal Version 2016RevA * Lyryx Learning Based on How to Use Excel for Regression Analysis BSTA 200 Humber College Version 2016RevA

More information

Business Intelligence, 4e (Sharda/Delen/Turban) Chapter 2 Descriptive Analytics I: Nature of Data, Statistical Modeling, and Visualization

Business Intelligence, 4e (Sharda/Delen/Turban) Chapter 2 Descriptive Analytics I: Nature of Data, Statistical Modeling, and Visualization Business Intelligence, 4e (Sharda/Delen/Turban) Chapter 2 Descriptive Analytics I: Nature of Data, Statistical Modeling, and Visualization 1) One of SiriusXM's challenges was tracking potential customers

More information

Multiple Regression. Dr. Tom Pierce Department of Psychology Radford University

Multiple Regression. Dr. Tom Pierce Department of Psychology Radford University Multiple Regression Dr. Tom Pierce Department of Psychology Radford University In the previous chapter we talked about regression as a technique for using a person s score on one variable to make a best

More information

Correlation and Simple. Linear Regression. Scenario. Defining Correlation

Correlation and Simple. Linear Regression. Scenario. Defining Correlation Linear Regression Scenario Let s imagine that we work in a real estate business and we re attempting to understand whether there s any association between the square footage of a house and it s final selling

More information

Statistical Observations on Mass Appraisal. by Josh Myers Josh Myers Valuation Solutions, LLC.

Statistical Observations on Mass Appraisal. by Josh Myers Josh Myers Valuation Solutions, LLC. Statistical Observations on Mass Appraisal by Josh Myers Josh Myers Valuation Solutions, LLC. About Josh Josh Myers is an independent CAMA consultant and owner of Josh Myers Valuation Solutions, LLC. Josh

More information
