Homework I: Stata Guide


Econ 120B Stata Guide Hw1
Claudio Labanca, Love Lofstrom

This guide will help you learn Stata, a program used to process data for statistical inference. These instructions will aid you in completing your first homework assignment. If anything, really anything, is unclear, four of your best resources will be:
I. Office Hours (found on TED).
II. The help command in Stata: help x, or google help x stata. Replace x with the command that you are unsure of.
III. statalab.ucsd@gmail.com
IV. A great self-help guide from UCLA.

Commands will be in bold (type the phrase in bold, then hit enter). For example, describe will show the variables contained in the dataset. Stata is case sensitive. If you enter a command and the variable cannot be found, it is possible that you entered happins, not Happins.

Clicking will be in italics. The title is what you click; -> indicates what you click next. E.g., File -> Open -> Documents -> School -> Stata Homework -> dataset.dta.

I. Logistics
A) If you are using your own computer then this first step may be redundant. Once you open Stata, clear will remove any data currently in memory. This will ensure that the only variables in Stata are related to the homework assignment.
B) set more off stops Stata from pausing with a --more-- prompt whenever the output fills the Results window, so long output scrolls through without interruption. In recent versions of Stata this may already be the default, in which case the command is harmless.
C) set mem 15 (only if you run Stata 11 or earlier, which is unlikely if you use it through VCL or a UCSD computer).
D) cap log close will close any log file that is currently open. A log file records what is done in Stata.
E) Choose the working directory: File -> Change Working Directory -> select a folder
F) Create a log file in which the results of the programming will be saved. E.g.: File -> Log -> Begin -> select the folder where you want to save it -> pick a name -> Save

G) Open the dataset (.dta file): File -> Open -> find and select the file country_happiness.dta
H) Save your data as a new file. This will make sure that you do not tamper with the original file. File -> Save As -> select the folder where you want to save it -> pick a name -> Save

IA. Analysis: Happiness
A) describe allows you to see what variables are contained in the dataset. The dataset contains socioeconomic information and happiness scores for 75 countries.
   describe happins gdp2002 (the two variables that we are interested in for this homework assignment).
B) summarize will give you summary statistics on the variables that you enter: number of observations, mean, standard deviation, and min/max values.
   summarize happins
   summarize gdp2002
C) sort will re-arrange the observations in ascending order of the variable you name. This will allow us to see which countries are the happiest/saddest.
   sort happins
   browse will show you the data in cell format (like Excel). Enter the command to see for yourself that the observations are re-arranged.
D) We want to find the least/most happy country in the dataset. In order to do so, we will use list.
   * _N is the total number of observations.
   * _n is the observation/row number. E.g., after sort happins, _n==5 is the fifth unhappiest country in the dataset.
   i) To find the unhappiest country: list country_name happins if _n==1
   ii) To find the happiest country: list country_name happins if _n==_N
   * list can also be used to find the happiness index of particular countries. We want to see how happy people are in the USA and Italy:
   iii) list cty happins if country_name == "United States"
   iv) list cty happins if country_name == "Italy"
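The sort, _n, and _N logic in step D can be tried on any dataset. The sketch below is a minimal example that assumes Stata's built-in auto dataset (loaded with sysuse) as a stand-in for country_happiness.dta; the variables make and price belong to that example dataset, not to the homework file.

```stata
* Stand-in data (assumption): Stata's built-in auto dataset
sysuse auto, clear

* Put the observations in ascending order of price
sort price

* After sorting, observation 1 holds the smallest price ...
list make price if _n == 1

* ... and observation _N (the last one) holds the largest
list make price if _n == _N

* String values go in double quotes
list make price if make == "AMC Concord"
```

Replacing price with happins and make with country_name gives the commands used in step D.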

E) We can use count to see how many countries are happier than a specific country. Let's see how many countries are happier than the U.S. by using the happiness index for the U.S. It is also possible to see which countries those are, and which values lie between two countries: Portugal and the USA.
   i) count if happins > 
   ii) list country_name if happins > 
   iii) a. list cty happins if country_name == "Portugal"
        b. list if happins > & happins < 

IB. Analysis: Religion
A. religion is a string (non-numerical) variable, so summarize won't work for it. Instead we will use the tabulate command. It gives us frequencies, percentages, and the cumulative distribution for each religion type.
   * describe religion (see for yourself)
   * tabulate religion
B. We can look at different countries to see which religion has a majority in a particular country. For example, let's see in which countries Shiites are in the majority. It is also possible to see which countries don't practice certain religions. In order to do so we use !=, the does-not-equal operator. Don't forget quotation marks for string values!
   i) list country_name happins religion if religion == "Shia Islam"
   ii) list country_name happins if religion != "Catholic Heavily"
C. Once again we want to look at the summary statistics for happiness scores and GDP/capita.
   * summarize happins gdp2002
D. As you saw previously, it is extremely easy to find the standard deviation, mean, etc. Let's test your understanding of statistics by finding the standard deviation manually in Stata. This will be done in a few steps.
   i) We need to create a variable for the deviation. We will subtract the mean from each observation of happins. generate happins_deviation = happins (we got the mean by using summarize happins).

   ii) The deviation must be squared. generate happins_deviation_sq = happins_deviation^2
   iii) Now it is time to add up all of the squared deviations. tabstat allows us to produce a table of statistics. tabstat happins_deviation_sq, statistics(sum)
   iv) In order to do calculations in Stata we use display: display 1+1, display 5*5, display 1-1, display 5/0 (j/k, you can't divide by 0). In order to get the sample variance we will divide the sum of squared deviations by N-1. display /74. Alternatively, display /(_N-1).
   v) In order to get the standard deviation we need to take the square root of the sample variance. display sqrt( ).
E. Now try to calculate the standard deviation for the GDP variable.
   i) generate gdp_deviation_sq = (gdp )^2
   ii) tabstat gdp_deviation_sq, statistics(sum) columns(variables)
   iii) display sqrt(1.05e+10/74)
   iv) The value won't be exactly the same as the one shown by summarize. This is due to rounding.
F. It is possible to plot the distribution using Stata's graphical tools. We will plot the histogram of happins together with a normal distribution that has the same mean and standard deviation as happins. histogram happins, frequency normal
G. Let's plot a histogram for GDP as well. histogram gdp2002, normal
H. We can look at the relationship between GDP and happiness in two ways.
   i) corr happins gdp2002, which gives us the correlation between happiness and GDP.
   ii) scatter happins gdp2002, which will graph a scatter plot of their relationship.
   iii) Save the graph. In the graph window: File -> Save As -> Save as type: Portable Document Format (*.pdf) -> select the folder where you want to save it -> pick a name -> Save
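The manual standard deviation from steps D and E can also be computed without retyping any numbers, because summarize leaves its results behind in r(). The sketch below is one way to do it, assuming Stata's built-in auto dataset as a stand-in for the homework data; the names price_mean, price_sd, price_dev_sq, and manual_sd are made up for the illustration.

```stata
* Stand-in data (assumption): Stata's built-in auto dataset
sysuse auto, clear

* summarize stores the mean in r(mean) and the std. dev. in r(sd)
quietly summarize price
scalar price_mean = r(mean)
scalar price_sd   = r(sd)

* Squared deviation from the mean for every observation
generate double price_dev_sq = (price - price_mean)^2

* Add up the squared deviations, divide by N-1, take the square root
quietly summarize price_dev_sq
scalar manual_sd = sqrt(r(sum)/(_N - 1))

* The two numbers should agree
display "summarize: " price_sd "   by hand: " manual_sd
```

Because nothing is rounded along the way, the two values agree much more closely here than when the intermediate results are typed in by hand. Dividing by _N - 1 is only right when the variable has no missing values, which is the case for price in this example.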

I. Let's figure out which country has a GDP/capita closest to $60,000. This could be hard to do by eye. Fortunately, we can add labels to the scatter plot.
   i) scatter happins gdp2002, mlabel(country_name) mlabsize(small). Luxembourg should be that country. Notice the two axis titles, which are the dataset labels for our two variables, happins and gdp2002. The variable you type first will be displayed on the y-axis.
   ii) Let's make the graph user friendly. We can do so by naming the graph and the axes. scatter happins gdp2002, mlabel(country_name) mlabsize(vsmall) title("Scatterplot: Happiness Score and GDP/capita") ytitle("Happiness Score") xtitle("GDP/capita")
   iii) Outliers can be dangerous in econometrics. If we consider Luxembourg an outlier, we can easily get rid of the observation, so that later graphs and statistics no longer include it. drop if country_name == "Luxembourg"
J. We are done with the analysis for these variables. However, let's save the dataset and close the log file before moving on.
   i) File -> Save

II. Analysis: Money
A) It is now time to use a different dataset. Before getting started we need to repeat some of the commands from the Logistics section.
   i) clear
   ii) set more off
   iii) cap log close
   iv) File -> Log -> Begin -> select a folder -> pick a name -> Save
   v) Open the dataset: File -> Open -> find and open CEOSAL1.dta
   vi) Then save the file under a new name before getting started. File -> Save As -> select a folder -> pick a name -> Save
B) It's generally a good idea to look at the variables in the dataset. describe
C) The two variables of interest are CEO salaries and return on equity. list salary roe if _n < 25
D) sum
E) The industry that the data is drawn from should give additional information. This is a discrete variable. A better way to describe this type of variable is through the command tab indus
F) It is possible to look at the cross-tab of two discrete variables. With the row option, the cross-tab reports the relative frequency within its row for each cell. In our example, it gives the conditional distribution of finance given that indus takes the value 0 (first row) or 1 (second row). It is essentially the conditional distribution of the column variable given the row variable. tabulate indus finance, row
G) We can also find the conditional distribution of the row variable given the column variable. tabulate indus finance, column
H) Lastly, it is possible to get the joint distribution of industrial firms and financial firms. tabulate indus finance, cell
I) Let us look at the correlation between salary and return on equity while excluding potential outliers. corr salary roe if salary < 5000
J) It is time to create another scatter plot. To make the graph easier to read we are going to add reference lines. Salary is recorded in thousands of dollars, so we want to see how many CEOs make more than $5,000,000/year and how many companies have an ROE of 50% or higher. scatter salary roe, yline(5000) xline(50)
K) Let's plot a histogram for salary.
   i) hist salary
   ii) histogram salary, normal (this overlays a normal density on the histogram of salary)
   iii) Incomes, e.g. salary, are usually represented as the natural log of salary. histogram lsalary, normal. This creates a histogram that is easier to read than the previous one.
L) hist roe, normal
M) File -> Save
N) File -> Log -> Close

Good luck!
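The menu clicks used for the logistics steps also have typed equivalents, which is convenient if you collect your commands in a do-file. The sketch below is only an outline under assumptions: the log name hw1_example.log is hypothetical, and Stata's built-in auto dataset stands in for the homework data.

```stata
* Housekeeping before the analysis
clear
set more off
capture log close

* Start a plain-text log in the current working directory
* (hw1_example.log is a hypothetical name)
log using "hw1_example.log", replace text

* Load data; the built-in auto dataset stands in for the homework file
sysuse auto, clear
describe
summarize price mpg

* Close the log when the analysis is done
log close
```

Here capture (abbreviated cap in the guide) keeps Stata from stopping with an error when no log happens to be open.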

Summary Table of the Logical Expressions in Stata

Command   Short description
<         less than
<=        less than or equal
==        equal
>         greater than
>=        greater than or equal
!=        not equal
&         and
|         or
!         not

Summary Table of the Stata Commands seen in Tutorial 1

Each entry lists the command, a short description, and an example.

describe
   Short description: will show characteristics of the variable/s contained in the dataset.
   Example: des variable_name
summarize
   Short description: will give you summary statistics on the variables that you enter.
   Example: sum variable_name
sort
   Short description: will re-arrange the variable in ascending order.
   Example: sort variable_name
browse
   Short description: will show you the data in cell format (like Excel).
list
   Short description: can be used to find the value of a particular variable.
   Example: list country_name happins religion if religion == "Shia Islam"
count
   Short description: to see how many countries are happier than a specific country.
   Example: count if happins > 
generate
   Short description: to create a variable.
   Example: gen variable_name = insert_formula
tabstat
   Short description: allows us to produce a table of statistics.
   Example: tabstat variable_name
tabstat variable_name, statistics(sum)
   Short description: add up all of the values stored for a certain variable.
   Example: tabstat variable_name, statistics(sum)
display
   Short description: to do calculations in Stata.
   Example: display sqrt(1.05e+10/74)
histogram
   Short description: plot a histogram.
   Example: histogram variable_name, normal
corr
   Short description: look at the correlation between variables.
   Example: corr variable_name1 variable_name2
scatter
   Short description: will graph a scatter plot of their relationship.
   Example: scatter variable_name1 variable_name2
tab
   Short description: additional information to describe variables.
   Example: tab indus
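Several of these commands combine naturally with the logical operators. As a hedged illustration, again assuming Stata's built-in auto dataset rather than the homework data, the sketch below counts observations that satisfy a condition and cross-tabulates two discrete variables; the variable expensive is made up for the example.

```stata
* Stand-in data (assumption): Stata's built-in auto dataset
sysuse auto, clear

* How many cars have mpg strictly between 20 and 25?
count if mpg > 20 & mpg < 25

* count leaves its result in r(N), which display can reuse
display "cars in range: " r(N)

* A made-up 0/1 variable: 1 if price is above 6000, 0 otherwise
generate expensive = (price > 6000)

* Cross-tab with row percentages: distribution of the column
* variable (expensive) within each value of the row variable (foreign)
tabulate foreign expensive, row

* != means "not equal"; conditions can be chained with &
list make mpg if foreign != 0 & mpg >= 30
```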

STATA Tutorial #2

KEY: Type into Command box; Left Click

If you need any additional guidance, or are having other issues with STATA, try the following:
* Attend office hours, the exact times of which can be found on TED.
* Use the help command in STATA or Google (e.g. help scatter if you want clarification on how the scatter command works).
* Send questions to

1. clear
2. cap log close
   a. The cap log close command, in this case, tells STATA to close any log files you may currently have open.
3. File > Log > Begin
   a. This allows you to begin a new log (which you will need to do in order to turn in your homework assignments). Make sure to save your log as a .log file to receive full points on your homework assignment!
4. File > Open
   a. Open your dataset (wine.dta).
   b. Alternatively, you could choose to use STATA's use command, which also tells STATA to load a designated dataset.
5. save wine_out.dta, replace
   a. We don't want to actually alter the original dataset (wine.dta), so we will save it under a new name, in this case wine_out.dta.
   b. The replace option here tells STATA to overwrite any existing file named wine_out.dta.
6. describe
   a. The describe command shows us what our dataset contains: the number of observations, variables, etc. Often, it will also give a brief description of what each variable represents.
7. scatter alcohol heart, mlabel(country) mlabsize(vsmall)
   a. We are now using the scatter command to create a scatterplot representing the relationship between alcohol consumption and heart disease. Note that alcohol consumption, listed first here, is on the Y-axis, while heart disease, listed second here, is on the X-axis.

   b. The mlabel option allows us to label the points by country, while the mlabsize option allows us to manipulate the appearance of said labels (in this case, vsmall tells STATA to make the label text very small).
   c. We can see, based on the scatterplot produced, that the two variables appear to be negatively correlated: the higher the wine consumption, the lower the deaths by heart disease.
8. scatter alcohol liver, mlabel(country) mlabsize(vsmall)
   a. We can create a similar scatterplot to observe the relationship between alcohol consumption and deaths by liver disease (in this case, the variables appear to be positively correlated).
9. regress heart alcohol, robust
   a. We now want to run a regression of deaths by heart disease on wine consumption. The regress command tells STATA to run a linear regression.
      i. Recall that if errors are not homoskedastic, we must use heteroskedasticity-robust standard errors in order to make valid inferences. We can tag on the robust option to accommodate this.
   b. STATA gives us a lot of information: in the top right corner, we can see the sample size, the standard error of the regression, and the R-squared. We are also told the degrees of freedom, estimated coefficients, and standard errors, displayed in other regions of the command output.
10. display / 
   a. We can manually calculate the R-squared using the display command.
      i. The Explained Sum of Squares (ESS) is given to us by Stata as the Model SS; the Sum of Squared Residuals (SSR) is given to us as the Residual SS; and the Total Sum of Squares (TSS) is given to us as the Total SS.
      ii. To calculate the R-squared, divide the ESS value by the TSS value ( / ).
11. display 1-( / )
   a. Alternatively, we can calculate the R-squared using the formula 1-(SSR/TSS). Again we can show this in STATA using the display command.
12. display _b[_cons] + _b[alcohol]*8
   a. STATA stores the coefficient values in _b. Thus _b[_cons] gives us the coefficient of the constant term (the intercept). Meanwhile _b[alcohol] gives us the slope of the regression line.
   b. To predict the value of deaths by heart disease in a country with a wine-per-capita consumption of 8 liters per year, use the display command as shown above. We are essentially plugging 8 [liters] into the regression line.
13. twoway (lfit heart alcohol) (scatter heart alcohol, mlabel(country) mlabsize(vsmall))
   a. The twoway command produces a twoway graph according to our specifications.
      i. The lfit plot type generates a line of best fit through our original scatterplot (initially generated in step 7).

The next two steps (14-15) are somewhat irrelevant to the tutorial as a whole but will help you in the completion of your second homework assignment.
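Rather than retyping the sums of squares from the regression table, steps 10 to 12 can also draw on the results that regress stores. The sketch below is a minimal example on Stata's built-in auto dataset (an assumption; price and weight stand in for heart and alcohol). It uses e(mss), e(rss), and e(r2), which regress leaves behind for the model sum of squares, the residual sum of squares, and the R-squared.

```stata
* Stand-in data (assumption): Stata's built-in auto dataset
sysuse auto, clear
regress price weight

* ESS = e(mss), SSR = e(rss), TSS = ESS + SSR
scalar tss = e(mss) + e(rss)

* R-squared two ways, next to the value Stata reports
display "ESS/TSS     = " e(mss)/tss
display "1 - SSR/TSS = " 1 - e(rss)/tss
display "e(r2)       = " e(r2)

* Predicted price for a car weighing 3000 lbs: plug x into the fitted line
display _b[_cons] + _b[weight]*3000
```

All three R-squared lines should print the same value, which is a quick check that the two formulas in steps 10 and 11 are equivalent.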

14. twoway (lfit heart alcohol) (scatter heart alcohol, mlabel(country) mlabsize(vsmall)) (function y= *x, range(alcohol))
   a. The function plot appended to our command from step 13 draws a function in the above graph, in this case y = *x.
15. twoway (lfit heart alcohol) (scatter heart alcohol, mlabel(country) mlabsize(vsmall)) (function y= *x, range(alcohol)), legend(order(1 2 "Observed" 3 "A function of interest"))
   a. Here we'll attempt to make the graph legend a little clearer. The legend option allows us to label our graph more deliberately (to better illustrate this, try also twoway (lfit heart alcohol) (scatter heart alcohol, mlabel(country) mlabsize(vsmall)) (function y= *x, range(alcohol)) and see what your key would look like in this case).
16. predict yhat_h
   a. This saves all fitted values.
   b. Data > Variables Manager shows the new variable yhat_h, labelled Fitted Values.
17. predict uhat_h, residuals
   a. Let's also save the residuals from the regression. Again, Data > Variables Manager should show you the new variable uhat_h, labelled Residuals.
18. generate uhat_alt = heart - yhat_h
   a. Experimentally, we can verify that the difference between the actual observed value and the value predicted by the model equals the residual.
19. drop uhat_alt
   a. Drop the variable uhat_alt.
20. tabstat uhat_h, statistic(sum)
   a. We can check that the sum of the residuals equals zero using the tabstat command with the statistic(sum) option.
21. rvpplot alcohol, yline(0) mlabel(country) mlabsize(vsmall)
   a. Using the rvpplot command, we can plot the residuals. Note that the values of the residuals are shown on the vertical axis, and that the level of alcohol consumption is displayed on the horizontal axis.
22. rvfplot, yline(0) mlabel(country) mlabsize(vsmall)
   a. Let's now instead plot the residuals against the fitted values, using the rvfplot command.
23. sort uhat_h
   a. Use the sort command to organize the residuals in ascending order (recall the sort command from the first tutorial and homework assignment).
24. list country alcohol heart yhat_h uhat_h
   a. Using the list command, try to observe the typical size of the residuals. By observing the residual values, we can more readily see the countries that the OLS regression does not fit well.
25. regress heart alcohol if country != "Japan"
   a. We can see that Japan doesn't seem to fit this regression model well (note its large residual). Let's try running the regression without Japan.

   b. The if country != "Japan" qualifier tells STATA to run the regression only on observations whose country name is not Japan.
26. set seed
   a. STATA can be used to generate a random sample of size n; this is done with the bsample command. So that the draw can be reproduced, we must first set a seed value, in this case a number. The seed can be whatever number you like; let's here use
27. bsample 10
   a. To take our random sample, we'll use the bsample command, followed by our desired sample size. We'll use a sample size of n=10.
28. describe
   a. Use the describe command to see your 10 observations.
29. regress heart alcohol
   a. Let's run the regression again, on our 10 observations.
30. save wine_out.dta, replace
   a. Save the current dataset before we leave it.
31. clear
   a. Let's begin anew.
32. File > Open
   a. We will now use the dataset with CEO salaries. Locate and open it in STATA.
33. save ceosal2_tut2.dta, replace
34. describe
   a. Use the describe command to familiarize yourself with the new dataset. Observe the variables, their descriptions, etc.
35. regress salary ceoten
   a. Let's run a regression of salary (salary) on the number of years an individual has been a CEO (ceoten).
36. twoway (scatter salary ceoten) (lfit salary ceoten), legend(order(1 "Observed" 2 "Fitted by Linear Model"))
   a. Use the twoway command to create a twoway graph that illustrates the relationship between salary and length of CEO tenure. Note the line of best fit that appears alongside the data points on the scatterplot.
37. regress lsalary ceoten
   a. We'll use the regress command to regress the log of salary on CEO tenure.
38. twoway (scatter lsalary ceoten) (lfit lsalary ceoten)
   a. Again, let's use the twoway command to create a twoway graph that shows us visually the line of best fit through a scatterplot of the data points.
39. Predicted_salary = exp(b0_hat + b1_hat * ceoten)
   a. It is possible for us to observe this relationship using salary instead of the log of salary. Note that if Predicted_log(salary) = b0_hat + b1_hat * ceoten, then we can find a value for the predicted salary such that Predicted_salary = exp(b0_hat + b1_hat * ceoten).
40. twoway (scatter salary ceoten) (function y = exp(_b[_cons] + _b[ceoten]*x), range(ceoten)), legend(order(1 "Observed" 2 "Fitted by Log Model"))
   a. From here, we can draw a twoway graph that visually expresses the relationship between salary and CEO tenure.
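Step 39 is a formula rather than a command. One way to apply it, sketched here on Stata's built-in auto dataset (an assumption; price and weight stand in for salary and ceoten, and lprice, lprice_hat, price_hat, and price_hat2 are made-up names), is to predict the log and then exponentiate it.

```stata
* Stand-in data (assumption): Stata's built-in auto dataset
sysuse auto, clear

* Log of the dependent variable, then the log-linear regression
generate lprice = log(price)
regress lprice weight

* Fitted values of log(price) ...
predict lprice_hat

* ... turned back into levels: exp(b0_hat + b1_hat*x)
generate price_hat = exp(lprice_hat)

* The same number written out with the stored coefficients
generate price_hat2 = exp(_b[_cons] + _b[weight]*weight)

list make price price_hat price_hat2 in 1/5
```

The two generated columns agree up to rounding, which is the same expression that the function plot in step 40 draws as a curve.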

41. regress lsalary lsales
   a. Let's regress the log of salary on the log of sales. We are effectively estimating a constant elasticity model that relates the CEO's salary to sales generated by the firm in millions of dollars. This relationship is modeled by log(salary) = b0 + b1 log(sales) + u.
42. regress salary ceoten
43. summarize
   a. Recall that the summarize command can be used to familiarize ourselves with the dataset: here we can use it to find values such as the average salary and tenure of a CEO.
44. display _b[_cons] + _b[ceoten]* 
   a. If we plug the average tenure of the CEO into our estimated regression, we should get back the average salary of a CEO. We can use STATA to verify this.
45. regress salary ceoten, robust
   a. Recall that if the errors are not homoskedastic, homoskedasticity-only standard errors of the estimators are not appropriate. If errors are not homoskedastic, then we must use heteroskedasticity-robust standard errors in order to make valid inferences.
   b. To tell STATA that we want heteroskedasticity-robust standard errors (as opposed to homoskedasticity-only standard errors, which STATA gives us by default) we tag on the robust option.
46. set seed
   a. Again, STATA allows us to generate a random sample of size n. Recall that to do so, we must set a seed value, here just a numeric value. Let's use
47. bsample 100
   a. Let's set our sample size to 100.
48. describe
   a. The describe command should show you that we do in fact have 100 observations in our dataset now.
49. regress salary ceoten, robust
   a. We can perform our last regression again, but this time with our new, reduced set of 100 observations.
50. use CEOSAL2_tut2.DTA, clear
   a. Let's return to our old dataset.
51. describe
   a. Note that we are back to our original 177 observations.
52. set seed
   a. Now we'll take a different random sample and perform the regression again. In this case, let's now use a different seed value,
53. bsample
54. regress salary ceoten, robust
   a. Observe that the estimated coefficients are different from those obtained before, since we took a different random sample of size
55. save CEOSAL2_tut2.dta, replace
56. File > Log > Close
   a. Close the log and finish!
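Because bsample draws at random, the seed is what makes a run repeatable. The sketch below assumes Stata's built-in auto dataset and an arbitrary seed (12345 is not the value used in the tutorial); it draws the same bootstrap sample twice and shows that the regression slope comes out identical.

```stata
* First draw (auto is a stand-in dataset, 12345 an arbitrary seed)
sysuse auto, clear
set seed 12345
bsample 10
regress price weight
scalar slope1 = _b[weight]

* Reload the full data, reset the same seed, draw again
sysuse auto, clear
set seed 12345
bsample 10
regress price weight
scalar slope2 = _b[weight]

* Same seed, same sample, same estimate
display slope1 "  " slope2
```

Changing the seed in the second half would normally give a different sample and therefore different coefficients, which is what steps 52 to 54 illustrate.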

Summary Table of the Stata Commands seen in Tutorial 2

Each entry lists the command, a short description, and an example.

regress
   Short description: performs linear regression on variables.
   Example: regress depvar indepvar, option
   Note: depvar: vertical axis; indepvar: horizontal axis. The option robust can be used to obtain correct standard errors when errors are heteroskedastic.
twoway
   Short description: plots twoway graphs (scatter, line, etc.).
   Example: twoway scatter variable1 variable2
   Note: when the only type of graph is scatterplot or line, twoway may be omitted when inputting the command.
twoway lfit
   Short description: adds a line of best fit to the graph.
   Example: twoway (scatter variable1 variable2) (lfit variable1 variable2)
predict
   Short description: obtains predictions, residuals, etc., after estimation.
   Example: predict variable, option
   Note: the option residuals generates residuals.
rvpplot
   Short description: plots the residual on the vertical axis and the specified variable on the horizontal axis.
   Example: rvpplot variable
   Note: variable can be, for example, the x variable of a regression.
rvfplot
   Short description: plots the residual on the vertical axis and the fitted y on the horizontal axis.
   Example: rvfplot, options
   Note: some examples of options are yline(), mlabel(), mlabsize().
bsample
   Short description: draws bootstrap samples (random samples with replacement) from the data in memory.
   Example: bsample sample_size
   Note: before inputting the command, set seed number.
set seed
   Short description: must set the seed value before generating a sample.
   Example: set seed number
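The predict entries can be checked against each other in a few lines. The following is a sketch under assumptions: it uses Stata's built-in auto dataset instead of wine.dta, and the variable names yhat, uhat, uhat_alt, and gap are made up for the illustration.

```stata
* Stand-in data (assumption): Stata's built-in auto dataset
sysuse auto, clear
regress price weight

* Fitted values and residuals (double precision keeps rounding small)
predict double yhat
predict double uhat, residuals

* Residual by hand: observed minus fitted
generate double uhat_alt = price - yhat

* Largest gap between the two versions; should be essentially zero
generate double gap = abs(uhat - uhat_alt)
summarize gap

* With an intercept in the model, OLS residuals sum to (numerically) zero
tabstat uhat, statistics(sum)
```

The sum printed by tabstat is typically a very small number rather than an exact 0, because of floating-point rounding.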

STATA Tutorial #3

If you need any additional guidance, or are having other issues with STATA, try the following:
* Attend office hours, the exact times of which can be found on TED.
* Use the help command in STATA or Google (e.g. help scatter if you want clarification on how the scatter command works).
* Send questions to

1. clear
2. cap log close
   a. The cap log close command, in this case, tells STATA to close any log files you may currently have open.
3. cd "CURRENT DIRECTORY PATH"
   The cd command will set the current directory in Stata. This is the directory where your data are saved and where you want the log files, graphs, etc. to be saved. In order for Stata to find that folder we need to indicate a CURRENT DIRECTORY PATH. To get this to work, create a folder on your desktop. In that folder create two other folders, one called logs, the second one called data. Save your data (i.e. .dta files) in the data folder. To find out the CURRENT DIRECTORY PATH, right click on either the logs or the data folder. Then click on Properties. In the window that pops up, copy the path that you find to the right of Location and paste it in place of the words CURRENT DIRECTORY PATH after cd. Don't forget to keep the quotes.
   Example: cd "C:\Desktop\Stata Tutorial 3\" will set the current directory to be the folder called Stata Tutorial 3 on the Desktop of this computer (drive C).
4. log using logs\tutorial3.log, replace
   a. This allows you to begin a new log (which you will need to do in order to turn in your homework assignments). Make sure to save your log as a .log file to receive full points on your homework assignment! The replace option will replace any existing log file with the same name.
5. use data\vote1.dta, clear
   a. Begin by opening the dataset (vote1.dta). The clear option will clear any existing data from Stata's memory.
6. save vote1_out.dta, replace
   a. We don't want to actually alter the original dataset (vote1.dta), so we will save it under a new name, in this case vote1_out.dta.
   b. The replace option here tells STATA to overwrite any existing file named vote1_out.dta.

7. describe
   a. The describe command shows us what our dataset contains: the number of observations, variables, etc. Often, it will also give a brief description of what each variable represents.
8. generate id=_n
   a. Let's generate an id for each observation. With this command the observations are now numbered.
9. browse
   a. Notice how there's a new variable (last column), the one you just generated (id). Also notice the units of the variables: for example, votea and prtystr are in percentage points, so a value of 43 for votea means that candidate A got 43% of the votes.
10. reg votea expenda expendb, robust
   a. Let's start by regressing the percentage vote received by the incumbent on the campaign expenditures incurred by each candidate.
   b. In the top right corner, you will find, among others, the overall F-statistic (test of the joint hypothesis that all the slope coefficients are zero), the R-squared, and what we call the SER (standard error of the regression), which STATA calls Root MSE (root mean squared error). In the following table, you find the 3 estimates of the coefficients, the robust standard errors, and the t-statistics (which test the hypothesis that each individual coefficient is zero). Now, let's interpret the meaning of the estimated regression coefficients.
      i. When expenditures for both parties are 0, the percentage of votes received by candidate A (the incumbent) is predicted to be 49.6 percentage points, on average.
      ii. An increase in expenditures by candidate A of $1000 is predicted to increase, on average, his/her total vote by 0.38 percentage points, keeping candidate B's (the challenger's) expenditures constant.
      iii. For each $1000 increase in expenditures by candidate B, candidate A will lose, on average, about 0.036 percentage points, when candidate A's expenditures are held constant.
11. display _b[_cons] + _b[expendb]*2+_b[expenda]
   a. Use the command to show the predicted percentage of votes for candidate A when expenda=1 (that is, $1000) and expendb=2.
12. test expenda expendb
   a. To test the hypothesis that both coefficients are equal to zero.
13. test expenda
   a. To test the null hypothesis that the coefficient on expenda is 0, against the alternative that it is different from 0, we can use the command test as shown above.

   b. Since the p-value is smaller than 0.01, we reject the null hypothesis at the 1% significance level.
14. test (expenda=1) (expendb=0)
   a. We use this to test the joint hypothesis that the coefficient on expenda is equal to 1 and that the coefficient on expendb equals 0. To comment on the fit of the model, notice that both slope coefficients are highly significant and the R-squared demonstrates that this model explains about 53% of the variance of vote share.
      i. The SER (Root MSE) indicates that the typical deviation from the predicted value of each electoral district is about 11.6 percentage points, but this number is hard to evaluate in isolation. In short, this is a reasonably good fit for a model.
15. sum expenda expendb
    display _b[_cons]+ _b[expenda]* _b[expendb]* 
   a. To predict the percentage of votes for candidate A at the average expenditure of A and the average expenditure of B, first find the averages of expenda and expendb using the command sum (above).
   b. Then multiply the coefficient of each variable by the average found in point a and add the intercept.
16. sum expenda
   a. We can see what happens to the percent vote for the incumbent if incumbent campaign spending increases by one standard deviation, while the challenger's expenditures remain fixed.
17. display _b[expenda]* 
   a. Multiply the coefficient for expenda by its standard deviation.
   b. All else equal, a one standard deviation increase in expenditures by the incumbent would lead to an increase in vote share of about 10.8 percentage points.
18. gen lnvotea=log(votea)
    gen lnexpenda=log(expenda)
    reg lnvotea lnexpenda expendb, robust
   a. Suppose you want to know the percentage change in votea for a 1% change in expenda. You can directly obtain this result from the regression by running a log regression. Keeping expenditure for candidate B constant, a 1% increase in expenditure for candidate A corresponds to a 0.17% increase in the percentage of votes received by candidate A.
19. generate expenda_sq= expenda^2
    reg votea expenda expenda_sq, robust
   a. Imagine you are the adviser for an incumbent candidate. You come across a theory that there are diminishing marginal returns to campaign expenditures by incumbent candidates.
   b. You want to test this theory, so you decide to model the relationship between percent vote and expenditures for the incumbents as a quadratic function.
      i. What do the regression results show you?
      ii. There appear to be diminishing marginal returns to expenditures. Notice that the coefficient on the squared value of incumbent expenditures is negative.
      iii. This indicates that each new increase in expenditures will yield a smaller return than the one before. Eventually, we will reach a point where increasing expenditures actually costs an incumbent votes. How do you explain this turning point?
      iv. A possible explanation is that airwaves become fully saturated and overexposure leads voters in a particular district to turn against the candidate.
20. twoway (scatter votea expenda) (qfit votea expenda), legend(order(1 2 "Quadratic Fit"))
   a. We plot the estimated relation.
   b. scatter shows you the points in your sample; qfit plots the estimated quadratic relationship.
21. twoway (scatter votea expenda) (qfit votea expenda) (lfit votea expenda), legend(order(1 2 "Quadratic Fit" 3 "Linear Fit"))
   a. In this graph, we compare the quadratic fit with the linear fit. To test the theory, beyond visual comparison of the two fits, we can formally test the hypothesis that the relationship between votea and expenda is linear, against the alternative that it is nonlinear. If the relationship is linear, the coefficient on expenda_sq is zero. The t-statistic for this test is -6, thus we reject the null hypothesis. There is evidence that the relationship is nonlinear.
22. display (_b[_cons]+ _b[expenda]*110+_b[expenda_sq]*110^2) - (_b[_cons]+_b[expenda]*100+_b[expenda_sq]*100^2)
23. display (_b[_cons]+_b[expenda]*510+_b[expenda_sq]*510^2) - (_b[_cons]+_b[expenda]*500+_b[expenda_sq]*500^2)
   a. To show that there are diminishing marginal returns to campaign expenditures, we compute the effect of increasing campaign expenditure by $10,000 (10 units, since expenditures are measured in thousands of dollars), when spending is $100,000 and when spending is $500,000.
      i. Adding an additional $10,000 in spending after having already spent $100,000 will lead to an additional 0.69 percentage points in voting for candidate A.
      ii. But adding an additional $10,000 in spending after having already spent $500,000 will only lead to an additional 0.23 percentage points in voting for candidate A.
24. count if expenda > 700
   a. The visual analysis of the scatter plot reveals that there is a turning point at around $700,000 in spending. We want to see if there are a lot of districts with incumbent expenditures over $700,000.
25. list id state district expenda if expenda > 700
   a. To know which districts those are, you can use the list command.
26. gen sharea_dummy=(sharea>50)
    gen votea_dummy=(votea>50)
    tab sharea_dummy
    tab votea_dummy
    reg votea_dummy sharea_dummy, robust
   a. Suppose candidate A wants to know the effect of spending more than candidate B on the probability of getting more than 50% of the votes. You can find that out by generating the variables above.
   b. Having higher expenditure increases the probability of having the majority of votes by 84 percentage points (0.84*100).
27. reg votea expenda expenda_sq expendb prtystra, robust
   a. There are other factors besides incumbent spending that influence votes. The vote share of the incumbent is also affected by the opponent's spending (expendb) and the strength of the incumbent's own party (prtystra). We run a regression controlling for those factors.
   b. All coefficients are significantly different from zero at the 1% significance level. There are still diminishing marginal returns to incumbent campaign expenditure.
   c. With other variables held constant, an increase of $1000 in the opponent's spending will cost the incumbent percentage points of the vote share.
   d. An increase in the strength of the incumbent's party of 1 percentage point, keeping all other variables constant, will yield a 0.32 percentage point increase in the incumbent's vote share.
   e. With this model we have now explained 65% of the variation in the vote share of the incumbent. More importantly, we have reduced the SER, which indicates that we are starting to achieve a relatively good fit.
28. sum expenda expendb prtystra
29. display _b[_cons]+_b[expenda]* _b[expenda_sq]*( ^2)+_b[expendb]* _b[prtystra]*65
   a. You want to predict the incumbent's share of the vote if party strength were 65 percent and the candidates kept their expenditures at their mean levels.
   b. 
About 58.46% of the vote 30. reg votea lexpenda, robust a. In general, when you want to do a regression with a variable in logarithm form,

20 you have to generate that variable, by writting for example, generate ln_expenda=ln(expenda). In this case, the log of campaign expenditures for each candidate are already variables in this dataset, so we don't need to generate them. b. The coefficient in is highly significant and indicates that the 1% increase in expenditure, would yield an increase in vote share of (6.51/100)= percentage points. 31. twoway (scatter votea lexpenda) (lfit votea lexpenda), legend(order(1 "Actual Values" 2 "Fitted Values")) a. Plot the relationship between votea and log(expenda) and the fitted line. 32. reg votea lexpenda lexpendb prtystra, robust a. Now, we keep the linear-log specification but, fearing omitted variable bias, we add control variables log(expendb) and prtystra. b. Interpretation of results: A 1% increase in incumbent expenditures leads to an increase in incumbent vote share in the amount of percentage points, keeping all other variables constant. c. A 1% increase in challenger expenditures leads to a reduction in incumbent vote share of percentage points, keeping all other variables constant. d. An increase in the incumbent's party strength of 1 percentage point, leads to an increase in incumbent vote share of 0.15 percentage points, keeping all other variables constant. e. We are confident with the results of this model. All variables are highly significant. We have explained 79% of the variation in incumbent vote share and the SER has been reduced to only 7.7 percentage points 33. display_b[_cons]+_b[lexpenda]*(ln(400))+_b[lexpendb]*(ln(500))+_b[prtystra]* 50 a. Compute the predicted vote share for your candidate if his/her expenditures are $400,000 and the opponents are $500,000 and the incumbent's party strength is 50% 34. display_b[_cons]+_b[lexpenda]*(ln(600))+_b[lexpendb]*(ln(500))+_b[prtystra]* 50 a. Compute what happens if your candidate increases expenditures to $600,000, keeping the other variables constant. 35. display _b[lexpenda]*(ln(600)-ln(400)) a. 
The increase in your candidates' vote share would be 2.47 percentage points, from to percent. You can compute this increase directly by using the command above. 36. save vote1_out.dta, replace clear

21 a. Close this dataset. 37. log close a. Close the log. Summary Table of the Stata Commands seen in Tutorial 3 Command Short description Example regress Running a linear regression reg votea expenda expendb, robust on multiple variables test twoway generate count list running a log regression on multiple variables To test the hypothesis that the coefficient is different from 0 To test the joint hypothesis that the coefficient on variable one is different from 1 and that the coefficient on variable 2 is different from 0 To plot the estimated relation between two variables To generate dummy variables To generate and id for each observation To see how many districts are over a particular value To show the name of the those districts that are over the particular value reg lnvotea lnexpenda expendb, robust test expenda test (variable1=1) (expendb=0) twoway (scatter votea expenda) (qfit votea expenda) (lfit votea expenda), legend(order(1 2 "Quadratic Fit" 3 "Linear Fit") gen sharea_dummy=(sharea>50) generate id=_n count if expenda > 700 list id state district expenda if expenda > 700
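
As a rough check on the quadratic model, the two effects reported for a $10,000 increase (0.69 and 0.23 percentage points) are enough to back out the turning point. The symbols \hat\beta_1 and \hat\beta_2 below are notation introduced here for _b[expenda] and _b[expenda_sq]; their values are back-solved from the rounded figures in the text, not read from the regression output, so treat them as approximate.

```latex
\begin{align*}
\Delta(x) &= \hat\beta_1\,[(x+10) - x] + \hat\beta_2\,[(x+10)^2 - x^2]
           = 10\,\hat\beta_1 + (20x + 100)\,\hat\beta_2 \\
\Delta(100) &= 10\,\hat\beta_1 + 2100\,\hat\beta_2 = 0.69 \\
\Delta(500) &= 10\,\hat\beta_1 + 10100\,\hat\beta_2 = 0.23 \\
\Rightarrow\quad \hat\beta_2 &= \frac{0.23 - 0.69}{8000} \approx -0.0000575,
\qquad \hat\beta_1 \approx 0.0811 \\
x^{*} &= -\frac{\hat\beta_1}{2\,\hat\beta_2} \approx 705
\end{align*}
```

With expenda in thousands of dollars, the implied turning point is roughly $705,000, which agrees with the turning point of around $700,000 read off the scatter plot.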

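The differences computed with display for the quadratic model, and its turning point, can also be obtained together with standard errors and confidence intervals using lincom and nlcom. This is a minimal sketch and not part of the original guide: it assumes the vote dataset is in memory and that expenda_sq was created earlier as expenda^2, which is how the variable name is used in the display commands.

```stata
* Sketch only: assumes expenda_sq = expenda^2 already exists in the dataset.
reg votea expenda expenda_sq, robust

* Effect of raising spending from 100 to 110 (thousands of dollars):
* 10*b1 + (110^2 - 100^2)*b2 = 10*b1 + 2100*b2
lincom 10*expenda + 2100*expenda_sq

* Effect of raising spending from 500 to 510:
* 10*b1 + (510^2 - 500^2)*b2 = 10*b1 + 10100*b2
lincom 10*expenda + 10100*expenda_sq

* Turning point, where the marginal effect b1 + 2*b2*expenda equals zero
nlcom -_b[expenda] / (2*_b[expenda_sq])
```

The point estimates from the two lincom lines should match the display results; the advantage of lincom and nlcom is that they also report whether each quantity is statistically distinguishable from zero.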
Introduction of STATA

Introduction of STATA Introduction of STATA News: There is an introductory course on STATA offered by CIS Description: Intro to STATA On Tue, Feb 13th from 4:00pm to 5:30pm in CIT 269 Seats left: 4 Windows, 7 Macintosh For

More information

The Dummy s Guide to Data Analysis Using SPSS

The Dummy s Guide to Data Analysis Using SPSS The Dummy s Guide to Data Analysis Using SPSS Univariate Statistics Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved Table of Contents PAGE Creating a Data File...3 1. Creating

More information

Introduction to Stata Session 1

Introduction to Stata Session 1 Introduction to Stata Session 1 Tarjei Havnes 1 ESOP and Department of Economics University of Oslo 2 Research department Statistics Norway ECON 3150/4150, UiO, 2014 Preparation Before we start: 1. Sit

More information

Sociology 7704: Regression Models for Categorical Data Instructor: Natasha Sarkisian. Preliminary Data Screening

Sociology 7704: Regression Models for Categorical Data Instructor: Natasha Sarkisian. Preliminary Data Screening r's age when 1st child born 2 4 6 Density.2.4.6.8 Density.5.1 Sociology 774: Regression Models for Categorical Data Instructor: Natasha Sarkisian Preliminary Data Screening A. Examining Univariate Normality

More information

Multiple Regression. Dr. Tom Pierce Department of Psychology Radford University

Multiple Regression. Dr. Tom Pierce Department of Psychology Radford University Multiple Regression Dr. Tom Pierce Department of Psychology Radford University In the previous chapter we talked about regression as a technique for using a person s score on one variable to make a best

More information

LECTURE 17: MULTIVARIABLE REGRESSIONS I

LECTURE 17: MULTIVARIABLE REGRESSIONS I David Youngberg BSAD 210 Montgomery College LECTURE 17: MULTIVARIABLE REGRESSIONS I I. What Determines a House s Price? a. Open Data Set 6 to help us answer this question. You ll see pricing data for homes

More information

Using SPSS for Linear Regression

Using SPSS for Linear Regression Using SPSS for Linear Regression This tutorial will show you how to use SPSS version 12.0 to perform linear regression. You will use SPSS to determine the linear regression equation. This tutorial assumes

More information

Soci Statistics for Sociologists

Soci Statistics for Sociologists University of North Carolina Chapel Hill Soci708-001 Statistics for Sociologists Fall 2009 Professor François Nielsen Stata Commands for Module 11 Multiple Regression For further information on any command

More information

SPSS Guide Page 1 of 13

SPSS Guide Page 1 of 13 SPSS Guide Page 1 of 13 A Guide to SPSS for Public Affairs Students This is intended as a handy how-to guide for most of what you might want to do in SPSS. First, here is what a typical data set might

More information

Chapter 2 Part 1B. Measures of Location. September 4, 2008

Chapter 2 Part 1B. Measures of Location. September 4, 2008 Chapter 2 Part 1B Measures of Location September 4, 2008 Class will meet in the Auditorium except for Tuesday, October 21 when we meet in 102a. Skill set you should have by the time we complete Chapter

More information

= = Intro to Statistics for the Social Sciences. Name: Lab Session: Spring, 2015, Dr. Suzanne Delaney

= = Intro to Statistics for the Social Sciences. Name: Lab Session: Spring, 2015, Dr. Suzanne Delaney Name: Intro to Statistics for the Social Sciences Lab Session: Spring, 2015, Dr. Suzanne Delaney CID Number: _ Homework #22 You have been hired as a statistical consultant by Donald who is a used car dealer

More information

4.3 Nonparametric Tests cont...

4.3 Nonparametric Tests cont... Class #14 Wednesday 2 March 2011 What did we cover last time? Hypothesis Testing Types Student s t-test - practical equations Effective degrees of freedom Parametric Tests Chi squared test Kolmogorov-Smirnov

More information

ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2014

ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2014 ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2014 Instructions: Answer all five (5) questions. Point totals for each question are given in parentheses. The parts within each

More information

The SPSS Sample Problem To demonstrate these concepts, we will work the sample problem for logistic regression in SPSS Professional Statistics 7.5, pa

The SPSS Sample Problem To demonstrate these concepts, we will work the sample problem for logistic regression in SPSS Professional Statistics 7.5, pa The SPSS Sample Problem To demonstrate these concepts, we will work the sample problem for logistic regression in SPSS Professional Statistics 7.5, pages 37-64. The description of the problem can be found

More information

Group Comparisons: Using What If Scenarios to Decompose Differences Across Groups

Group Comparisons: Using What If Scenarios to Decompose Differences Across Groups Group Comparisons: Using What If Scenarios to Decompose Differences Across Groups Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 15, 2015 We saw that the

More information

Problem Points Score USE YOUR TIME WISELY SHOW YOUR WORK TO RECEIVE PARTIAL CREDIT

Problem Points Score USE YOUR TIME WISELY SHOW YOUR WORK TO RECEIVE PARTIAL CREDIT STAT 512 EXAM I STAT 512 Name (7 pts) Problem Points Score 1 40 2 25 3 28 USE YOUR TIME WISELY SHOW YOUR WORK TO RECEIVE PARTIAL CREDIT WRITE LEGIBLY. ANYTHING UNREADABLE WILL NOT BE GRADED GOOD LUCK!!!!

More information

Two Way ANOVA. Turkheimer PSYC 771. Page 1 Two-Way ANOVA

Two Way ANOVA. Turkheimer PSYC 771. Page 1 Two-Way ANOVA Page 1 Two Way ANOVA Two way ANOVA is conceptually like multiple regression, in that we are trying to simulateously assess the effects of more than one X variable on Y. But just as in One Way ANOVA, the

More information

Getting Started with HLM 5. For Windows

Getting Started with HLM 5. For Windows For Windows Updated: August 2012 Table of Contents Section 1: Overview... 3 1.1 About this Document... 3 1.2 Introduction to HLM... 3 1.3 Accessing HLM... 3 1.4 Getting Help with HLM... 3 Section 2: Accessing

More information

Gush vs. Bore: A Look at the Statistics of Sampling

Gush vs. Bore: A Look at the Statistics of Sampling Gush vs. Bore: A Look at the Statistics of Sampling Open the Fathom file Random_Samples.ftm. Imagine that in a nation somewhere nearby, a presidential election will soon be held with two candidates named

More information

ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2011

ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2011 ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2011 Instructions: Answer all five (5) questions. Point totals for each question are given in parentheses. The parts within each

More information

Problem set 3: hypothesis testing and model selection

Problem set 3: hypothesis testing and model selection Problem set 3: hypothesis testing and model selection September 16, 2013 1 Introduction This problem set is meant to accompany the undergraduate econometrics video series on Youtube; covering roughly the

More information

10.2 Correlation. Plotting paired data points leads to a scatterplot. Each data pair becomes one dot in the scatterplot.

10.2 Correlation. Plotting paired data points leads to a scatterplot. Each data pair becomes one dot in the scatterplot. 10.2 Correlation Note: You will be tested only on material covered in these class notes. You may use your textbook as supplemental reading. At the end of this document you will find practice problems similar

More information

. *increase the memory or there will problems. set memory 40m (40960k)

. *increase the memory or there will problems. set memory 40m (40960k) Exploratory Data Analysis on the Correlation Structure In longitudinal data analysis (and multi-level data analysis) we model two key components of the data: 1. Mean structure. Correlation structure (after

More information

3 Ways to Improve Your Targeted Marketing with Analytics

3 Ways to Improve Your Targeted Marketing with Analytics 3 Ways to Improve Your Targeted Marketing with Analytics Introduction Targeted marketing is a simple concept, but a key element in a marketing strategy. The goal is to identify the potential customers

More information

= = Name: Lab Session: CID Number: The database can be found on our class website: Donald s used car data

= = Name: Lab Session: CID Number: The database can be found on our class website: Donald s used car data Intro to Statistics for the Social Sciences Fall, 2017, Dr. Suzanne Delaney Extra Credit Assignment Instructions: You have been hired as a statistical consultant by Donald who is a used car dealer to help

More information

Correlation and Simple. Linear Regression. Scenario. Defining Correlation

Correlation and Simple. Linear Regression. Scenario. Defining Correlation Linear Regression Scenario Let s imagine that we work in a real estate business and we re attempting to understand whether there s any association between the square footage of a house and it s final selling

More information

Week 10: Heteroskedasticity

Week 10: Heteroskedasticity Week 10: Heteroskedasticity Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline The problem of (conditional)

More information

Clovis Community College Class Assessment

Clovis Community College Class Assessment Class: Math 110 College Algebra NMCCN: MATH 1113 Faculty: Hadea Hummeid 1. Students will graph functions: a. Sketch graphs of linear, higherhigher order polynomial, rational, absolute value, exponential,

More information

Computing Descriptive Statistics Argosy University

Computing Descriptive Statistics Argosy University 2014 Argosy University 2 Computing Descriptive Statistics: Ever Wonder What Secrets They Hold? The Mean, Mode, Median, Variability, and Standard Deviation Introduction Before gaining an appreciation for

More information

Econometric Analysis Dr. Sobel

Econometric Analysis Dr. Sobel Econometric Analysis Dr. Sobel Econometrics Session 1: 1. Building a data set Which software - usually best to use Microsoft Excel (XLS format) but CSV is also okay Variable names (first row only, 15 character

More information

Exploring Functional Forms: NBA Shots. NBA Shots 2011: Success v. Distance. . bcuse nbashots11

Exploring Functional Forms: NBA Shots. NBA Shots 2011: Success v. Distance. . bcuse nbashots11 NBA Shots 2011: Success v. Distance. bcuse nbashots11 Contains data from http://fmwww.bc.edu/ec-p/data/wooldridge/nbashots11.dta obs: 199,119 vars: 15 25 Oct 2012 09:08 size: 24,690,756 ------------- storage

More information

Applied Econometrics

Applied Econometrics Applied Econometrics Lecture 3 Nathaniel Higgins ERS and JHU 20 September 2010 Outline of today s lecture Schedule and Due Dates Making OLS make sense Uncorrelated X s Correlated X s Omitted variable bias

More information

SCENARIO: We are interested in studying the relationship between the amount of corruption in a country and the quality of their economy.

SCENARIO: We are interested in studying the relationship between the amount of corruption in a country and the quality of their economy. Introduction to SPSS Center for Teaching, Research and Learning Research Support Group American University, Washington, D.C. Hurst Hall 203 rsg@american.edu (202) 885-3862 This workshop is designed to

More information

Getting Started with OptQuest

Getting Started with OptQuest Getting Started with OptQuest What OptQuest does Futura Apartments model example Portfolio Allocation model example Defining decision variables in Crystal Ball Running OptQuest Specifying decision variable

More information

16 What Effect Does Changing the Minimum Wage Have on Employment?

16 What Effect Does Changing the Minimum Wage Have on Employment? 16 What Effect Does Changing the Minimum Wage Have on Employment? Introduction Many people advocate raising the minimum wage as a means of raising the standard of living of many of the poor. (The term

More information

Activities supporting the assessment of this award [3]

Activities supporting the assessment of this award [3] Relevant LINKS BACK TO ITQ UNITS [1] Handbook home page [2] Overview This is the ability to use a software application designed to record data in rows and columns, perform calculations with numerical data

More information

Untangling Correlated Predictors with Principle Components

Untangling Correlated Predictors with Principle Components Untangling Correlated Predictors with Principle Components David R. Roberts, Marriott International, Potomac MD Introduction: Often when building a mathematical model, one can encounter predictor variables

More information

JMP TIP SHEET FOR BUSINESS STATISTICS CENGAGE LEARNING

JMP TIP SHEET FOR BUSINESS STATISTICS CENGAGE LEARNING JMP TIP SHEET FOR BUSINESS STATISTICS CENGAGE LEARNING INTRODUCTION JMP software provides introductory statistics in a package designed to let students visually explore data in an interactive way with

More information

Radio buttons. Tick Boxes. Drop down list. Spreadsheets Revision Booklet. Entering Data. Each cell can contain one of the following things

Radio buttons. Tick Boxes. Drop down list. Spreadsheets Revision Booklet. Entering Data. Each cell can contain one of the following things Spreadsheets Revision Booklet Entering Data Each cell can contain one of the following things Spreadsheets can be used to: Record data Sort data (in ascending A-Z, 1-10 or descending (Z-A,10-1) order Search

More information

Chapter 3. Table of Contents. Introduction. Empirical Methods for Demand Analysis

Chapter 3. Table of Contents. Introduction. Empirical Methods for Demand Analysis Chapter 3 Empirical Methods for Demand Analysis Table of Contents 3.1 Elasticity 3.2 Regression Analysis 3.3 Properties & Significance of Coefficients 3.4 Regression Specification 3.5 Forecasting 3-2 Introduction

More information

BUS105 Statistics. Tutor Marked Assignment. Total Marks: 45; Weightage: 15%

BUS105 Statistics. Tutor Marked Assignment. Total Marks: 45; Weightage: 15% BUS105 Statistics Tutor Marked Assignment Total Marks: 45; Weightage: 15% Objectives a) Reinforcing your learning, at home and in class b) Identifying the topics that you have problems with so that your

More information

EFFICACY OF ROBUST REGRESSION APPLIED TO FRACTIONAL FACTORIAL TREATMENT STRUCTURES MICHAEL MCCANTS

EFFICACY OF ROBUST REGRESSION APPLIED TO FRACTIONAL FACTORIAL TREATMENT STRUCTURES MICHAEL MCCANTS EFFICACY OF ROBUST REGRESSION APPLIED TO FRACTIONAL FACTORIAL TREATMENT STRUCTURES by MICHAEL MCCANTS B.A., WINONA STATE UNIVERSITY, 2007 B.S., WINONA STATE UNIVERSITY, 2008 A THESIS submitted in partial

More information

Day 1: Confidence Intervals, Center and Spread (CLT, Variability of Sample Mean) Day 2: Regression, Regression Inference, Classification

Day 1: Confidence Intervals, Center and Spread (CLT, Variability of Sample Mean) Day 2: Regression, Regression Inference, Classification Data 8, Final Review Review schedule: - Day 1: Confidence Intervals, Center and Spread (CLT, Variability of Sample Mean) Day 2: Regression, Regression Inference, Classification Your friendly reviewers

More information

CHAPTER 5 FIRM PRODUCTION, COST, AND REVENUE

CHAPTER 5 FIRM PRODUCTION, COST, AND REVENUE CHAPTER 5 FIRM PRODUCTION, COST, AND REVENUE CHAPTER OBJECTIVES You will find in this chapter models that will help you understand the relationship between production and costs and the relationship between

More information

AP Statistics Scope & Sequence

AP Statistics Scope & Sequence AP Statistics Scope & Sequence Grading Period Unit Title Learning Targets Throughout the School Year First Grading Period *Apply mathematics to problems in everyday life *Use a problem-solving model that

More information

Semester 2, 2015/2016

Semester 2, 2015/2016 ECN 3202 APPLIED ECONOMETRICS 3. MULTIPLE REGRESSION B Mr. Sydney Armstrong Lecturer 1 The University of Guyana 1 Semester 2, 2015/2016 MODEL SPECIFICATION What happens if we omit a relevant variable?

More information

CHAPTER 8 T Tests. A number of t tests are available, including: The One-Sample T Test The Paired-Samples Test The Independent-Samples T Test

CHAPTER 8 T Tests. A number of t tests are available, including: The One-Sample T Test The Paired-Samples Test The Independent-Samples T Test CHAPTER 8 T Tests A number of t tests are available, including: The One-Sample T Test The Paired-Samples Test The Independent-Samples T Test 8.1. One-Sample T Test The One-Sample T Test procedure: Tests

More information

Unit 2 Regression and Correlation 2 of 2 - Practice Problems SOLUTIONS Stata Users

Unit 2 Regression and Correlation 2 of 2 - Practice Problems SOLUTIONS Stata Users Unit 2 Regression and Correlation 2 of 2 - Practice Problems SOLUTIONS Stata Users Data Set for this Assignment: Download from the course website: Stata Users: framingham_1000.dta Source: Levy (1999) National

More information

Chapter 3. Displaying and Summarizing Quantitative Data. 1 of 66 05/21/ :00 AM

Chapter 3. Displaying and Summarizing Quantitative Data.  1 of 66 05/21/ :00 AM Chapter 3 Displaying and Summarizing Quantitative Data D. Raffle 5/19/2015 1 of 66 05/21/2015 11:00 AM Intro In this chapter, we will discuss summarizing the distribution of numeric or quantitative variables.

More information

COMPARING MODEL ESTIMATES: THE LINEAR PROBABILITY MODEL AND LOGISTIC REGRESSION

COMPARING MODEL ESTIMATES: THE LINEAR PROBABILITY MODEL AND LOGISTIC REGRESSION PLS 802 Spring 2018 Professor Jacoby COMPARING MODEL ESTIMATES: THE LINEAR PROBABILITY MODEL AND LOGISTIC REGRESSION This handout shows the log of a STATA session that compares alternative estimates of

More information

KING ABDULAZIZ UNIVERSITY FACULTY OF COMPUTING & INFORMATION TECHNOLOGY DEPARTMENT OF INFORMATION SYSTEM. Lab 1- Introduction

KING ABDULAZIZ UNIVERSITY FACULTY OF COMPUTING & INFORMATION TECHNOLOGY DEPARTMENT OF INFORMATION SYSTEM. Lab 1- Introduction Lab 1- Introduction Objective: We will start with some basic concept of DSS. And also we will start today the WHAT-IF analysis technique for decision making. Activity Outcomes: What is what-if analysis

More information

The study obtains the following results: Homework #2 Basics of Logistic Regression Page 1. . version 13.1

The study obtains the following results: Homework #2 Basics of Logistic Regression Page 1. . version 13.1 Soc 73994, Homework #2: Basics of Logistic Regression Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 14, 2018 All answers should be typed and mailed to

More information

Using Excel s Analysis ToolPak Add In

Using Excel s Analysis ToolPak Add In Using Excel s Analysis ToolPak Add In Introduction This document illustrates the use of Excel s Analysis ToolPak add in for data analysis. The document is aimed at users who don t believe that Excel s

More information

Marginal Costing Q.8

Marginal Costing Q.8 Marginal Costing. 2008 Q.8 Break-Even Point. Before tackling a marginal costing question, it s first of all crucial that you understand what is meant by break-even point. What this means is that a firm

More information

Forecasting Introduction Version 1.7

Forecasting Introduction Version 1.7 Forecasting Introduction Version 1.7 Dr. Ron Tibben-Lembke Sept. 3, 2006 This introduction will cover basic forecasting methods, how to set the parameters of those methods, and how to measure forecast

More information

Know Your Data (Chapter 2)

Know Your Data (Chapter 2) Let s Get Started! Know Your Data (Chapter 2) Now we each have a time series whose future values we are interested in forecasting. The next step is to become thoroughly familiar with the construction of

More information

Session 7. Introduction to important statistical techniques for competitiveness analysis example and interpretations

Session 7. Introduction to important statistical techniques for competitiveness analysis example and interpretations ARTNeT Greater Mekong Sub-region (GMS) initiative Session 7 Introduction to important statistical techniques for competitiveness analysis example and interpretations ARTNeT Consultant Witada Anukoonwattaka,

More information

Bioreactors Prof G. K. Suraishkumar Department of Biotechnology Indian Institute of Technology, Madras. Lecture - 02 Sterilization

Bioreactors Prof G. K. Suraishkumar Department of Biotechnology Indian Institute of Technology, Madras. Lecture - 02 Sterilization Bioreactors Prof G. K. Suraishkumar Department of Biotechnology Indian Institute of Technology, Madras Lecture - 02 Sterilization Welcome, to this second lecture on Bioreactors. This is a mooc on Bioreactors.

More information

Eco311, Final Exam, Fall 2017 Prof. Bill Even. Your Name (Please print) Directions. Each question is worth 4 points unless indicated otherwise.

Eco311, Final Exam, Fall 2017 Prof. Bill Even. Your Name (Please print) Directions. Each question is worth 4 points unless indicated otherwise. Your Name (Please print) Directions Each question is worth 4 points unless indicated otherwise. Place all answers in the space provided below or within each question. Round all numerical answers to the

More information

How to Use Excel for Regression Analysis MtRoyal Version 2016RevA *

How to Use Excel for Regression Analysis MtRoyal Version 2016RevA * OpenStax-CNX module: m63578 1 How to Use Excel for Regression Analysis MtRoyal Version 2016RevA * Lyryx Learning Based on How to Use Excel for Regression Analysis BSTA 200 Humber College Version 2016RevA

More information

SPSS 14: quick guide

SPSS 14: quick guide SPSS 14: quick guide Edition 2, November 2007 If you would like this document in an alternative format please ask staff for help. On request we can provide documents with a different size and style of

More information

Also, big thank you to fellow TA Enoch Hill for edits and some additions to the guide.

Also, big thank you to fellow TA Enoch Hill for edits and some additions to the guide. Hello class, once again, here s an unofficial guide to the sample midterm. Please use this with caution, since 1) I am prone to error so incorrect explanations are entirely possible and 2) you should do

More information

Categorical Variables, Part 2

Categorical Variables, Part 2 Spring, 000 - - Categorical Variables, Part Project Analysis for Today First multiple regression Interpreting categorical predictors and their interactions in the first multiple regression model fit in

More information

Statistics 201 Summary of Tools and Techniques

Statistics 201 Summary of Tools and Techniques Statistics 201 Summary of Tools and Techniques This document summarizes the many tools and techniques that you will be exposed to in STAT 201. The details of how to do these procedures is intentionally

More information

Biostatistics 208 Data Exploration

Biostatistics 208 Data Exploration Biostatistics 208 Data Exploration Dave Glidden Professor of Biostatistics Univ. of California, San Francisco January 8, 2008 http://www.biostat.ucsf.edu/biostat208 Organization Office hours by appointment

More information

How to Use PPC Advertising to Grow Your Pool Business!

How to Use PPC Advertising to Grow Your Pool Business! How to Use PPC Advertising to Grow Your Pool Business! Welcome From print materials to online marketing, there is no shortage of ways to spend your marketing budget. And whether your annual budget is $1000

More information

1. Open Excel and ensure F9 is attached - there should be a F9 pull-down menu between Window and Help in the Excel menu list like this:

1. Open Excel and ensure F9 is attached - there should be a F9 pull-down menu between Window and Help in the Excel menu list like this: This is a short tutorial designed to familiarize you with the basic concepts of creating a financial report with F9. Every F9 financial report starts as a spreadsheet and uses the features of Microsoft

More information



