
Statistical Data Analysis by Excel For Impact Evaluation
Text 4 <DID, PSM, IV> (Note 1)

Difference in Difference (DID)
Propensity Score Matching (PSM)
Instrumental Variable (IV)

Version 1.0 (March 10, 2013)

Ryo SASAKI, Ph.D.
Senior Researcher, International Development Center of Japan (IDCJ)
Adjunct Professor, St. Paul's (Rikkyo) University, Japan

Note 1: The Basic text and the Advanced text are also available. The Basic text consists of (i) average & standard deviation and (ii) dependent & independent t-test. The Advanced text consists of (iii) regression analysis and (iv) structural equation modeling (SEM).

Data Analysis by STATA for Social Researchers 2 <Advanced Part>

Table of Contents
Session 0: Preparation of Excel: Installation of Analysis Tool
Session 1: Difference in Difference (DID)
Session 2: Propensity Score Matching (PSM)
Session 3: Instrumental Variable (IV)
Author
Revision History

Appreciation by the author
I received very valuable comments from Mr. Keitaro AOYAGI, Dr. Yusuke KAMIYA, and Dr. Hiromitsu MUTA (Professor Emeritus, Tokyo Institute of Technology).

Session 0 Preparation of Excel: Installation of Analysis Tool

The Analysis ToolPak add-in must be installed before Excel can be used for statistical analysis.

<In case of Excel 2010>
(1) Open Excel 2010 > click File > Options.
(2) The following box comes up. Click Add-Ins > Analysis ToolPak > Go.
(3) The following box appears. Tick Analysis ToolPak > OK.
(4) Excel asks whether you would like to install it. Click Yes. The installation finishes shortly.
(5) Click the tab named Data. If you can see Data Analysis, the installation was successful.

<In case of Excel 2007>
(1) Open Excel 2007 > click the Office Button (the round Windows mark) > Excel Options.
(2) The following box comes up. Click Add-Ins > Go, then continue from step (3) of the Excel 2010 instructions above.

Session 1 Difference in Difference (DID)

A local NGO in Tanzania conducted training in advanced agricultural techniques for farmers. Can we say that the farmers' income has increased because of that training? We will estimate it by subtracting the income before the training from the income after the training (before-after comparison). Did the training really have an impact on farmers' income?

[ 1 ] Input of Data

Input the data as follows in Microsoft Excel. Make a folder on the desktop and save the file in it. The file name should be easy to recognize and remember. (Caution: It is advised to use English rather than Japanese or another language.)

ID ........ Personal ID
treatment . Attended / did not attend the training (attended = 1, not attended = 0)
FY2001 .... Income in year 2001 (Tanzanian shilling, '000)
FY2005 .... Income in year 2005 (Tanzanian shilling, '000)

[ 2 ] Conduct of DID

Add a new column and type change01_05 as its name. In that column, type a formula that calculates the difference between FY2005 and FY2001:

= (value of FY2005) - (value of FY2001)

Then copy and paste the formula down to the end of the data.

You will see the following data. Confirm that the change between FY2001 and FY2005 has been added. Select: Data > Data Analysis > t-Test: Two-Sample Assuming Equal Variances. Click OK.

At Variable 1 Range, select the data (change01_05) of the rows for which treatment is 1. At Variable 2 Range, select the data (change01_05) of the rows for which treatment is 0. Click OK. You will obtain the following result.

How to read the result:
Average (mean of each group)
n (number of observations)
t-value (look at the absolute value)
  |t| <= 2 : Not statistically significant.
  |t| > 2 : Statistically significant.
p-value (probability)
  p >= 0.05 : Not statistically significant.
  p < 0.05 : Statistically significant.

Conclusion

The difference in the before-after change of income between the farmers who attended the training and those who did not is statistically significant (p < 0.05). Thus, it can be judged that the income of farmers increased because of the training in new agricultural techniques. The size of the increase is about Tanzanian shilling 1,750 ( = ).

Box plots such as the following are very useful. However, Excel cannot draw them. You should use drawing software or another statistical package that has a drawing function. (The following chart was generated by STATA.)

(Figure: box plots of the change in income for the farmer group that did not attend the training and for the farmer group that attended the training.)

==> Subtracting the change of the group that did not attend from the change of the group that attended gives the pure change (= pure effect of the training).
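The steps of this session can also be sketched outside Excel. The following minimal Python example, which is only an illustration and not part of the original text, computes the change column, the difference between the two group means, and the equal-variance two-sample t-value. The rows are hypothetical numbers made up for the sketch (they are not the tutorial's data), and the function name did_t_test is likewise an assumption. It applies the same rule of thumb as above (|t| > 2) instead of computing a p-value.

```python
from math import sqrt
from statistics import mean, variance

# HYPOTHETICAL rows: (treatment, FY2001 income, FY2005 income).
# These values are invented for illustration only.
rows = [
    (1, 100, 130), (1, 120, 155), (1, 90, 118), (1, 110, 142),
    (0, 105, 112), (0, 95, 104), (0, 115, 121), (0, 100, 109),
]


def did_t_test(rows):
    """Return (DID estimate, t-value) from before/after incomes.

    The change (after - before) plays the role of the change01_05 column.
    The t-value uses the pooled variance, as in Excel's
    "t-Test: Two-Sample Assuming Equal Variances".
    """
    treated = [after - before for t, before, after in rows if t == 1]
    control = [after - before for t, before, after in rows if t == 0]
    n1, n2 = len(treated), len(control)

    # Difference of the mean changes = difference-in-difference estimate.
    estimate = mean(treated) - mean(control)

    # Pooled sample variance of the two groups.
    pooled = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    t_value = estimate / sqrt(pooled * (1 / n1 + 1 / n2))
    return estimate, t_value


estimate, t_value = did_t_test(rows)
print("DID estimate:", estimate)
print("t-value:", round(t_value, 2))
print("significant by the |t| > 2 rule:", abs(t_value) > 2)
```

With real data, the rows list would be replaced by the ID-level records entered in step [ 1 ]; the p-value itself is easier to obtain from Excel or a statistical package.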

Session 2 Propensity Score Matching (PSM)

A local NGO in Nepal conducted a microfinance project for women in farm villages. Can we say that the income of the women in farm villages has increased because of that microfinance project? We will estimate it by matching the characteristics of the women who participated with those of the women who did not. Has the microfinance project really had an impact on the income of women?

[ 1 ] Input of Data

Input the data as follows in Microsoft Excel. Make a folder on the desktop and save the file in it. The file name should be easy to recognize and remember. (Caution: It is advised to use English rather than Japanese or another language.)

ID ........ Personal ID
treatment . Attended / did not attend the training (attended = 1, not attended = 0)
age ....... Age
distance .. Distance (km) from a major market
income .... Income (Nepal Rupee)

[ 2 ] Conduct of Propensity Score Matching

For a linear regression, we would select Data > Data Analysis > Regression. In this case, however, the Y value (treatment) is not a continuous value but a binary value (Y = 0 or 1). Thus we cannot use the Regression tool, and Excel does not have a tool for probit regression, which can handle the case where Y is binary.

Y = continuous value: you can use Excel's Regression tool.
Y = 0 or 1 (binary): Excel does not have a Probit Regression tool.

So we borrow the following result of a probit regression generated by other statistical software (e.g., SPSS, STATA, SAS).

The following is the result of the probit regression by STATA. The coefficients of all variables are statistically significant (p < 0.10).

BOX Explanation of Probit regression

Using the coefficients in the above result, you can construct the following equation:

z-value = b0 + b1 * age + b2 * distance

where b0 is the constant and b1 and b2 are the coefficients of age and distance reported in the probit output.

The total area under the standard normal distribution is 1 (= 100%). The cumulative distribution value is the area under the standard normal curve from the left end up to the z-value, and it takes a value from 0.00 to 1.00 (0% to 100%). In MS Excel the function is NORMDIST(z-value, 0, 1, 1), which is equivalent to NORMSDIST(z-value). In STATA the command is normprob.

Example) Data of ID6 (age = 27, distance = 3 km)
z-value = b0 + b1 * 27 + b2 * 3 = 0.738
Cumulative distribution value = NORMDIST(0.738, 0, 1, 1) = 0.770*
That is, if the z-value is 0.738, the area covered is 77.0%.

(Figures: the standard normal density f(z), with the area from the left end up to z = 0.738 shaded as 77.0%; and the cumulative curve, which reaches 77.0% at z = 0.738.)

*This is a rounded figure; the more precise value is very close to the one returned by STATA's normprob command.
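As a cross-check of the worked example in the box, the cumulative value of the standard normal distribution can be computed with only Python's standard library. This is a minimal sketch, not part of the original text: the helper name std_normal_cdf is an assumption, and it uses the well-known identity between the normal cumulative distribution and the error function.

```python
from math import erf, sqrt
from statistics import NormalDist


def std_normal_cdf(z):
    """Area under the standard normal curve from the left end up to z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


z = 0.738  # z-value of ID6 in the box above
area = std_normal_cdf(z)

# The standard library's NormalDist gives the same quantity directly.
area_check = NormalDist(mu=0.0, sigma=1.0).cdf(z)

print(round(area, 3), round(area_check, 3))
```

Both calls play the role of NORMDIST(z, 0, 1, 1) in Excel, and for z = 0.738 the result rounds to the 77.0% quoted in the box.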

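Next, calculate the z-value. (i) Name a new column z, (ii) type the following formula in its first data row, and (iii) copy and paste it down to the end of the data.

= b0 + b1 * C2 + b2 * D2

(Replace b0, b1 and b2 with the constant and the coefficients of age and distance from the probit result; C2 and D2 are the cells holding age and distance.)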

Next, calculate the cumulative value of the z-value, which is the expected value of treatment (the propensity score). (i) Name a new column treatment_hat, (ii) type the following formula, and (iii) copy and paste it down to the end of the data. (F2 in the formula is assumed to be the location of the first z-value.)

= NORMDIST(F2, 0, 1, 1)
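The two spreadsheet columns above (z and treatment_hat) can be sketched in Python as follows. This is an illustration only. The coefficients B0, B_AGE and B_DISTANCE are HYPOTHETICAL placeholders, not the values from the STATA output, and the sample rows and the function name propensity_score are also made up for the sketch.

```python
from statistics import NormalDist

# HYPOTHETICAL probit coefficients (placeholders, NOT the tutorial's estimates).
B0 = -0.5          # constant
B_AGE = 0.04       # coefficient of age
B_DISTANCE = -0.1  # coefficient of distance (km)


def propensity_score(age, distance_km, b0=B0, b_age=B_AGE, b_distance=B_DISTANCE):
    """Return (z, treatment_hat) for one person.

    z corresponds to the spreadsheet column z, and treatment_hat to
    NORMDIST(z, 0, 1, 1), i.e. the predicted probability of treatment.
    """
    z = b0 + b_age * age + b_distance * distance_km
    treatment_hat = NormalDist().cdf(z)
    return z, treatment_hat


# HYPOTHETICAL rows: (ID, age, distance in km).
people = [(1, 27, 3.0), (2, 35, 10.0), (3, 22, 1.5)]

scores = {}
for person_id, age, distance_km in people:
    z, treatment_hat = propensity_score(age, distance_km)
    scores[person_id] = (z, treatment_hat)
    print(person_id, round(z, 3), round(treatment_hat, 3))
```

Each treatment_hat lies between 0 and 1, and it rises with the z-value, so two people with similar z-values also have similar propensity scores.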