Comparison of Bootstrap-Estimated and Half Sample-Estimated Kolmogorov-Smirnov Test Statistics

EUROPEAN ACADEMIC RESEARCH Vol. II, Issue 8/ November 2014 ISSN Impact Factor: 3.1 (UIF) DRJI Value: 5.9 (B+)

OMENSALAM A. JAPAR
CHITA P. EVARDONE
Professor, Graduate School
Mindanao State University-Iligan Institute of Technology
The Philippines

Abstract:

Goodness-of-fit testing involves testing the hypothesis that independent and identically distributed random variables $X_1, X_2, \ldots, X_n$ are drawn from a population with a specified continuous distribution function $F_0(x, \theta)$. Most of the time, some or all of the components of $\theta$ are unknown and must be estimated from the sample x-values. Procedures like the bootstrap and the half-sample can be used in estimating these parameters. Using the Kolmogorov-Smirnov (KS) test statistic for goodness-of-fit, the bootstrap-estimated and half sample-estimated versions of this test were compared in terms of their efficiency using the mean-squared error (MSE). It was found that the bootstrap procedure is more efficient than the half-sample, which led to the creation of normality test software using the bootstrapped KS test statistic.

Key words: goodness-of-fit test, Kolmogorov-Smirnov test, mean-squared error, bootstrap, half-sample, specified distribution function

1 Introduction

In statistics, one often wishes to test whether some observations, say $X_1, X_2, \ldots, X_n$, from an unknown population belong to a population with a cumulative distribution function $F(x)$ with parameter $\theta$. This is called a goodness-of-fit test. It involves testing the hypothesis that independent and identically distributed random variables $X_1, X_2, \ldots, X_n$ are drawn from a population having a continuous distribution function $F_0(x, \theta)$ with specified parameters $\theta$. This simple hypothesis has the form

$$H_0 : F(x, \theta) = F_0(x, \theta). \tag{1.1}$$

In the early 1930s, Kolmogorov introduced a distribution-free statistic based on the empirical process, defined as

$$\beta_n(x) = \sqrt{n}\,\bigl[F_n(x) - F_0(x)\bigr], \quad x \in \mathbb{R}. \tag{1.2}$$

A goodness-of-fit test statistic that is a function of the empirical process $\beta_n(x)$ is the Kolmogorov-Smirnov (KS) statistic

$$D_n = \sup_x \bigl|F_n(x) - F_0(x)\bigr|. \tag{1.3}$$

These functions of the empirical process are, under $H_0$, asymptotically distribution-free [6, 7, 9] and have distributions which do not depend on the unknown parameter $\theta$. This is a desirable property of a test statistic. However, in many practical situations, some or all of the components of $\theta$ are unknown, and thus the composite hypothesis to be tested takes the form

$$H_0 : F(x) \in \mathcal{F}, \tag{1.4}$$

where $\mathcal{F}$ is a parametric family of densities. The asymptotic null distribution of the estimated test statistics may depend in a complex way on the unknown parameters, and thus these statistics are not distribution-free. This problem of goodness-of-fit tests when the parameters are estimated was presented in the paper of Babu [1]. To address this problem, nonparametric resampling methods and distribution-free procedures such as the bootstrap method were proposed by Gombay and Burke [4] to estimate the unknown parameters. It was shown that the asymptotic behavior of the estimated empirical process and its functions is similar to that of the empirical process and its functions in the specified case, and that they are therefore distribution-free [6, 7, 9].

This study aims to verify and further compare, via simulation, the asymptotic behavior and efficiency of the estimated empirical process and its related functions based on the bootstrap method and the half-sample method, and to create normality test software using the bootstrapped KS statistic.

2 Preliminaries

2.1 Goodness-of-Fit Test

Let $X_1, X_2, \ldots, X_n$ be a random sample from a continuous cumulative distribution function $F(x)$. The empirical distribution function $F_n(x)$ is the proportion of the $X_i$'s that are less than or equal to $x$, i.e.,

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x), \quad x \in \mathbb{R}, \tag{2.1}$$

where $I(A)$ denotes the indicator function of the event $A$. Equivalently, in terms of the order statistics $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ of the random sample $X_1, X_2, \ldots, X_n$, $F_n(x)$ is given as

$$F_n(x) = \begin{cases} 0 & \text{if } X_{(1)} > x, \\ k/n & \text{if } X_{(k)} \le x < X_{(k+1)}, \quad k = 1, \ldots, n-1, \\ 1 & \text{if } X_{(n)} \le x. \end{cases} \tag{2.2}$$

A goodness-of-fit test is a procedure for determining whether a sample of observations $X_1, X_2, \ldots, X_n$ can be considered as a sample from a given specified distribution function $F_0(x)$, where

$$F_0(x) = \int_{-\infty}^{x} f(y)\,dy, \quad -\infty < x < \infty, \tag{2.3}$$

and $f(y)$ is a specified density function.

2.2 Kolmogorov-Smirnov (KS) Statistic

A goodness-of-fit test is a comparison of $F_n(x)$ defined in (2.2) with $F_0(x)$. The hypothesis (1.1) is rejected if the difference between $F_n(x)$ and $F_0(x)$ is very large. The Kolmogorov-Smirnov statistic provides a means of testing whether a set of observations is from some completely specified continuous distribution $F_0(x)$. Kolmogorov [8] introduced the statistic

$$D_n = \sup_x \bigl|F_n(x) - F_0(x, \theta)\bigr| \tag{2.4}$$

and obtained the asymptotic result

$$\lim_{n \to \infty} d_n\!\left(\frac{z}{\sqrt{n}}\right) = \sum_{j=-\infty}^{\infty} (-1)^j e^{-2 j^2 z^2} \tag{2.5}$$

for the probability distribution of $D_n$, where

$$d_n(\delta) = \Pr(D_n \le \delta). \tag{2.6}$$

2.3 Bootstrap and Half-sample Method

The bootstrap was first introduced and used by Efron. The bootstrap creates a large number of datasets by sampling with replacement and computes the statistic on each of these datasets. The half-sample method is done by drawing a sample of size $n/2$ without replacement from the random sample $X_1, X_2, \ldots, X_n$ from a population with distribution function $F(x)$.

2.4 Maximum Likelihood Estimators

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables from the $N(\mu, \sigma^2)$ distribution. The maximum likelihood estimators (MLE) of the parameters are given by $\hat\theta = (\hat\mu, \hat\sigma^2)$, where

$$\hat\mu = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} X_i \tag{2.7}$$

and

$$\hat\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{x})^2, \tag{2.8}$$

and the $X_i$'s are the observed sample values.
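The definitions in Sections 2.1 to 2.4 translate fairly directly into code. The following is a minimal sketch in Python with NumPy, not the authors' implementation. All function names are illustrative. `ks_statistic` assumes the hypothesized CDF is passed as a vectorized callable, and `kolmogorov_cdf` assumes the classical Kolmogorov limit series for (2.5), truncated to a finite number of terms.

```python
import numpy as np


def edf(sample, x):
    """F_n(x) as in (2.1): proportion of observations that are <= x."""
    return float(np.mean(np.asarray(sample, dtype=float) <= x))


def ks_statistic(sample, cdf):
    """D_n = sup_x |F_n(x) - F_0(x)| for a fully specified, vectorized cdf."""
    xs = np.sort(np.asarray(sample, dtype=float))
    n = xs.size
    f0 = cdf(xs)
    # F_n jumps at each order statistic, so the supremum is attained either
    # at a jump (F_n = k/n) or just before it (F_n = (k-1)/n).
    above = np.arange(1, n + 1) / n - f0
    below = f0 - np.arange(0, n) / n
    return float(max(above.max(), below.max()))


def kolmogorov_cdf(z, terms=100):
    """Truncated limit series: sum over j of (-1)^j exp(-2 j^2 z^2), z > 0."""
    j = np.arange(-terms, terms + 1)
    signs = np.where(j % 2 == 0, 1.0, -1.0)
    return float(np.sum(signs * np.exp(-2.0 * (j * z) ** 2)))


def bootstrap_sample(sample, rng):
    """Resample of the same size n, drawn with replacement."""
    sample = np.asarray(sample)
    return rng.choice(sample, size=sample.size, replace=True)


def half_sample(sample, rng):
    """Subsample of size n // 2, drawn without replacement."""
    sample = np.asarray(sample)
    return rng.choice(sample, size=sample.size // 2, replace=False)


def normal_mle(sample):
    """Normal MLEs (2.7) and (2.8): sample mean and variance with divisor n."""
    sample = np.asarray(sample, dtype=float)
    mu = sample.mean()
    return float(mu), float(np.mean((sample - mu) ** 2))
```

A random generator such as `np.random.default_rng(0)` can be passed as `rng` so that the resampling is reproducible.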

The maximum likelihood estimators of the parameters of the gamma distribution were derived by Evardone [7] in her paper to be $\hat\theta = (\hat\alpha, \hat\beta)$, given as

$$\hat\alpha = \frac{\bar{x}^2}{s^2} \tag{2.9}$$

and

$$\hat\beta = \frac{s^2}{\bar{x}}. \tag{2.10}$$

2.5 Pointwise Mean-Squared Error (MSE)

Let $T$ be the value of the test statistic in the specified case and $\hat{t}$ be its value under the estimation method being assessed (bootstrap or half-sample). The pointwise MSE was the measure computed to compare the simulation results. It is computed as

$$\mathrm{MSE}(\hat{t}) = \frac{1}{k} \sum_{i=1}^{k} (\hat{t}_i - T_i)^2, \tag{2.11}$$

where $k$ is the number of partitions of the test statistics, and $\hat{t}_i$ and $T_i$ are the values at the $i$th partition.

3 Methodology

The experiment was conducted by investigating four sample sizes, $n$ = 50, 150, 300 and 500, from two distributions: the normal distribution with mean $\mu = 1.0$ and variance $\sigma^2 = 1.0$, and the gamma distribution with parameters $\alpha = 4.0$ and $\beta = 2.0$, with 1000 replications for each case. Two resampling methods were used in estimating the parameters, namely the bootstrap and half-sample procedures.

Step 1. For the specified case, a random sample of size $n$ is generated from the distribution $N(1, 1)$, where $\theta = (\mu, \sigma^2) = (1, 1)$.

Step 2. Compute $F_n(x)$, the empirical estimate of $P(X \le 2)$, whose formula is defined in (2.2).

Step 3. Compute $\beta_n(x) = \sqrt{n}\,[F_n(x) - F_0(x, \theta)]$ at $x = 2$, where $F_0(x, \theta)$ is the value of the normal cumulative distribution function (CDF) at $x = 2$. This x-value is arbitrarily chosen.

Step 4. For each observation $x$ in the sample generated, compute $|F_n(x) - F_0(x, \theta)|$, where $F_0(x, \theta)$ is the normal CDF at $x$. The maximum of the values computed is the KS statistic defined in (2.4).

Step 5. For the bootstrap case, generate a bootstrap sample from the random sample obtained in Step 1 and compute from this bootstrap sample the maximum likelihood estimator (MLE) of the parameter, $\hat\theta = (\hat\mu, \hat\sigma^2)$. Repeat Steps 2 to 4 using this new value of the parameter.

Step 6. Repeat Step 5 using the half-sample method instead of the bootstrap. This is the half-sample case.

All these enumerated steps are repeated 1000 times, thus generating 1000 values of $\beta_n(x)$ and $D_n$ for the specified case, the bootstrap case and the half-sample case at each sample size $n$ = 50, 150, 300 and 500. The same procedure was adopted under another distribution, Gam(4, 2). Graphs of the sampling distributions of these statistics were created for the different sample sizes and the two distributions. The whole simulation procedure is shown in the diagram below.
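As a rough companion to Steps 1 to 6, the normal case can be sketched in Python with NumPy as follows. This is an illustration under stated assumptions, not the authors' program. The function names are hypothetical, and the empirical process is scaled by $\sqrt{n}$ in the usual way. In the bootstrap and half-sample cases the empirical distribution function is assumed to be taken from the original sample, with only the parameter estimate coming from the resample. The KS step also checks the left limits of the empirical distribution function, so that the maximum equals the supremum in (2.4).

```python
import math

import numpy as np


def normal_cdf(x, mu, var):
    """Normal CDF F_0(x, theta) with theta = (mu, var), built on math.erf."""
    z = (np.asarray(x, dtype=float) - mu) / math.sqrt(2.0 * var)
    return 0.5 * (1.0 + np.vectorize(math.erf)(z))


def statistics(sample, mu, var, x0=2.0):
    """Steps 2 to 4: empirical process at x0 and the KS statistic."""
    xs = np.sort(sample)
    n = xs.size
    # Steps 2-3: sqrt(n) * (F_n(x0) - F_0(x0, theta)); scaling is assumed.
    process = math.sqrt(n) * (np.mean(xs <= x0) - float(normal_cdf(x0, mu, var)))
    # Step 4: largest gap between F_n and F_0 over the observations,
    # including the left limits of F_n at each jump.
    f0 = normal_cdf(xs, mu, var)
    ks = max(np.max(np.arange(1, n + 1) / n - f0), np.max(f0 - np.arange(n) / n))
    return float(process), float(ks)


def mle(sample):
    """Normal MLEs: sample mean and variance with divisor n."""
    mu = sample.mean()
    return mu, np.mean((sample - mu) ** 2)


def simulate(n, reps=1000, mu=1.0, var=1.0, seed=0):
    """Return arrays of (process, ks) pairs for the three cases."""
    rng = np.random.default_rng(seed)
    out = {"specified": [], "bootstrap": [], "half_sample": []}
    for _ in range(reps):
        x = rng.normal(mu, math.sqrt(var), size=n)        # Step 1
        out["specified"].append(statistics(x, mu, var))    # Steps 2-4
        boot = rng.choice(x, size=n, replace=True)         # Step 5
        out["bootstrap"].append(statistics(x, *mle(boot)))
        half = rng.choice(x, size=n // 2, replace=False)   # Step 6
        out["half_sample"].append(statistics(x, *mle(half)))
    return {name: np.array(values) for name, values in out.items()}
```

Calling `simulate(50)` would give, for each case, a 1000-by-2 array whose first column holds the empirical process at $x = 2$ and whose second column holds the KS statistic. The gamma case would follow the same pattern with a gamma CDF and the estimators (2.9) and (2.10).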

Fig.3.1 Simulation Diagram

4 Results and Discussion

The cumulative distributions of the empirical process and the KS statistics were compared graphically, between the specified case and the bootstrap case and between the specified case and the half-sample case. Then the pointwise MSE of the two distributions was computed. The results were compared for the bootstrap and half-sample for the normal and gamma distributions.
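One possible reading of the pointwise MSE in (2.11), applied to two simulated cumulative distributions, is sketched here in Python with NumPy. The choice of an evenly spaced grid of $k$ points over the pooled range is an assumption, since the partitioning is not spelled out, and the function names are illustrative.

```python
import numpy as np


def pointwise_mse(t_hat, t_spec):
    """Equation (2.11): mean of squared pointwise differences over k points."""
    t_hat = np.asarray(t_hat, dtype=float)
    t_spec = np.asarray(t_spec, dtype=float)
    if t_hat.shape != t_spec.shape:
        raise ValueError("both inputs must be evaluated at the same k points")
    return float(np.mean((t_hat - t_spec) ** 2))


def ecdf_on_grid(values, grid):
    """Empirical CDF of simulated statistics, evaluated at each grid point."""
    values = np.sort(np.asarray(values, dtype=float))
    return np.searchsorted(values, grid, side="right") / values.size


def cdf_mse(estimated, specified, k=100):
    """Pointwise MSE between two simulated cumulative distributions."""
    lo = min(np.min(estimated), np.min(specified))
    hi = max(np.max(estimated), np.max(specified))
    grid = np.linspace(lo, hi, k)  # assumed partition: k evenly spaced points
    return pointwise_mse(ecdf_on_grid(estimated, grid),
                         ecdf_on_grid(specified, grid))
```

Here `estimated` would hold the simulated values of a statistic under the bootstrap or half-sample case and `specified` the values under the specified case. A result near zero indicates that the two cumulative distributions nearly coincide on the grid.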

4.1 Bootstrap-Estimated Empirical Process $\beta_n(x)$

Legend: Blue (specified case); Red (Bootstrap Case)
Fig.4.1 Cumulative Distribution of the Bootstrap-Estimated $\beta_n(x)$ against Specified $\beta_n(x)$ from Normal Distribution at Various $n$

The result in Fig.4.1 showed that the cumulative distribution of the empirical process of the specified case and the bootstrap case is closest for $n$ =

Legend: Blue (specified case); Red (Bootstrap Case)
Fig.4.2 Cumulative Distribution of the Bootstrap-Estimated $\beta_n(x)$ against Specified $\beta_n(x)$ from Gamma Distribution at Various $n$

Fig.4.2 showed that, for the gamma distribution, the cumulative distributions of the empirical process of the specified case and the bootstrap case are not as close for any value of $n$ as they are for the normal distribution.

4.2 Bootstrap-Estimated Kolmogorov-Smirnov Statistics

Legend: Blue (specified case); Red (Bootstrap Case)
Fig.4.3 Cumulative Distribution of the Bootstrap-Estimated KS Statistics against Specified KS Statistics from Normal Distribution at Various $n$

Legend: Blue (specified case); Red (Bootstrap Case)
Fig.4.4 Cumulative Distribution of the Bootstrap-Estimated KS Statistics against Specified KS Statistics from Gamma Distribution at Various $n$

For both the normal and gamma distributions, the cumulative distributions of the bootstrap-estimated and specified-case KS statistics almost overlapped each other, as shown in Fig.4.3 and Fig

Table 4.1 MSE for Bootstrap-Estimated Empirical Process $\beta_n(x)$ and Kolmogorov-Smirnov Statistics (KS)
Columns: Sample Size; Empirical Process $\beta_n(x)$ (Normal, Gamma); KS Statistics (Normal, Gamma)

In this paper, the MSE was used as a measure of closeness of the distributions of the bootstrap-estimated statistics and the specified one. As shown in Table 4.1, the MSE values of the empirical process $\beta_n(x)$ are close to zero for both distributions, normal and gamma. The same was observed for the function of the empirical process, which is the KS or $D_n$ statistic. The results in the previous graphs were confirmed by these MSE measures: the sampling distributions of the bootstrap-estimated empirical process and its function are essentially the same as those of the specified case.

4.3 Half Sample-Estimated Empirical Process $\beta_n(x)$

Legend: Blue (specified case); Red (Bootstrap Case)
Fig.4.5 Cumulative Distribution of the Half Sample-Estimated $\beta_n(x)$ against Specified $\beta_n(x)$ from Normal Distribution at Various $n$

The result in Fig.4.5 showed that the cumulative distribution of the empirical process of the specified case and the half-sample case is already closest for $n$ = 300.

Legend: Blue (specified case); Red (Bootstrap Case)
Fig.4.6 Cumulative Distribution of the Half Sample-Estimated $\beta_n(x)$ against Specified $\beta_n(x)$ from Gamma Distribution at Various $n$

Fig.4.6 showed that, for the gamma distribution, the cumulative distributions of the empirical process of the specified case and the half-sample case are not as close for any value of $n$ as they are for the normal distribution shown in Fig

4.4 Half Sample-Estimated Kolmogorov-Smirnov Statistics

Legend: Blue (specified case); Red (Bootstrap Case)
Fig.4.7 Cumulative Distribution of the Half Sample-Estimated KS Statistics against Specified KS Statistics from Normal Distribution at Various $n$

Legend: Blue (specified case); Red (Bootstrap Case)
Fig.4.8 Cumulative Distribution of the Half Sample-Estimated KS Statistics against Specified KS Statistics from Gamma Distribution at Various $n$

Graphically, as shown in Fig.4.7 and Fig.4.8, for both the normal and gamma distributions the cumulative distributions of the half sample-estimated and specified-case KS statistics are very close to each other.

Table 4.2 MSE for Half Sample-Estimated Empirical Process $\beta_n(x)$ and Kolmogorov-Smirnov Statistics (KS)
Columns: Sample Size; Empirical Process $\beta_n(x)$ (Normal, Gamma); KS Statistics (Normal, Gamma)

As shown in Table 4.2, the MSE values of the empirical process $\beta_n(x)$ are close to zero for both distributions, normal and gamma. The same was observed for the function of the empirical process, which is the KS or $D_n$ statistic. Comparing Table 4.1 and Table 4.2, however, it was the KS statistic in the bootstrap case which had the lower MSE value for all sample sizes.

4.5 Bootstrap Kolmogorov-Smirnov Test for Normality Software

Fig.4.9 Bootstrap Kolmogorov-Smirnov Test for Normality Program

A Bootstrap Kolmogorov-Smirnov Test for Normality program was created that tests whether a sample of size $n$ comes from a normal distribution. The sample size of the input data for this program is limited to the specific sample sizes 50, 100, 150, or 200. This software makes use of the bootstrap-estimated KS statistic to test the hypothesis that the distribution of the data is normal. The program can be opened and run using any internet browser.

5 Conclusion

The mean-squared error (MSE) was used in this study to measure the pointwise difference between the specified case and the values obtained using the bootstrap and half-sample procedures, for the estimated empirical process and for one of its functionals, the Kolmogorov-Smirnov statistic. The smaller the value of the MSE, the closer the sampling distribution of the estimated statistic is to that of the specified case.

Using the MSE to compare the bootstrap and half-sample procedures in terms of their efficiency, it was found that the bootstrap procedure is more efficient than the half-sample method. The bootstrap procedure is good at estimating the parameters of the hypothesized continuous distribution since it approximates the sampling distribution of the specified case. It is recommended for future studies that the sample size and the number of iterations in the simulation be increased, that the program be improved to handle any sample size, and that R, a free software, be used.

REFERENCES

[1] Babu, Gutti Jogesh. Model fitting in the presence of nuisance parameters. In Astronomical Data Analysis-III (2004). Fionn D. Murtagh (Ed.). Electronic Workshops in Computing (eWiC).
[2] Babu, G. J. and C. R. Rao (2003). Goodness-of-fit tests when parameters are estimated. Sankhya, 66,
[3] Burke, M. D., M. Csorgo, S. Csorgo and P. Revesz (1978). Approximations of the Empirical Process When the Parameters are Estimated. The Annals of Probability. 5, pp
[4] Burke, M. D. and Gombay, E. (1988). On goodness-of-fit and the bootstrap. Statistics and Probability Letters, 6,
[5] Durbin, J. (1973). Weak Convergence of the sample distribution function when parameters are estimated. Ann. Statist. 1,
[6] Evans, James W.; Johnson, Richard A.; Green, David W. (1989). Two- and three-parameter Weibull goodness-of-fit tests. Res. Pap. FPL-RP-493. Madison, WI: U.S. Department of Agriculture, Forest Service, Forest Products Laboratory; 27.
[7] Evardone, Chita P. (1988). Goodness-of-fit tests based on the empirical process. Canada: University of Calgary. Master's Thesis

[8] Khmaladze, E. V. Goodness-of-fit Problem and Scanning Innovation Martingales. The Annals of Statistics, Vol. 21, No. 2, (1993),
[9] Lilliefors, Hubert W. (1967). On the Kolmogorov-Smirnov Test for Normality with Mean and Variance Unknown. Journal of the American Statistical Association. Vol. 62, No. 318, pp