Detecting Outliers in Exponentiated Pareto Distribution

Size: px
Start display at page:

Download "Detecting Outliers in Exponentiated Pareto Distribution"

Transcription

1 Journal of Sciences, Islamic Republic of Iran 28(3): (207) University of Tehran, ISSN Detecting Outliers in Exponentiated Pareto Distribution M. Jabbari Nooghabi * Department of Statistics, Faculty of Mathematical Sciences, Ferdowsi University of Mashhad, Mashhad, Islamic Republic of Iran Received: 30 April 206 / Revised: 26 October 206 / Accepted: 3 January 207 Abstract In this paper, we use two statistics for detecting outliers in exponentiated Pareto distribution. These statistics are the extension of the statistics for detecting outliers in exponential gamma distributions. In fact, we compare the power of our test statistics based on the simulation study identify the better test statistic for detecting outliers in exponentiated Pareto distribution. At the end, we describe an example from insurance company. Keywords: Exponentiated Pareto sample; Z statistic, Dixon s statistic; Outliers; Upper outlier. Introduction The Pareto distribution was originally used to describe the allocation of wealth among individuals, since it seemed to show rather well the way that a larger portion of the wealth of any society is owned by a smaller percentage of the people in that society. It can be shown that from a probability density function, graph of the population (), the probability or fraction of () that own a small amount of wealth per person, is high. The probability then decreases steadily as wealth increases. Also, the Pareto distribution is useful for finding the average of annuity benefit for an insurance problem. In economics, where this distribution is used as an income distribution, the threshold parameter is some minimum income with a known value. Dixit Jabbari Nooghabi [4] compared the uniformly minimum variance unbiased estimator (UMVUE) of the probability density function (pdf), the distribution function (CDF) the moment for the Pareto distribution. Now, if we assume that is a Pareto distributed rom variable, then we take ln () to have the corresponding exponentiated Pareto distribution as defined by Nadarajah [4]. Usually, is defined on the positive side of the real line so one would hope that models on the basis of the distribution of would have greater applicability. Nadarajah [4] introduced five exponentiated Pareto distributions derived several of their properties including the moment generating function, expectation, variance, skewness, kurtosis, Shannon entropy, the Rényi entropy. Note that another type of exponentiated Pareto distribution was considered by Shawky Abu-Zinadah [5] characterized using record values. Shawky Abu- Zinadah [6] derived the maximum likelihood estimation of the different parameters of an exponentiated Pareto distribution. Also, they considered five other estimation procedures compared them. Afify [2] obtained Bayes classical estimators for a two parameter exponentiated Pareto distribution for when samples are available from complete, type I type II censoring schemes. He proposed Bayes estimators under a squared error loss function as well as under a LINEX loss function using priors of noninformative type for the parameters. Mahmoud [3] proposed the best linear unbiased estimates the maximum likelihood estimates of the * Corresponding author: Tel: ; Fax: ; jabbarinm@um.ac.ir; jabbarinm@yahoo.com 267

2 Vol. 28 No. 3 Summer 207 M. Jabbari Nooghabi. J. Sci. I. R. Iran location scale parameters from the Exponentiated Pareto distribution based on progressively Type-II right censored order statistics. However, in this paper we will restrict to the form defined by Nadarajah [4]. The generalized Chauvenet s test for rejecting outlier observations is suitable for detecting outliers in a univariate data set. This test can be used for exponential case. Several authors considered the problem for testing one outlier in exponential distribution. Only two types of statistics for testing multiple outliers exist. First is Dixon s while the second is based on the ratio of some observations suspected to be outliers with respect to the sum of all observations in the sample. In fact, most of these authors have used a general case of gamma model then the results for exponential model are given as a special case. This approach is focused on alternative models, namely slippage alternatives in exponential samples (see Barnett Lewis [3]). Barnett Lewis [3] gave a survey of literature in the connection. Kale [8] investigated the problem of identifying the outliers for one parameter exponential family. Zerbet Nikulin [7] proposed a different statistic from the wellknown Dixon s statistic,, to test multiple outliers. Hadi et al. [6] presented an overview of the major developments in the area of detection of outliers. These include projection pursuit approaches as well as Mahalanobis distance-based procedures. Also, they discussed other methods, corresponding to the large datasets. Jabbari Nooghabi et. al. [7] Kumar Lalitha [] extended the Zerbet Nikulin [7] statistic for gamma distribution showed that statistic is more powerful than Dixon s. Kumar [0] discussed an approach for testing multiple upper outliers with slippage alternative in an exponential sample irrespective of origin. The test statistic is based on a ratio of two estimates, obtained by the maximization of the two log-likelihood functions. He derived the exact null distribution of the test statistic. Kornacki [9] proposed an alternative method of outlier detection based on the Akaike information criterion. Lin Balakrishnan [2] proposed an algorithm for evaluating the null joint distribution of Dixon-type test statistics for testing discordancy of upper outliers in exponential samples. Gogoi Das [5] compared the empirical powers of some statistics for detecting multiple upper outliers in exponential samples under slippage alternative. The results show that the maximum likelihood ratio test statistic is better than the other statistics followed by Dixon type test statistics to deal with upper outliers in exponential samples. Adil Irshad [] modified the Tukey s boxplot for detection of outliers when the data are skewed proposed approach to detect outliers properly. In this paper, we use two statistics for detecting outliers in exponentiated Pareto distribution. The distribution of the test based on these statistics under slippage alternatives is obtained the tables of critical values are given for various (the sample size) (the number of outliers). The power of these tests are also calculated compared. In the next section, we introduce the test statistics. In Sections 3 4, we obtain the distribution of the statistics. Section 5 used to compare the critical values the powers. In the last section, we describe an example from an insurance company. Statistical Inference Let,,..., be arbitrary independent rom variables. In this paper, we test the following hypothesis: :,,..., are iid rom variables from exponentiated Pareto distribution with parameters ( is unknown is known). Therefore, the probability density function of these samples under the null hypothesis is: (;, ). ln ( ) > 0, > 0 But under the slippage alternative,, we have (), (),..., () derive from (;, ), (), (),..., () derive from (;, ), where >, is unknown (), (),..., () denote the order statistics corresponding to the observations,,.... We suppose that the hypothesis be an important sub-hypothesis of the one saying that of observations are suspected to be outliers (for >, these observations are called upper outliers). So, is correspond to. To test, we use these statistics () () ( () () ), () () ln() () ln (). (2) Following the idea of the Chauvenet s test, we assume that the decision criterion is: h > >, where ( ) ( ) are the critical value corresponding to the significance level for statistics, respectively. The Distribution of Under Alternatives In this section, we find the distribution of the statistic according to Zerbet Nikulin [7] method. Then, 268

3 Detecting Outliers in Exponentiated Pareto Distribution the distribution of this statistic under the slippage alternative hypothesis is obtained by the following Theorem. Theorem. The distribution of the statistic under is { < } ( + ) ( ) ( + ) ( )( +)(+ +) +(+ +), 0 < <. (3) Proof. To proof see Zerbet Nikulin [7] or Jabbari Nooghabi et. al. [7] Kumar Lalitha []. Corollary. The distribution of under is obtained from Theorem by taking. The Distribution of Under Alternatives In this section, the following Theorem is used to find the distribution of under alternatives. Theorem 2. The distribution of under is as follows: { < } ( + + ) ( + ) ( ) ( )! ( )! ( )! ( )! + + [(( + )) (( +) + (+ +)) ], 0 < <. (4) Proof. Same as the Theorem, we set. (5) The characteristic function of (, ) is, (, ) () ( ) R (,,..., )(,,..., ).... (6) Then according to distribution of,,2,...,,,2,...,, we have, (, ). So, the joint density function of (, ) is (,) (, ) (2). (7) Same as Zerbet Nikulin [7], we obtain these products (8) ( ), ( 2)! ( )! (+ +), (9) ( + ) ( ). ( 2)! ( )! () Therefore (,) (, ) ( + + ) ( + ) ( ) ( )! ( )!! ( ) (), ( )! ( )! >0, >0. () (0) () So, the density distribution function of are ( + + ) ( + ) () ( ) [( +)+(+ +)], ( )! ( )! ( )! ( )! (2) 269

4 Vol. 28 No. 3 Summer 207 M. Jabbari Nooghabi. J. Sci. I. R. Iran { <} ( + + ) ( + ) ( ) ( )! ( )! ( )! ( )! + + [(( + )) (( +) +( + +)) ], > 0. (3) With substituting, the proof is complete. Corollary 2. The distribution of under is obtained from Theorem 2 by taking. Results A Simulation Example In this section, we give the critical values of statistics for the levels of significance 0.05 Table. Critical values of for Upper lower values in each cell refer to , respectively. Table 2. Critical values of for Upper lower values in each cell are correspond to , respectively. 270

5 Detecting Outliers inn Exponentiated Pareto Distribution 0., for,2,3,, such thatt 2, 5()0( (5)30 in Tables 2, respectively. The powers of statistics are comparedd in Figure for ,,2,3 sample size 0,20 respect to. Computing the power of the tests critical values are achieved using R software. Figure shows that for 0.05 a 0., the test based on is more powerful than the test basedd on in all values of,. An Actual Example In an insurance company one of services is motor insurance. A claim can be made of at least 500,000 Rials as compensation for the motor insurance. So, the threshold of the distribution of claim is i 500,000 Rials. The vehicles involved are of different cost of which some of them may be very expensive. Claim amounts varies according to the damage occurred to the vehicles. It has been observed that claims of these vehicles (expensive/severe damaged vehicle) are several times higher than normal vehicles. In this paper, we have drawn a romm sample of size 20 of the claim amounts. It is observed that such claims follow a Pareto distribution in the presence off outliers. Here, the number of outliers () iss unknown. The data of claims from the insurancee company of Iran records for the year is given bellow: , , , , , , , , , , , , , 35000, , , To convert the data to exponentiated Pareto distribution, at first we use the natural logarithm transformation. So, the converted dataa follow an exponentiated Pareto distribution. Therefore, the values of statistic are , , , , , , , , Figure. The variation of powers with respect to t. 27

6 Vol. 28 No. 3 Summer 207 M. Jabbari Nooghabi. J. Sci. I. R. Iran Figure 2. 2 Values of statistic respect to for actual example for k:0, respectively. Comparing these values with critical values in Table. correspondingg to 20, shows that the number of outlier is is its value for 0.05 or 0.. Figure 2 shows values of statistic respect to it is decreased when is increasing. Acknowledgement The author is grateful to the referees the editor for making valuable suggestions which led to an improved version of the paper. References. Adil I.H.. Irshad A.R. A Modified Approach for Detection of Outliers. Pak.j.stat.oper.res. (): 9-02 (205). 2. Afify W.M. On estimation of the exponentiated Pareto distribution under different samplee schemes. Stat. Methodol. 7 (2): ( 200). 3. Barnett V. Lewis T. Outliers inn Statistical Data. Second edn. John Wiley Sons, New York (984). 4. Dixit U.J. Jabbari Nooghabi M. Efficient estimation in the Pareto distribution. Stat. Methodol. 7 (6): (200). 5. Gogoi B. Das M.Kr. Detection of o Multiple Upper Outliers in Exponential Sample under Slippage Alternative. IARJSET 2 (8): (205). 6. Hadi A.S.., Imon A.H.M.R Werne M. Detectionn of outliers. WIREs. Comp. Stat. (): (2009). 7. Jabbari Nooghabi M., Jabbari Nooghabi H. Nasiri P. Detecting Outliers in Gamma Distribution. Comm. Statist. Theory Methods 39 (4): (200). 8. Kale B.K. Detection of Outliers. Sankhya B 38: (976). 9. Kornacki A. Detection D of outlying observations using the Akaike information criterion. Biometrical Letters 50 (2): 7-26 (203). 0. Kumar N. Test for multiple upper outliers o in an exponential sample s irrespective of origin. Statistics 47 (): (203).. Kumar N. a Lalitha S. Testing for upper outliers in Gamma sample. Comm. Statist. Theory Methods 4 (5): (202). 2. Lin Chien-tai Balakrishnan N. Tests for Multiple Outliers in an Exponential Sample. Comm. Statist. Simulation Comput. 43 (4): (204). 3. Mahmoud M.A.E., Yhiea N.M. El-Said S.M.. Estimation of o parameters for the exponentiated Pareto distribution based b on progressively type-iii right censored data. J. Egyptian Math. Soc. 24 (3): (206). 4. Nadarajah S. Exponentiated Pareto Distributions. Statistics 39 (3):( (2005). 5. Shawky A.I. Abu-Zinadah H.H. Characterizations of the Exponentiated Pareto Distribution Based on Record Values. Applied Mathematical Sciences 2 (26): (2008). 6. Shawky A..I. Abu-Zinadah H.H. Exponentiated Pareto Distribution: Different Method of Estimations. Int. J. Contemp. Math. M Sciences 4 (4): (2009). 7. Zerbet A. Nikulin M. A New Statisticc for Detecting Outliers in Exponential Case. Comm. Statist. S Theory Methods 32 ( 3): (2003). 272