CREDIT RISK MODELING USING LOGISTIC RIDGE REGRESSION RAKHMAWATI
CREDIT RISK MODELING USING LOGISTIC RIDGE REGRESSION

RAKHMAWATI

DEPARTMENT OF STATISTICS
FACULTY OF MATHEMATICS AND NATURAL SCIENCES
BOGOR AGRICULTURAL UNIVERSITY
2011
ABSTRACT

RAKHMAWATI. Credit Risk Modeling Using Logistic Ridge Regression. Supervised by AAM ALAMUDI and DIAN KUSUMANINGRUM.

The growth of credit in national banking may expose banks to greater risk. One key question is how to determine whether a new applicant will be good at loan repayment. A well-known and widely used method for classifying new credit applicants is logistic regression. Multicollinearity is a problem frequently encountered in model building. Usually, a variable selection method is used to handle this problem, but it can create a new problem when an important variable does not enter the model. Logistic ridge regression can be an alternative to logistic regression when multicollinearity exists; its advantage is that it handles multicollinearity without deleting any predictor variables. This research compared the performance of logistic ridge regression and logistic regression with variable selection in predicting the collectability status of new credit applicants. There were 1000 observations in the German Credit data set: 740 observations were used for modeling and 260 for validation. Backward elimination was the best among the variable selection methods, having the highest c statistic, and its model fit by the Hosmer and Lemeshow goodness-of-fit test. Backward logistic regression showed that, among 17 variables, eight were significant in the Wald test. There were many significant correlations among the predictors, but the highest correlation coefficient was 0.628, between duration of credit (V1) and credit amount (V2). The ridge parameter λ was 0.1. The optimal cutpoint of backward logistic regression was 0.68, while for logistic ridge regression it was 0.677. By comparing the c statistic and the total of correctly predicted cases, logistic ridge regression was better than backward logistic regression on the training data. However, on the testing (validation) data, backward logistic regression was better. To better understand the models under higher correlation between V1 and V2, a variable V2* was generated to replace V2, and logistic regression with variable selection and logistic ridge regression were built again. The result pointed out that logistic ridge regression has a slightly higher capability than logistic regression with variable selection to predict a new applicant's collectability status.

Key words: credit risk modeling, logistic ridge regression, multicollinearity
CREDIT RISK MODELING USING LOGISTIC RIDGE REGRESSION

RAKHMAWATI
G

Thesis as a requirement for Bachelor Degree in Statistics

DEPARTMENT OF STATISTICS
FACULTY OF MATHEMATICS AND NATURAL SCIENCES
BOGOR AGRICULTURAL UNIVERSITY
2011
Title : Credit Risk Modeling Using Logistic Ridge Regression
Author : Rakhmawati
NIM : G

Approved by:

Advisor I : Ir. Aam Alamudi, M.Si (NIP )
Advisor II : Dian Kusumaningrum, S.Si, M.Si

Acknowledged by:
Head of the Department of Statistics : Dr. Ir. Hari Wijayanto, M.Si (NIP )

Graduation date:
BIOGRAPHY

Rakhmawati was born in Salatiga on June 16th, 1988, as the daughter of Paijo Nurhadi Santoso and Siti Naimah. She has a big brother named Nur Rahman Istianto and a little brother named Aulia Kharis Kurniawan. After graduating from SMA Negeri 1 Salatiga in 2006, she continued her studies at Bogor Agricultural University through USMI. She took Statistics as her major at Bogor Agricultural University. She chose Information System as her minor subject, along with some supporting courses from the Department of Mathematics. She was a staff member of the Database and Computational Department in Gamma Sigma Beta, an organization of statistics students at Bogor Agricultural University. In her 8th semester, she had a chance to take an internship program at PT Ganesha Cipta Informatika. There, with her partner, she made a SAS program for risk management to calculate the Value at Risk of market risk.
ACKNOWLEDGEMENTS

Alhamdulillah, thanks to Allah SWT Who gives me love, opportunity, health, and capability in finalizing my research, which is entitled Credit Risk Modeling Using Logistic Ridge Regression. I recognize that the completion of my research would not have been possible without help from other people. I want to say thanks to Mr. Aam Alamudi and also Mrs. Dian Kusumaningrum as my advisors, for their critiques, ideas, and patience. Thanks to Miss Indah Permatasari, my internship advisor, whose guidance finally led me to the topic of my research. To Defri Ramadhan Ismana and Yulia Triwijiwati, thank you for the discussions. Special thanks to my beloved family for the love and support. Finally, I hope this thesis will be beneficial.

Bogor, February 2011
Rakhmawati
TABLE OF CONTENTS
Page
LIST OF TABLES viii
LIST OF FIGURES viii
LIST OF APPENDICES viii
INTRODUCTION
Background 1
Objective 1
LITERATURE REVIEW
Credit Risk 1
The Cramer Statistic 1
Logistic Regression 2
Logistic Ridge Regression 2
Optimal Cut Point 3
Model Evaluation 3
METHODOLOGY
Data Source 4
Method 4
RESULT AND DISCUSSION
Data Exploration 4
Logistic Regression with Variable Selection 6
Logistic Ridge Regression 7
Comparison of Backward and Logistic Ridge Regression 7
Comparison of Logistic Regression with Variable Selection and Ridge with Generated V2* 9
CONCLUSION 10
RECOMMENDATION 10
REFERENCE 10
APPENDIX 11
LIST OF TABLES
Page
Table 1 Pearson correlation coefficient of numeric variables 5
Table 2 Spearman correlation coefficient of ordinal variables 5
Table 3 Cramer coefficient of nominal variables 5
Table 4 Comparison of backward, forward, and stepwise logistic regression 6
Table 5 Parameter estimate by using backward logistic regression 6
Table 6 Classification table of backward logistic regression by using a cut point of 0.68 7
Table 7 Classification table of logistic ridge regression by using a cut point of 0.677 7
Table 8 Parameter estimate by using logistic ridge regression 7
Table 9 Comparison of c statistic between backward and logistic ridge regression 8
Table 10 Odds ratio estimate of V8 (credit history) 9
Table 11 Parameter existence on the logistic regression model with variable selection 10

LIST OF FIGURES
Page
Figure 1 Plot of sensitivity and specificity versus all possible cut points 3
Figure 2 Classification table 3
Figure 3 ROC curve 3
Figure 4 Plot of percentage of good debtors in each group of credit amount (V2) 4
Figure 5 Proportion of good debtors in credit history (V8) 5
Figure 6 Classification rate of backward and logistic ridge regression on each optimal cut point 8
Figure 7 Validation's classification rate of backward and logistic ridge regression on each optimal cut point 8
Figure 8 Observed collectability status on P(Y=1) by using backward logistic regression 8
Figure 9 Observed collectability status on P(Y=1) by using logistic ridge regression 8
Figure 10 Comparison of c statistic between logistic regression with variable selection and logistic ridge regression of data set with generated V2* 9
Figure 11 Comparison of total correctly predicted cases between logistic regression with variable selection and logistic ridge regression with generated V2* 9

LIST OF APPENDICES
Page
Appendix 1 Description of variables used in analysis 12
Appendix 2 Proportion of good debtor on each variable 14
Appendix 3 Odds ratio of backward logistic regression and logistic ridge regression 16
Appendix 4 Comparison of c statistic and the correctly predicted cases between logistic regression with variable selection and logistic ridge regression of data set with generated V2* 17
INTRODUCTION

Background

Credit risk is one of the eight risks that banks must consider. It is important to build a credit risk system that is measurable, documented, and able to be developed. Logistic regression, discriminant analysis, and artificial neural networks are some of the methods used in credit risk modeling. They are useful for predicting whether a new applicant will become a good or a bad debtor if he or she receives a loan. Multicollinearity is a common problem in credit risk modeling. Usually, the solution to this problem is a variable selection method (forward, backward, or stepwise), but this solution may lose information about the response variable if a deleted predictor variable is an important one. Ridge regression is another statistical procedure for dealing with the problem of multicollinearity (Ravishanker & Dey 2001). With logistic ridge regression, multicollinearity is expected to be handled without deleting any variables, so no information is lost from the data that has been collected. Bank of Indonesia noted that the growth of credit of national banks in January 2010 was 10%. By the end of August 2010, the credit of the banking industry had grown to reach 20.3% (Purnomo 2010a, 2010b). This may lead to greater risk than banks have faced before. Hence, it is important to build a more accurate credit scoring model to decide whether a new applicant is credible enough to get a loan.

Objectives

The objectives of this research are:
1. To build a credit risk model using logistic regression with variable selection and logistic ridge regression.
2. To determine the optimal probability cutpoint.
3. To compare the classification rate and the c statistic of logistic regression with variable selection and logistic ridge regression.

LITERATURE REVIEW

Credit Risk Model

Banks lend to individuals, first asking them to fill out a loan application.
The customer is asked to submit several documents that the bank needs in order to evaluate the loan request. Six aspects of the loan application determine whether a new applicant is creditworthy, namely the Six Basic Cs of Lending: character, capacity, cash, collateral, condition, and control. Character is data about the applicant's personality. Capacity is the capacity to borrow money. Cash relates to the borrower's income and the balance in the savings account. Collateral is the adequacy of the borrower's assets to provide support for the loan; the age and degree of specialization of the borrower's assets are examples of collateral. Condition is the prospect of the business in relation to economic conditions. A correctly prepared loan document is an example of control.

The basic theory of credit scoring is that the bank can identify the financial, economic, and motivational factors that separate good debtors from bad ones by observing a large group of people who have borrowed in the past. Credit scoring systems are usually based on discriminant models or related techniques such as logit or probit models or neural networks. If the applicant's score exceeds a critical cutpoint level, he or she is more likely to be approved for credit. Among the most important variables used in evaluating a consumer's loan are age, marital status, number of dependents, home ownership, telephone ownership, type of occupation, and length of employment in the current job.

The Cramer Statistic

The chi-square test of independence is used to conclude whether there is an association between two categorical variables. When the numbers of rows and columns of the contingency table are unequal, the Cramer coefficient measures the strength of this association. Its value lies between 0 and 1. The Cramer coefficient is defined as

  C = sqrt( X^2 / ( n (t - 1) ) )

where X^2 is the chi-square statistic, n is the total sample size, and t is either the number of
rows or the number of columns in the contingency table, whichever is smaller.

Logistic Regression

Let the conditional probability that the outcome is present be denoted by P(Y=1|x) = pi(x). The logit of the multiple logistic regression model is given by the equation

  g(x) = ln( pi(x) / (1 - pi(x)) ) = beta_0 + beta_1 x_1 + ... + beta_p x_p

in which case the logistic regression model is

  pi(x) = exp(g(x)) / (1 + exp(g(x)))

When an independent variable is categorical, dummy variables are needed. In general, if a categorical variable (nominal or ordinal scale) has k values, then k-1 design variables will be needed. Thus, the logit for a model with p variables, the j-th of which is categorical with k_j levels, would be

  g(x) = beta_0 + beta_1 x_1 + ... + sum_{u=1}^{k_j - 1} beta_{ju} D_{ju} + ... + beta_p x_p

where the D_{ju} are the design (dummy) variables for the j-th variable. Maximum likelihood estimates for the logit model are obtained by maximizing the log-likelihood function over beta:

  l(beta) = sum_i [ y_i ln pi(x_i) + (1 - y_i) ln(1 - pi(x_i)) ]

After getting the model, we begin the process of model assessment. The significance of the covariates can be assessed by the G test statistic and the Wald test. The G statistic is a likelihood ratio test and measures the significance of the parameters in the overall model. Hypotheses of the G test:

  H0: beta_1 = beta_2 = ... = beta_p = 0
  H1: at least one beta_i differs from 0, i = 1, 2, ..., p

The G statistic can be formulated as

  G = -2 ln( L_0 / L_p )

where L_0 is the likelihood without covariates and L_p is the likelihood with p covariates. Under the null hypothesis, G follows a chi-square distribution with p degrees of freedom. If the null hypothesis is rejected, so that at least one and perhaps all p coefficients differ from zero, the Wald test can be used to assess the significance of each covariate:

  H0: beta_i = 0
  H1: beta_i differs from 0, i = 1, 2, ..., p

  W = beta_hat_i / SE(beta_hat_i)

Under the null hypothesis, the W statistic follows a standard normal distribution (Hosmer & Lemeshow 2000). Coefficients in logistic regression are interpreted using the odds ratio, which indicates how much more likely, with respect to odds, a certain event occurs in one group relative to its occurrence in another group. The odds ratio is defined as exp(beta_i).
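The estimation and tests above can be sketched in a few lines of NumPy. This is a minimal illustration, not the thesis's SAS computation; the simulated data, seed, and function names are hypothetical:

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Maximum likelihood for logistic regression via Newton-Raphson.
    X: (n, p) design matrix WITHOUT intercept; an intercept column is added here."""
    n, p = X.shape
    Z = np.hstack([np.ones((n, 1)), X])          # add intercept column
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-Z @ beta))     # pi(x) = e^g / (1 + e^g)
        W = pi * (1.0 - pi)                      # IRLS weights pi(1 - pi)
        H = Z.T @ (Z * W[:, None])               # information matrix
        beta += np.linalg.solve(H, Z.T @ (y - pi))
    se = np.sqrt(np.diag(np.linalg.inv(H)))      # standard errors SE(beta_hat)
    loglik = np.sum(y * np.log(pi) + (1 - y) * np.log(1 - pi))
    return beta, se, loglik

# Simulated data: only x1 truly matters (illustrative, not the German Credit data)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (rng.random(500) < 1 / (1 + np.exp(-(0.5 + 1.0 * X[:, 0])))).astype(float)

beta, se, llp = fit_logit(X, y)
ll0 = fit_logit(np.empty((500, 0)), y)[2]        # intercept-only log-likelihood
G = -2 * (ll0 - llp)                             # G = -2 ln(L0 / Lp) ~ chi2(p)
wald = beta / se                                 # W_i ~ N(0, 1) under H0: beta_i = 0
print(np.round(beta, 2), round(G, 1), np.round(wald, 1))
```

With the strong simulated effect on x1, G is large and the Wald statistic for beta_1 is far from zero, mirroring the way the thesis screens covariates.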
For a numeric variable, the odds ratio indicates that for every increase of one unit of the predictor, the odds of the outcome are multiplied by exp(beta). Multicollinearity can cause unstable estimates and inaccurate variances, which affects hypothesis tests (Hoerl & Kennard 1970, in Shen & Gao 2008). In regression, there are several approaches to handling multicollinearity: variable selection methods (forward, backward, and stepwise) and ridge regression. Forward selection adds terms sequentially until further additions do not improve the model. Backward elimination begins with a complex model and sequentially removes terms. The stepwise procedure starts by choosing the equation containing the most important variable and then attempts to build up the model with subsequent additions of one variable at a time, as long as these additions are worthwhile.

Logistic Ridge Regression

Unstable parameter estimates occur when the number of covariates is relatively large or when the covariates are correlated. An alternative procedure for obtaining more stable estimates is to place a restriction on the parameters. Consider the maximization of the log-likelihood function with a penalty on the norm of beta:

  l_lambda(beta) = l(beta) - lambda * ||beta||^2

where ||beta|| = ( sum_j beta_j^2 )^{1/2} is the norm of the parameter vector beta. The ridge parameter lambda controls the amount of shrinkage of the norm of beta. When lambda = 0 the solution is the ordinary MLE. For a good choice of lambda, the estimate is expected to be on average closer to the real value of beta than the ordinary MLE, i.e. MSE(beta_lambda) < MSE(beta_MLE) (Cessie & Houwelingen 1992). The parameter estimate of logistic ridge regression is calculated in the following way:
1. Fit the logistic regression model using maximum likelihood, leading to the estimate beta_hat. Construct standardized coefficients by defining

  beta*_j = s_j * beta_hat_j,  j = 1, 2, ..., p

where s_j is the standard deviation of the j-th predictor in the training data.

2. Construct the Pearson statistic

  X^2 = sum_{k=1}^{g} (y_k - m_k pi_hat_k)^2 / ( m_k pi_hat_k (1 - pi_hat_k) )

where g is the number of covariate patterns, m_k is the number of subjects with x = x_k, y_k is the number of positive responses (y = 1) among the m_k subjects, and pi_hat_k is the probability that the outcome is present at x = x_k. This is a measure of the difference between the observed and the fitted values.

3. Define the ridge parameter, using the dispersion estimate phi_hat = X^2 / (n - (p + 1)):

  lambda = p * phi_hat / sum_{j=1}^{p} (beta*_j)^2

4. Let Z be the n x p matrix of centered and scaled predictors, and let V be the n x n diagonal matrix with entries pi_hat_i (1 - pi_hat_i). Let beta* equal the vector of standardized coefficients with the intercept omitted. Then the ridge regression estimate equals

  beta_ridge = (Z'VZ + 2 lambda I)^{-1} (Z'VZ) beta*

Optimal Cutpoint

The optimal cutpoint for the purpose of classification can be obtained from the plot of sensitivity and specificity versus all possible cutpoints (Hosmer & Lemeshow 2000); the plot can be seen in Figure 1. The optimal cutpoint is not the only criterion for deciding whether a new applicant is acceptable for a loan. Although the correct classification rate may be high at the optimal cutpoint, the number of false positives should be considered, because the loss caused by this error is extremely large relative to the false negative. Each bank has its own criteria for making a decision. An explanation of these errors can be seen in the next section. In this research, the cutpoint score will simply be obtained from the plot of sensitivity and specificity versus all possible cutpoints.

Figure 1 Plot of sensitivity and specificity versus all possible cutpoints

Model Evaluation

In model assessment, a classification table is most appropriate when classification is a stated goal of the analysis. Figure 2 is the classification table: a two-way frequency table between the actual data and the prediction.
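Such a classification table can be tabulated directly from actual and predicted labels. A small sketch (the example arrays are made up for illustration, not taken from the credit data):

```python
import numpy as np

def classification_table(actual, predicted):
    """Build a 2x2 classification table and the rates derived from it.
    actual/predicted: arrays coded 0 (bad) and 1 (good)."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    tp = int(np.sum((actual == 1) & (predicted == 1)))   # true positives
    tn = int(np.sum((actual == 0) & (predicted == 0)))   # true negatives
    fp = int(np.sum((actual == 0) & (predicted == 1)))   # false positives
    fn = int(np.sum((actual == 1) & (predicted == 0)))   # false negatives
    n = actual.size
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn,
            "CCR": (tp + tn) / n,                        # correct classification rate
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp)}

# Hypothetical actual vs. predicted labels
t = classification_table([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
print(t)
```

The returned dictionary holds the four cells of Figure 2 plus the CCR, sensitivity, and specificity discussed below.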
The correct classification rate (CCR) consists of the percentages of true positives and true negatives, while the misclassification rate (MCR) consists of the percentages of false positives and false negatives.

                 Predicted 0           Predicted 1
  Actual 0       True Negative (TN)    False Positive (FP)
  Actual 1       False Negative (FN)   True Positive (TP)

Figure 2 Classification table

Sensitivity, or the true positive (TP) rate, is based on the number of observations that have category 1 and were correctly predicted. Specificity, or the true negative (TN) rate, is based on the number of observations that have category 0 and were correctly predicted. A false positive is an observation that has category 0 but is predicted as category 1. A false negative is an observation that has category 1 but is predicted as category 0.

Figure 3 ROC curve
Figure 3 shows the ROC curve. It plots the probability of a false positive (1 - specificity) against the true positive rate (sensitivity). The area under the ROC curve (AUR), which ranges from 0 to 1, provides a measure of the model's ability to discriminate between those subjects who experience the outcome of interest and those who do not. The AUR is measured by the c statistic:

  c = ( n_c + 0.5 (t - n_c - n_d) ) / t

where n_c is the number of concordant pairs, n_d is the number of discordant pairs, and t is the number of total pairs. As a general rule:

  c = 0.5 : no discrimination
  0.7 <= c < 0.8 : acceptable discrimination
  0.8 <= c < 0.9 : excellent discrimination
  c >= 0.9 : outstanding discrimination

(Hosmer & Lemeshow 2000).

METHODOLOGY

Data Source

The data used in this research was the German Credit data set, which is publicly available. It contains observations on 1000 past credit applicants. Each applicant was rated as good (700 cases) or bad (300 cases). There were 17 variables used in this research after considering the Six Basic Cs of Lending, consisting of 3 numeric variables, 6 ordinal variables, 7 nominal variables, and 1 binary variable. Descriptions of the variables can be seen in Appendix 1.

Method

The procedures used in this research were:
1. Divide the data into training data (740 observations) for modeling and testing data (260 observations) for validation. Each data set has the same pattern of good/bad debtors as the full data set, comprising 70% good debtors and 30% bad.
2. Explore the data.
3. Model the data by using stepwise, forward, and backward logistic regression. The probability modeled was Y=1 (the debtor had a good collectability status). Then choose one of those three models by considering the fit of the model and the highest c statistic.
4. Model the data using logistic ridge regression.
5. Determine the optimal cutpoint from the intersection of sensitivity and specificity.
6. Validate the models with the testing data.
7. Compare the classification rate and the c statistic between logistic ridge regression and logistic regression with variable selection.
8.
Generate V2* with a specified correlation with V1. Then repeat steps 3 through 7 with the new data (replacing V2 with V2*) to see the performance of logistic regression with variable selection and logistic ridge regression as the correlation between V1 and V2* increases.

RESULT AND DISCUSSION

Data Exploration

There were no outliers or missing values in the full data set, so all of the 1000 observations were included in the analysis. Allocation of the data into modeling and validation sets was based on the proportion of bad and good cases in the overall data set; each had 70% good and 30% bad cases, matching the full data set. The variables V1 (duration of credit) and V2 (credit amount) had a decreasing trend with respect to the response variable. Figure 4 shows that as the amount of credit increased, the proportion of debtors with good collectability status decreased. Debtors with a high installment rate (V4) tended to be bad debtors. The difference in good debtors between occupation categories was not significant. The group of debtors who were unemployed/unskilled non-resident had the highest proportion of good debtors compared to the unskilled-resident, official, and officer groups.

Figure 4 Plot of percentage of good debtors in each group of credit amount (V2)

It can be seen in Appendix 2 that, based on age (V3), the group of debtors aged 20 to 50 years old showed a positive trend in the proportion of good debtors. As the age
increased, the proportion of good debtors increased until the age of 50 years old. The group of debtors 66 to 75 years old had the lowest proportion of good debtors. Debtors with two dependents had a higher proportion of good debtors than those with one dependent (V6). As the status of the checking account (V7) increased, debtors tended to be good debtors. Home ownership status (V12) also had a positive trend: the proportion of good debtors increased as the home ownership status changed from free, to rent, to own. There was no pattern in the proportion of good debtors as the time of working experience in the current job (V10) or the time living in the present residence (V11) increased; the figures can be seen in Appendix 2. The group of debtors who had been working four to seven years in their current job had the highest proportion of good debtors. Debtors who had been working more than seven years in their current job tended to be good debtors compared to those with less than four years of working experience in their current job. The unemployed debtors had a higher proportion of good debtors than those with less than a year of working experience in their current job. For time living in the present residence, debtors with less than or equal to one year, and debtors with two to three years, tended to be good debtors more than the others.

Figure 5 Proportion of good debtors in credit history (V8)

Figure 5 shows that credit history (V8) has a positive trend. Debtors who had not even taken credit before had the lowest proportion of good debtors. Those with a high average balance in the savings account (V9) tended to be good debtors.
The difference between marital statuses (V14) was not significant, although single males and married males had a higher proportion of good debtors than females and divorced males. Debtors who had a guarantor (V15) tended to be good debtors more than those who had a co-applicant. Those who had property (V16) tended to be good debtors more than those with no property. Those with a credit purpose (V17) of used cars, furniture, or radio/television had a higher proportion of good debtors than the other purposes; the lowest proportion of good debtors was among those with education as the purpose of taking credit. Debtors who had a telephone number under their own name (V19) tended to be good debtors more than those who did not. The figures of the percentage of good debtors for each variable can be seen in Appendix 2.

Evaluation of the correlation between predictor variables can be seen in Table 1 and Table 2. It can be concluded that there were many significant correlations but only one high correlation coefficient, which exists between V1 and V2. Table 3 shows the Cramer statistic as the measure of association between the nominal variables.

Table 1 Pearson correlation coefficient of numeric variables (V1, V2, V3)
Table 2 Spearman correlation coefficient of ordinal variables (V6, V10, V11, V12, V13)
Table 3 Cramer coefficient of nominal variables (V8, V9, V14, V15, V16, V17, V19)

Among the numeric predictors, the only significant correlation occurs between V1 (duration of credit) and V2 (credit amount), with a correlation coefficient of 0.628, as shown in Table 1. By the Spearman coefficient of correlation shown in Table 2, 0.327 was the largest correlation, occurring between V11 (time in present residence) and V12 (housing). Variable V10 (time in current job) had significant correlations with all other ordinal variables except variable V12 (home ownership).
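The Cramer coefficients reported in Table 3 follow directly from the formula C = sqrt(X^2 / (n(t-1))) given earlier. A sketch using SciPy (the example contingency table is made up, not taken from the German Credit data):

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_c(table):
    """Cramer coefficient C = sqrt(X^2 / (n * (t - 1))) for an r x c contingency table."""
    table = np.asarray(table)
    chi2, p_value, dof, expected = chi2_contingency(table)
    n = table.sum()                      # total sample size
    t = min(table.shape)                 # smaller of row count and column count
    return np.sqrt(chi2 / (n * (t - 1)))

# Hypothetical 2x3 cross-tabulation of two nominal predictors
table = [[20, 30, 50],
         [40, 30, 30]]
print(round(cramers_c(table), 3))
```

Because the table is 2x3 (unequal rows and columns), t = 2 here, matching the definition that t is the smaller of the two dimensions.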
The strength of association between nominal variables was measured by the Cramer coefficient and can be seen in Table 3. Variable V17 (purpose of credit) had significant associations with all other nominal variables. The highest correlation among the nominal predictors was between V16 (property owned) and V17 (purpose of credit), at 0.218.

Logistic Regression with Variable Selection

Logistic regression models using forward, backward, and stepwise variable selection were built. Forward logistic regression gave the same result as stepwise logistic regression. Among the three selection methods, backward elimination had the highest c statistic. By the Hosmer and Lemeshow goodness-of-fit test, as proposed in Hosmer & Lemeshow (2000), the backward logistic regression model was considered fit, with a p-value of 0.724.

Table 4 Comparison of backward, forward, and stepwise logistic regression (Hosmer and Lemeshow goodness-of-fit chi-square and p-value, and c statistic, for each method)

By using backward logistic regression, eight significant predictors were selected from the 17 predictor variables: credit duration (V1), credit amount (V2), installment rate (V4), checking account status (V7), credit history (V8), balance in saving account (V9), marital status (V14), and purpose of credit (V17). The parameter estimates of backward logistic regression are shown in Table 5. Variables V1 and V2, which had a correlation coefficient of 0.628, both entered the model, showing that a correlation of 0.628 was not high enough. All the parameter estimates were consistent with the data exploration. For example, for V8 it was explained above that debtors with no credit history (the first dummy variable, V8_1) had the lowest proportion of good debtors, and Table 5 shows that the parameter estimate of V8_1 was the lowest compared to V8_2, V8_3, and V8_4.
The estimate for the 3rd dummy (representing debtors who paid the existing credit back duly) was higher than for the 4th dummy (representing debtors who delayed paying off in the past), which was also consistent with the data exploration.

Table 5 Parameter estimate by using backward logistic regression (estimate, SE, Wald statistic, and p-value for the intercept and the selected predictors; * marks significance at the 0.05 level)

By choosing the model with eight significant predictors, the c statistic of backward logistic regression was 0.817, indicating that the model was an excellent classifier. From the intersection of sensitivity and specificity versus all possible cutpoints, the optimal cutpoint of backward logistic regression was 0.68. The classification table can be seen in Table 6.
Table 6 Classification table of backward logistic regression by using a cutpoint of 0.68 (actual vs. predicted class; total correct 74.46%)

From 740 total cases, 551 cases (74.46%) were correctly predicted. There were 58 bad debtors (25.66%) predicted as good debtors out of the total of 226 bad debtors. Of the 514 good debtors, 131 cases (25.49%) were predicted as bad by the chosen model. With the true positive rate and the true negative rate both at about 74%, this model was good enough at classification, in line with the high c statistic mentioned above.

Logistic Ridge Regression

All 17 variables were included in the logistic ridge regression; these variables were chosen by considering the Six Basic Cs of Lending. The ridge parameter λ, obtained from the calculation involving the standardized parameters from ordinary logistic regression and the standardized predictor variables, was 0.1.

Table 7 Classification table of logistic ridge regression by using a cutpoint of 0.677 (actual vs. predicted class)

The c statistic of this model was 0.832, so the model was categorized as an excellent classifier. Using the optimal cutpoint of 0.677, the classification rates can be seen in Table 7. The total of correctly predicted cases was 559. There were 55 cases (24.33%) of false positives out of the total of 226 bad cases. Of the 514 good debtors, 126 (24.51%) were predicted as bad debtors. Although the differences were not large, there were some differences between the parameter estimates obtained from logistic ridge regression and backward logistic regression. The signs of the parameter estimates in the two models were all the same. But for variable V8, the parameter estimates of this model gave an inconsistent result compared to the data exploration: the 1st dummy (V8_1) should be lower than the 2nd dummy (V8_2).
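The ridge estimates discussed above come from maximizing the penalized log-likelihood l(beta) - lambda * ||beta||^2. A self-contained Newton-Raphson sketch of that idea (simulated data, not the German Credit set; leaving the intercept unpenalized is an assumption, though a common convention):

```python
import numpy as np

def fit_ridge_logit(X, y, lam, n_iter=50):
    """Penalized Newton-Raphson: maximize l(beta) - lam * ||beta||^2."""
    n, p = X.shape
    Z = np.hstack([np.ones((n, 1)), X])
    beta = np.zeros(p + 1)
    pen = 2.0 * lam * np.ones(p + 1)
    pen[0] = 0.0                                  # do not shrink the intercept
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (y - pi) - pen * beta        # gradient of penalized log-likelihood
        H = Z.T @ (Z * (pi * (1 - pi))[:, None]) + np.diag(pen)
        beta += np.linalg.solve(H, grad)
    return beta

# Two strongly correlated predictors, like credit duration and credit amount
rng = np.random.default_rng(1)
x1 = rng.normal(size=400)
x2 = 0.9 * x1 + np.sqrt(1 - 0.81) * rng.normal(size=400)
X = np.column_stack([x1, x2])
y = (rng.random(400) < 1 / (1 + np.exp(-(x1 + x2)))).astype(float)

for lam in (0.0, 1.0, 10.0):
    b = fit_ridge_logit(X, y, lam)
    print(lam, np.round(np.linalg.norm(b[1:]), 3))  # the norm of beta shrinks as lam grows
```

At lam = 0 this reduces to the ordinary MLE; increasing lam pulls the coefficient norm toward zero, which is exactly the stabilizing effect ridge exploits when predictors are correlated.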
Table 8 Parameter estimate by using logistic ridge regression (estimates for the intercept and all 17 predictors)

Comparison of Backward Logistic Regression and Logistic Ridge Regression

At the optimal cutpoint of each model, the percentage of total correctly predicted cases of logistic ridge regression was better, although only six cases higher than backward: 559 cases were correctly predicted using logistic ridge regression and 553 using backward logistic regression. Figure 6 shows that the misclassification rates (false positive and false negative) of logistic ridge regression were lower than those of backward logistic regression, and the correct classification rates (true positive and true negative) of logistic ridge regression were higher. Although logistic ridge regression was better (lower MCR and higher CCR), the values were not very different.
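The optimal cutpoints used in this comparison sit where sensitivity and specificity intersect. A sketch of how one might locate that point from predicted probabilities (the scores are simulated for illustration, not the thesis's fitted probabilities):

```python
import numpy as np

def optimal_cutpoint(y, prob):
    """Scan candidate cutpoints and return the one where sensitivity
    and specificity are closest to each other (their intersection)."""
    y, prob = np.asarray(y), np.asarray(prob)
    best_c, best_gap = 0.5, np.inf
    for c in np.unique(prob):
        pred = (prob >= c).astype(int)
        sens = np.mean(pred[y == 1] == 1)   # true positive rate
        spec = np.mean(pred[y == 0] == 0)   # true negative rate
        gap = abs(sens - spec)
        if gap < best_gap:
            best_c, best_gap = float(c), gap
    return best_c

# Simulated scores: good debtors (y=1) get higher probabilities than bad ones
rng = np.random.default_rng(2)
prob = np.r_[rng.beta(5, 2, size=70), rng.beta(2, 5, size=30)]
y = np.r_[np.ones(70), np.zeros(30)]
cut = optimal_cutpoint(y, prob)
print(round(cut, 3))
```

Scanning every observed probability is equivalent to reading off the crossing point of the two curves in Figure 1.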
Figure 6 Classification rates of backward and logistic ridge regression at each optimal cutpoint

Figure 7 shows that on the validation data set the total of correctly predicted cases by backward logistic regression differed only slightly from logistic ridge regression: 188 cases for backward logistic regression against 185 for logistic ridge regression out of 260 cases. On validation data, the backward logistic regression method resulted in a lower MCR and a higher CCR than logistic ridge regression.

Figure 7 Validation classification rates of backward and logistic ridge regression at each optimal cutpoint

Because the classification rates depend on only one threshold (the cutpoint), another measure of model assessment was also evaluated: the c statistic, which measures the area under the ROC curve and summarizes how well the model performs.

Table 9 Comparison of c statistics between backward logistic regression and logistic ridge regression

The c statistics of logistic ridge regression were higher than those of backward logistic regression with the training data and also in validation. However, the discrimination ability of both models fell into the same category: with training data, backward and ridge were both excellent; with testing data, both models were acceptable in discriminating good from bad debtors.

Figure 8 Observed collectability status against P(Y=1) from backward logistic regression

Figure 8 shows the distribution of the probability of Y=1 obtained from backward logistic regression for each collectability status. The upper panel of the figure is the histogram of the probability of Y=1 for the bad debtors and the lower panel is for the good debtors. The good debtors (lower panel) tended to have a high probability of Y=1; only a few had a low probability. The bad debtors (upper panel) were spread over the whole range of probabilities, which shows the poor capability of the model in separating the bad debtors from the good ones.

Figure 9 Observed collectability status against P(Y=1) from logistic ridge regression

Figure 9 shows the distribution of the probability of Y=1 from logistic ridge regression. It is similar to that of backward logistic regression: the bad debtors are spread over the whole range of the probability of Y=1, again indicating the model's poor capability in identifying bad debtors.
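The classification rates (TP, TN, FP, FN, CCR, MCR) and the c statistic can be computed directly from predicted probabilities and observed statuses. A minimal sketch in Python; the scores and labels below are invented for illustration, and the c statistic is computed through its rank interpretation, the probability that a randomly chosen good debtor scores above a randomly chosen bad one.

```python
def classification_rates(probs, labels, cutpoint):
    """TP/TN/FP/FN counts plus the correct and misclassification rates."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= cutpoint and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < cutpoint and y == 0)
    fp = sum(1 for p, y in zip(probs, labels) if p >= cutpoint and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < cutpoint and y == 1)
    n = len(labels)
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn,
            "CCR": (tp + tn) / n, "MCR": (fp + fn) / n}

def c_statistic(probs, labels):
    """Area under the ROC curve: the probability that a random good
    debtor (y=1) receives a higher score than a random bad one (y=0);
    ties count one half."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Invented example scores and statuses (1 = good debtor):
probs = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
rates = classification_rates(probs, labels, 0.5)
auc = c_statistic(probs, labels)
```

Because the rates change with the cutpoint while the c statistic does not, the two measures complement each other, which is why the thesis reports both.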
The odds ratio is the interpretation of a model coefficient in logistic regression. The odds ratios of V8 (credit history of the debtor) are given in Table 10.

Table 10 Odds ratio of V8 (credit history) for the backward and ridge models

By backward logistic regression, debtors who had taken no credit tended to be a good debtor 0.21 times as often as debtors with a critical account. A debtor who had paid back all credits at the bank duly tended to be a good debtor 0.229 times as often as a debtor with a critical account. Debtors whose existing credits were paid back duly, and those with a delay in paying off in the past, tended to be a good debtor about half as often as debtors with a critical account. The last column of Table 10 gives the odds ratios of V8 from logistic ridge regression; these were similar to the odds ratios from backward logistic regression. But, as with the parameter estimates, the 1st and 2nd dummy variables of logistic ridge regression showed a result inconsistent with the data exploration: the 1st dummy should be lower than the 2nd. Except for V8, all parameter estimates of logistic ridge regression were consistent with the data exploration. The odds ratios for the other variables can be seen in Appendix 3.

Comparison of Logistic Regression with Variable Selection and Ridge with Generated V2*

From the comparison of c statistics, logistic regression with variable selection and logistic ridge regression gave similar results. To examine the performance of the two methods at several values of the correlation coefficient, a variable V2* was generated to replace V2, with correlation coefficients to V1 ranging from 0.6 to 0.95 in increments of 0.05. The c statistics and the totals of correctly predicted cases were then compared, the latter obtained at each model's optimal cutpoint.
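The odds ratios above come from exponentiating the fitted coefficients of the V8 dummies, each relative to the reference category (critical account). A sketch of that step; the coefficients below are hypothetical values chosen only so the resulting ratios land near the reported 0.21 and 0.229, not the thesis's estimates.

```python
import math

def odds_ratio(beta):
    """Odds ratio for a dummy variable: exp(coefficient), the odds of a
    good rating in that category relative to the reference category."""
    return math.exp(beta)

# Hypothetical V8 coefficients (category k vs reference category 5):
coefs = {"1 vs 5": -1.561, "2 vs 5": -1.474, "3 vs 5": -0.69, "4 vs 5": -0.69}
ratios = {k: round(odds_ratio(b), 3) for k, b in coefs.items()}
```

An odds ratio below 1 means that category has lower odds of a good rating than the reference; equal coefficients (here the 3rd and 4th dummies) give identical odds ratios of about one half.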
Figure 10 Comparison of c statistics between logistic regression with variable selection and logistic ridge regression on data sets with generated V2*

Figure 11 Comparison of the total of correctly predicted cases between logistic regression with variable selection and logistic ridge regression on data sets with generated V2*

Figures 10 and 11 show that the c statistic and the total of correctly predicted cases of logistic ridge regression were always higher than those of logistic regression with variable selection, at every correlation coefficient between V1 and V2*. There was no clear pattern in the differences between the two models in either measure as the correlation coefficient between V1 and V2* increased; this may be because only two predictor variables had a high correlation coefficient. The values of the c statistic and the total correctly predicted, for both modeling and validation, can be seen in Appendix 4. Table 11 shows which parameters entered each logistic regression model with variable selection. V11 entered only the model with a correlation coefficient of 0.75, which may explain why the c statistic and the total correctly predicted of logistic regression with variable selection at a correlation coefficient of 0.75 were higher than at the other correlation levels, as shown in Figures 10 and 11.
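The construction of V2* at a given target correlation with V1, described above, can be sketched as a weighted blend of standardized V1 and independent Gaussian noise. The thesis does not describe its generation procedure, so this particular construction (and every name in it) is an assumption.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

def generate_correlated(v1, rho, rng):
    """Return a variable whose population correlation with v1 is rho:
    rho * standardized(v1) + sqrt(1 - rho^2) * independent noise."""
    n = len(v1)
    m = sum(v1) / n
    s = math.sqrt(sum((x - m) ** 2 for x in v1) / n)
    return [rho * (x - m) / s + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
            for x in v1]

rng = random.Random(7)
v1 = [rng.gauss(24.0, 12.0) for _ in range(20000)]  # e.g. credit duration, months
targets = [0.6, 0.75, 0.95]                         # a few of the studied levels
observed = [pearson(v1, generate_correlated(v1, r, rng)) for r in targets]
```

With a large sample, the observed sample correlation lands close to the requested level; the generated variable could then be rescaled to any desired mean and variance without changing the correlation.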
Table 11 Parameter existence in each logistic regression model with variable selection

CONCLUSION

Comparing the totals of correctly predicted cases on the training data set, logistic ridge regression was better than logistic regression with variable selection (in this case backward elimination), but on the validation data set backward logistic regression was better. The optimal cutpoint of backward logistic regression was 0.68, while for ridge it was 0.677. The comparison of c statistics and totals of correctly predicted cases on the German Credit data set shows that logistic ridge regression has a slightly higher capability to predict a new applicant's collectability status than logistic regression with variable selection. However, both models had low capability in identifying the bad debtors.

RECOMMENDATION

Data with higher multicollinearity among the predictor variables are needed to examine the performance of logistic ridge regression further. Other ways of defining the optimal cutpoint are also worth trying, since banks usually consider that a false positive causes greater losses than a false negative. Before entering a variable as a predictor in logistic ridge regression, it is important to know whether the variable affects the response, based on relevant theory.

REFERENCES

Agresti A. 2007. An Introduction to Categorical Data Analysis. Second Edition. New Jersey: J Wiley.

Cessie S le, Houwelingen JC van. 1992. Ridge estimators in logistic regression. Appl Statist 41: 191-201.

Daniel WW. 1990. Applied Nonparametric Statistics. Second Edition. Atlanta: PWS-KENT.

Gonen M. 2006. Receiver Operating Characteristics (ROC) Curves. In: Statistics and Data Analysis. Proceedings of the 31st SAS Users Group International (SUGI 31) Conference; San Francisco, Mar 2006. New York: SAS Institute. page 18.

Hosmer DW, Lemeshow S. 2000. Applied Logistic Regression.
Second Edition. New York: J Wiley.

Kemeny S, Vago E. 2006. Logistic ridge regression for clinical data analysis (a case study). Appl Ecol Environ Res 4.

Perlich C, Provost F, Simonoff JS. 2003. Tree induction vs. logistic regression: a learning-curve analysis. J Mach Learn Res 4: 211-255.

Purnomo H. 2010a. BI: Kredit Perbankan Tumbuh 1% di Januari. /11/1325/ /5/bi-kreditperbankan-tumbuh-1-di-januari-21 [3 Jun 2010].

Purnomo H. 2010b. Kredit Tumbuh 2.3% Hingga Agustus. 9/3/145545/ /5/kredit-tumbuh- 23-hingga-agustus [15 Sep 2010].

Reed D. 2008. Mortgages 101. New York: AMACOM.

Rose PS. Commercial Bank Management. Fourth Edition. USA: McGraw-Hill.
APPENDIX
Appendix 1 Description of variables used in analysis

For each categorical variable, dummy variables were created for every category except the reference category given in parentheses.

V1  Duration of credit (months)
V2  Amount of credit (DM)
V3  Age (years)
V4  Installment rate: 1 = 1%; 2 = 2%; 3 = 3%; 4 = 4% (reference: 4%)
V6  Number of dependents: 1 = 1 dependent; 2 = 2 dependents (reference: 1 dependent)
V7  Status of checking account: 1 = < 0 DM; 2 = 0 <= ... < 200 DM; 3 = >= 200 DM; 4 = no checking account (reference: no checking account)
V8  Credit history: 1 = no credits taken; 2 = all credits paid back duly; 3 = existing credits paid back duly; 4 = delay in paying off in the past; 5 = critical account (reference: critical account)
V9  Saving account balance: 1 = < 100 DM; 2 = 100 <= ... < 500 DM; 3 = 500 <= ... < 1000 DM; 4 = >= 1000 DM; 5 = no savings account (reference: no savings account)
V10 Time of working experience in current job: 1 = unemployed; 2 = < 1 year; 3 = 1 <= ... < 4 years; 4 = 4 <= ... < 7 years; 5 = >= 7 years (reference: unemployed)
V11 Time of living in present residence: 1 = <= 1 year; 2 = 1 < ... <= 2 years; 3 = 2 < ... <= 3 years; 4 = > 4 years (reference: <= 1 year)
V12 Home ownership: 1 = owns residence; 2 = rents; 3 = free (reference: free)
V13 Occupation category: 1 = unemployed/unskilled; 2 = unskilled, resident; 3 = official; 4 = officer (reference: unemployed/unskilled)
V14 Marital status: 1 = male & divorced; 2 = male & single; 3 = male & married; 4 = female (reference: female)
V15 Guarantor: 1 = has a co-applicant; 2 = has a guarantor; 3 = has none (reference: has none)
V16 Property owned: 1 = car or other; 2 = real estate; 3 = no property (reference: no property)
V17 Purpose of credit: 1 = new car; 2 = used car; 3 = furniture; 4 = radio/television; 5 = education; 6 = retraining; 7 = other (reference: other)
V19 Telephone ownership: 1 = no; 2 = yes (reference: no)
Y   Good credit rating: 0 = no; 1 = yes
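The dummy coding scheme in this appendix (one 0/1 indicator per category, with the dummies of the reference category omitted) can be sketched as follows; the helper name is hypothetical.

```python
def dummy_code(value, categories):
    """One 0/1 indicator per non-reference category; the last listed
    category serves as the reference (all dummies zero)."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories[:-1]]

# V8 (credit history): categories 1..5, with 5 (critical account) as reference.
v8_categories = [1, 2, 3, 4, 5]
row_for_category_2 = dummy_code(2, v8_categories)   # [0, 1, 0, 0]
row_for_reference = dummy_code(5, v8_categories)    # [0, 0, 0, 0]
```

With this coding, each dummy coefficient compares its category directly to the reference category, which is why the odds ratios of V8 are reported against the critical-account group.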
Appendix 2 Proportion of good debtors on each variable

(Bar charts of the proportion of good debtors by: duration of credit, amount of credit, age, number of dependents, status of checking account, credit history, installment rate, saving account balance, time of working experience in current job, guarantor, time of living in present residence, property owned, home ownership, purpose of credit, occupation category, telephone ownership, and marital status.)
Appendix 3 Odds ratios of backward logistic regression and logistic ridge regression
Appendix 4 Comparison of c statistics and totals of correctly predicted cases of logistic regression with variable selection and logistic ridge regression on data sets with generated V2*
RESULT AND DISCUSSION

Figure 3 shows the ROC curve, which plots the probability of a false positive (1 - specificity) against the true positive rate (sensitivity). The area under the ROC curve (AUC), which ranges from 0 to 1, provides a measure of the model's discriminative ability.
More informationFACTORS AFFECTING ON YOUTH PARTICIPATION AND SATISFACTION IN OCCUPATION RELATED TO AGRICULTURE
FACTORS AFFECTING ON YOUTH PARTICIPATION AND SATISFACTION IN OCCUPATION RELATED TO AGRICULTURE S.D.P. Sudarshanie (108814M) Dissertation submitted in partial fulfillment of the requirements for the degree
More informationLinear model to forecast sales from past data of Rossmann drug Store
Abstract Linear model to forecast sales from past data of Rossmann drug Store Group id: G3 Recent years, the explosive growth in data results in the need to develop new tools to process data into knowledge
More informationTHE DYNAMICS OF SKILL MISMATCHES IN THE DUTCH LABOR MARKET
THE DYNAMICS OF SKILL MISMATCHES IN THE DUTCH LABOR MARKET Wim Groot* Department of Health Sciences, Maastricht University and "Scholar" Research Centre for Education and Labor Market Department of Economics,
More informationPractical Aspects of Modelling Techp.iques in Logistic Regression Procedures of the SAS System
r""'=~~"''''''''''''''''''''''''''''\;'=="'~''''o''''"'"''~ ~c_,,..! Practical Aspects of Modelling Techp.iques in Logistic Regression Procedures of the SAS System Rainer Muche 1, Josef HogeP and Olaf
More informationA study of cartel stability: the Joint Executive Committee, Paper by: Robert H. Porter
A study of cartel stability: the Joint Executive Committee, 1880-1886 Paper by: Robert H. Porter Joint Executive Committee Cartels can increase profits by restricting output from competitive levels. However,
More informationThe impact of banner advertisement frequency on brand awareness
The impact of banner advertisement frequency on brand awareness Author Hussain, Rahim, Sweeney, Arthur, Sullivan Mort, Gillian Published 2007 Conference Title 2007 ANZMAC Conference Proceedings Copyright
More informationFACTORS AFFECTING JOB STRESS AMONG IT PROFESSIONALS IN APPAREL INDUSTRY: A CASE STUDY IN SRI LANKA
FACTORS AFFECTING JOB STRESS AMONG IT PROFESSIONALS IN APPAREL INDUSTRY: A CASE STUDY IN SRI LANKA W.N. Arsakularathna and S.S.N. Perera Research & Development Centre for Mathematical Modeling, Faculty
More informationOpening SPSS 6/18/2013. Lesson: Quantitative Data Analysis part -I. The Four Windows: Data Editor. The Four Windows: Output Viewer
Lesson: Quantitative Data Analysis part -I Research Methodology - COMC/CMOE/ COMT 41543 The Four Windows: Data Editor Data Editor Spreadsheet-like system for defining, entering, editing, and displaying
More informationChapter 3. Basic Statistical Concepts: II. Data Preparation and Screening. Overview. Data preparation. Data screening. Score reliability and validity
Chapter 3 Basic Statistical Concepts: II. Data Preparation and Screening To repeat what others have said, requires education; to challenge it, requires brains. Overview Mary Pettibone Poole Data preparation
More informationEvaluation next steps Lift and Costs
Evaluation next steps Lift and Costs Outline Lift and Gains charts *ROC Cost-sensitive learning Evaluation for numeric predictions 2 Application Example: Direct Marketing Paradigm Find most likely prospects
More informationStandard analysis model for monitoring compliance with wage equality between women and men in federal procurement (methodology)
Federal Department of Home Affairs FDHA Federal Office for Gender Equality FOGE Standard analysis model for monitoring compliance with wage equality between women and men in federal procurement (methodology)
More informationMultiple Regression. Dr. Tom Pierce Department of Psychology Radford University
Multiple Regression Dr. Tom Pierce Department of Psychology Radford University In the previous chapter we talked about regression as a technique for using a person s score on one variable to make a best
More informationLecture-21: Discrete Choice Modeling-II
Lecture-21: Discrete Choice Modeling-II 1 In Today s Class Review Examples of maximum likelihood estimation Various model specifications Software demonstration Other variants of discrete choice models
More informationReport for PAKDD 2007 Data Mining Competition
Report for PAKDD 2007 Data Mining Competition Li Guoliang School of Computing, National University of Singapore April, 2007 Abstract The task in PAKDD 2007 data mining competition is a cross-selling business
More informationModelling Repeat Visitation
European Regional Science Association 40 th European Congress, Barcelona 2000 Modelling Repeat Visitation Jie Zhang AKF (Institute of Local Government Studies) Nyropsgade 37 DK-1602 Copenhagen V Denmark
More informationCorrelation and Simple. Linear Regression. Scenario. Defining Correlation
Linear Regression Scenario Let s imagine that we work in a real estate business and we re attempting to understand whether there s any association between the square footage of a house and it s final selling
More informationarxiv: v1 [cs.lg] 13 Oct 2016
Bank Card Usage Prediction Exploiting Geolocation Information Martin Wistuba, Nghia Duong-Trung, Nicolas Schilling, and Lars Schmidt-Thieme arxiv:1610.03996v1 [cs.lg] 13 Oct 2016 Information Systems and
More informationCommitment and discounts in a loyalty model
Commitment and discounts in a loyalty model Martin Karvik Masteruppsats i försäkringsmatematik Master Thesis in Actuarial Mathematics Masteruppsats 2016:1 Försäkringsmatematik Juni 2016 www.math.su.se
More informationLogistic Regression with Expert Intervention
Smart Cities Symposium Prague 2016 1 Logistic Regression with Expert Intervention Pavla Pecherková and Ivan Nagy Abstract This paper deals with problem of analysis of traffic data. A traffic network has
More information