Domain Driven Data Mining for Unavailability Estimation of Electrical Power Grids


Paulo J.L. Adeodato 1,2, Petrônio L. Braga 2, Adrian L. Arnaud 1, Germano C. Vasconcelos 1,2, Frederico Guedes 3, Hélio B. Menezes 3, Giorgio O. Limeira 3

1 NeuroTech Ltd., Av. Cais do Apolo, 222 / 8º andar, Recife-PE, Brazil
2 Center for Informatics, Federal University of Pernambuco, Av. Professor Luís Freire s/n, Cidade Universitária, Recife-PE, Brazil
3 Companhia Hidrelétrica do São Francisco - CHESF, St. Delmiro Gouveia, 333 Bongi, Recife-PE, Brazil

{Paulo,Adrian,Germano}@neurotech.com.br, {pjla,plb,gcv}@cin.ufpe.br, {fred,helio,giorgiol}@chesf.gov.br

Abstract. In Brazil, power generating, transmitting and distributing companies operating in the regulated market are paid for their equipment availability. In case of system unavailability, the companies are financially penalized, and more severely so for unplanned interruptions. This work presents a domain driven data mining approach for estimating the risk of system unavailability based on the historical data of the component equipment, within one of the biggest companies in the Brazilian electric sector. Traditional statistical estimators are combined with the concepts of Recency, Frequency and Impact (RFI) to produce variables containing behavioral information finely tuned to the application domain. The unavailability costs are embedded in the problem modeling strategy. Logistic regression models bagged via their median score achieved Max_KS=0.341 and AUC_ROC=0.699 on the out-of-time data sample. This performance is much higher than that of the previous approaches attempted within the company. The system has been put in operation and will be monitored for performance reassessment and maintenance re-planning.

Keywords: Electrical power grid unavailability, Equipment unavailability penalties, Domain driven data mining, Model ensembles, Logistic regression.
1 Introduction

In the 1990s, most of the Brazilian power companies went private and started operating under concession from the government, regulated by the National Agency of Electrical Energy (ANEEL = Agência Nacional de Energia Elétrica) and inspected by the National System Operator (ONS = Operador Nacional do Sistema). The companies operating in this regulated market are paid for the service they provide and are penalized for system unavailability at the Operational Function (FUNOP = FUNção OPeracional) level [1]. Each unavailability penalty depends on the value of the FUNOP asset, its characteristics, the duration of the power interruption and, mainly, whether the interruption had been planned or not; an unplanned unavailability costs roughly 20 times more than a planned one of the same duration [1].

The reliability of electrical power grids is already very high and under continuous improvement. Each FUNOP is composed of several pieces of equipment which implement an operational function in power generation, transmission or distribution. To preserve this high reliability profile, strict maintenance plans are periodically carried out on this equipment, with particular features for each equipment family. In general, the maintenance plan mainly follows the equipment manufacturer's recommendations, which take into account the electrical load, the temperature and other aspects to define the periodicity, the procedures, and the parameter monitoring and adjustments. The equipment manufacturers have carried out series of trials within their plants; they also collect data from their customers' installations and apply statistical methods to define their maintenance recommendations.

However, many other factors affect system reliability in different power grids, such as the quality of the repairmen's labor, the ways of loading the system, etc. There is also a major aspect to be considered: as the system's quality improves, less data about risky conditions are produced. Therefore, the better the system becomes, the less data about faults will be available for statistical modeling of risky conditions. Fortunately, more data from monitoring operation in normal conditions are being collected and will be available for future modeling.

Instead of using traditional statistical modeling, this paper introduces an approach based on behavioral data. That may seem odd if one thinks of the system operating in a stable regime, under a constant fault rate.
However, as faults are very rare events, it is not possible to ensure a constant fault rate, and goodness-of-fit hypothesis tests always show at least a small difference; behavioral consolidation of data may capture variations which are important for risk estimation. The results presented here support this idea.

The remainder of this paper is organized in five sections. Section 2 characterizes the unavailability problem faced by CHESF (Companhia Hidro Elétrica do São Francisco), with the data structure available and the integration and transformation needed. Section 3 shows the modeling of the problem as a binary decision based on the maintenance plan, the creation of behavioral variables and the selection of the most relevant variables for modeling. Section 4 describes the knowledge extraction process via a bagged ensemble of logistic regression models. Section 5 presents and interprets the results achieved on a statistically independent data set. Section 6 summarizes the important contributions, the limitations of the approach and the future work needed to broaden the research.

2 Problem Characterization

CHESF (Companhia Hidro Elétrica do São Francisco) is one of the biggest power generating companies in Brazil, producing 10,618 MW in 14 hydroelectric power plants and 1 thermoelectric plant. It also transmits this energy along an 18-thousand km long power grid [2]. Its annual revenue reached R$ 5.64 billion (= US$ 3.15 billion) in 2008, with a net profit of R$ 1.43 billion. Unfortunately, the revenue losses caused by penalties for unavailabilities remain undisclosed.

CHESF's power grid has 462 FUNOPs of 7 different families, with an average of 39 pieces of equipment each, in a total of 17.8 thousand pieces of equipment with an average age of approximately 19 years of operation. The seven FUNOP families are: transmission lines, power transformers, reactors, capacitor banks, synchronous compensators, static compensators and isolated cables.

Just before being put in operation, the equipment and the FUNOPs are registered in the Asset Management System (SIGA = Sistema de Gerenciamento de Ativos). After becoming operational, the equipment has all its maintenances, planned or not, recorded in the same system (SIGA). Each unavailability, no matter the cause, is recorded in the accountability system within SIGA. Unavailabilities that occurred before January 1, 2008 were recorded in another system (DISPON), which had no direct link to SIGA.

These two data sources, hosted in two different systems with relational databases, needed to be integrated in a single data mart because they are the basis for the unavailability risk estimation system to be developed. The difference in granularity between the DISPON and SIGA databases and the consequent lack of a unique key, together with the legacy systems, turned this database integration into a non-trivial task. Asset registration and maintenance records have been integrated in the SIGA system for the last two years, but several adjustments were needed in the data imported from legacy systems for previous periods, which cover a much longer history. The most important difficulty, however, was the integration with the DISPON system, where each recorded unavailability had not been linked to an equipment maintenance action. Furthermore, DISPON had been abandoned without any data being imported into the current SIGA, installed only 2 years ago.
So, the unavailability data were dumped from the legacy database (DISPON) and joined to the current SIGA database to form the complete unavailability database. These integration steps alone took around 60% of the project duration and required many interactions with the IT management and the electrical engineers at CHESF.

The purpose of this work is to estimate, at each moment, the risk of unavailabilities occurring in the FUNOPs which compose the power grid. At this point, it is important to emphasize the device used to turn the risk assessment problem into a binary decision problem for data mining. Considering that unavailabilities caused by planned maintenances are negligible in cost compared to unplanned ones (a ratio of only 1:20), and that maintenance actions reset the operational status of the system to optimal, the temporal sequence of planned maintenance actions defines a frame of time intervals in which the presence or absence of unplanned unavailabilities characterizes the binary target for the supervised training process. This binary target definition approach will be explained in the next section, along with the creation of behavioral information.
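As a preview of that definition, the interval framing just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `label_intervals`, the use of plain numbers (days) as timestamps, and the half-open interval convention [start, end) are choices made for the example and are not taken from the CHESF system.

```python
from bisect import bisect_left


def label_intervals(planned, unplanned):
    """Label each interval between consecutive planned maintenances.

    planned   -- sorted planned-maintenance times (any comparable unit)
    unplanned -- unplanned-unavailability times, in any order
    Returns (start, end, label) tuples, where label is 'bad' if at least one
    unplanned unavailability falls in [start, end) and 'good' otherwise.
    The half-open boundary is an assumption of this sketch.
    """
    unplanned = sorted(unplanned)
    labeled = []
    for start, end in zip(planned, planned[1:]):
        # Position of the first unplanned event at or after the interval start.
        i = bisect_left(unplanned, start)
        has_fault = i < len(unplanned) and unplanned[i] < end
        labeled.append((start, end, "bad" if has_fault else "good"))
    return labeled


# Hypothetical FUNOP history, in days since commissioning.
planned = [0, 180, 365, 540]
unplanned = [200, 210, 600]
intervals = label_intervals(planned, unplanned)
print(intervals)
```

The unplanned event at day 600 falls after the last planned maintenance, so it belongs to an interval that is still open and receives no label in this sketch.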

3 Data Transformation

The variables present in the integrated relational database needed to be transformed into variables more meaningful for the binary decision problem that models the unavailability risk assessment. This section explains the proposed random variables that produce the most adequate mapping from the original input space to the data mart variables. It also presents how the binary decision target was defined.

3.1 Variable Creation

Behavioral data are widely used in behavior scoring for credit risk assessment [3] and other business applications. In that domain, variable creation generally follows the RFM (Recency, Frequency and Monetary value) approach [4]. For system faults at CHESF, the approach was adapted to capture the relevant sequential features implicit in each event within the FUNOP, related to the recency, frequency and impact of faults and errors along time (RFI approach). In this approach, the impact is measured by the duration, cost and other features related to each system component or event. This is a very important basis for the systematic and automatic creation of behavioral variables, considering several time spans.

Other variables inherent to the FUNOPs and related to their complexity were also created, such as the number of pieces of equipment, the equipment families and the entropy of the equipment distribution within the FUNOP. This is a Domain Driven Data Mining approach [5]: it embeds the expert's knowledge from the electrical engineering field into the decision support system. The RFI approach can be generalized to model rare events in several application domains where the impact is captured by several different metrics (to be published elsewhere).

Another important aspect is that, due to the very small number of faults per piece of equipment, the fault rate is defined at the equipment family level.
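To make the RFI idea concrete, the sketch below derives recency, frequency and impact variables from a list of past unplanned unavailabilities, for several time spans. It is a minimal illustration, not the authors' implementation: the function `rfi_features`, the event representation (month of occurrence, duration in hours) and the feature names are assumptions, loosely modeled on the variable names the paper reports in Table 1.

```python
def rfi_features(events, now, windows=(12, 24)):
    """Build Recency, Frequency and Impact variables at reference time `now`.

    events  -- (time_in_months, duration_in_hours) of unplanned unavailabilities
    now     -- reference time in months (e.g. the start of a labeled interval)
    windows -- look-back spans in months (assumed values for this sketch)
    Only events strictly before `now` are used, so no future data leaks in.
    """
    past = [(t, d) for t, d in events if t < now]
    features = {
        # Recency: time elapsed since the last unplanned unavailability.
        "months_since_last": now - max(t for t, _ in past) if past else None
    }
    for w in windows:
        recent = [(t, d) for t, d in past if now - t <= w]
        # Frequency: number of events inside the look-back window.
        features[f"quantity_last_{w}m"] = len(recent)
        # Impact: here measured by total duration; cost could be used as well.
        features[f"hours_last_{w}m"] = sum(d for _, d in recent)
    return features


# Hypothetical event history for one FUNOP.
events = [(3, 5.0), (20, 2.5), (30, 1.0), (40, 9.0)]
features = rfi_features(events, now=36)
print(features)
```

Adding a window length to `windows` produces two more variables per event type, which is how a systematic scheme of this kind can generate hundreds of candidate variables.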
Several ratio variables were created for measuring differences between a FUNOP and the population. Thus, the ratios between the average fault rate per equipment family within a FUNOP and the corresponding average over the whole grid form an important set of variables. At this point, it is important to highlight that, in general, equipment is not replaced or swapped in the power grid; it is simply maintained.

3.2 Proposed Model and Label Definition

Considering the scarcity of data about system faults and the consequent high imprecision of the estimated distributions and fault rates, the approach adopted here was to convert a classical statistical problem into a data mining problem, with the advantage of reducing the number of limiting assumptions in the modeling process.

In this approach, the temporal sequence of planned maintenance actions defines a frame of time intervals used for modeling and labeling the system condition. The label is defined as 'bad' if there is at least one unplanned unavailability within that time interval and 'good' otherwise. This characterizes the binary target needed for the supervised training process of the decision support system [6]. The set of all planned maintenances defines a sequence of time intervals, each of which possesses a binary label and takes into account all the past history of the FUNOP and its components (behavior), as illustrated in Fig. 1.

Fig. 1. Planned maintenances define a sequence of time intervals for modeling the problem as a binary decision and labeling the target.

An approximation has been made in the approach depicted above, considering the negligible cost of the planned unavailabilities compared to the unplanned ones (1:20 ratio) and the fact that planned unavailabilities may be produced during a planned maintenance itself. Therefore, planned unavailabilities were discarded from the training data for the modeling process. Unlike the statistical approaches, no other constraint has been imposed on the data distribution types or their parameters.

The goal of this modeling approach is to take preventive actions whenever a 'bad' prediction is made within a time interval. Despite not being in the long term maintenance plan, this short term planned maintenance action produces either a negligible penalty (1:20 of the fault unavailability penalty) or no penalty at all (several preventive maintenance actions do not cause unavailability).

3.3 Variable Selection

As the systematic creation of behavioral variables makes it very easy to produce new variables automatically, variable selection is needed to preserve only the most meaningful and discriminative ones. The selection process was based on maximizing the information gain of the input variables in relation to the binary target and, simultaneously, minimizing the similarity (redundancy) among the selected input variables, both measured by appropriate metrics in a univariate fashion.
As all input variables were numerical and the target binary, the Max_KS (Kolmogorov-Smirnov) metric [7] was used for ranking the variables by their univariate discriminative importance. The redundancy among input variables was measured by linear correlation. Input variables with a correlation higher than 0.9 with another variable of higher Max_KS were discarded from the model. Following this approach, only 30 of the more than 900 input variables were preserved. Table 1 lists the five most relevant variables selected, with their information gain measured in terms of Max_KS and AUC_ROC (Area Under the ROC Curve) [8], to be explained in Sub-section 5.1.
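The selection rule above (rank by Max_KS, then drop any variable correlated above 0.9 with a better-ranked one) can be sketched as a greedy filter. The code is an illustration under assumptions: `select_variables` and `pearson` are hypothetical helper names, the Max_KS values are taken as precomputed inputs, each candidate is compared only against the variables already kept, and the absolute value of the correlation is used; the paper does not specify these last two details. The numbers in the toy example are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation between two equally long numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_variables(columns, max_ks, threshold=0.9):
    """Keep variables in decreasing Max_KS order, skipping redundant ones.

    columns -- dict: variable name -> list of values (one per example)
    max_ks  -- dict: variable name -> univariate Max_KS against the target
    """
    ranked = sorted(columns, key=lambda name: max_ks[name], reverse=True)
    kept = []
    for name in ranked:
        # Assumption: redundancy is checked against already-kept variables only.
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy data with made-up values, for illustration only.
columns = {
    "hours_24m": [1, 2, 3, 4, 5],
    "hours_12m": [1, 2, 3, 4, 6],  # nearly collinear with hours_24m
    "age": [5, 3, 4, 1, 2],
}
max_ks = {"hours_24m": 0.30, "hours_12m": 0.25, "age": 0.10}
selected = select_variables(columns, max_ks)
print(selected)
```

Because the filter is greedy, a variable is never discarded in favor of one with a lower Max_KS.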

Table 1. The five most relevant variables selected, ranked univariately by Max_KS.

Variable                                                  Max_KS   AUC_ROC
Hours of Unplanned Unavailability in Last 24 Months
Quantity of Unplanned Unavailability in Last 24 Months
Hours of Unplanned Unavailability in Last 12 Months
Quantity of Unplanned Unavailability in Last 12 Months
Time Since Last Unplanned Unavailability

It is clear that unplanned unavailability along the last two years of operation is the most relevant aspect for estimating the risk of unavailability before the next planned maintenance. It is interesting that the equipment age appears only in 22nd place in the ranking, with Max_KS=0.09 and AUC_ROC=0.48, suggesting that the system fault rate is indeed at the flat part of its curve.

4 Modeling Strategy

4.1 Data Sampling

As the modeling strategy involves the creation of behavioral variables, there is statistical dependence among the examples, unlike in typical classification problems. Therefore, the data used for modeling and for testing the system should be temporally disjoint, in two blocks, as done in time series forecasting tasks [9] for a more realistic performance assessment. The diagram in Fig. 2 shows this division in time.

Fig. 2. Data partition along time for modeling and performance assessment of the system.

This data partition took into account the change in the computational environment, so as to represent the worst case in terms of performance assessment. In the modeling set, the target class (unavailability) represents 18.1% of the examples whereas, in the testing set, it represents only 10.5% of the examples. An additional difficulty is related to the differences in the way data were recorded before and after SIGA, which not even CHESF's personnel can precisely assess. The modeling data refer to the whole period before the SIGA system was deployed, while the testing data have their target defined after SIGA's deployment. The behavioral variables of the testing data, however, also capture historical information from the preceding period.

4.2 Logistic Regression and Model Ensemble

The modeling technique chosen was logistic regression. Among its several interesting features, the most relevant for this work are the quality and understandability of the solution produced and the small amount of data required. Logistic regression has been successfully applied to binary classification problems, particularly to credit risk assessment [3]; it does not require a validation set for overfitting prevention, and it presents the knowledge extracted from data explicitly, in terms of statistically validated coefficients [10].

As preliminary experiments with different data samples showed a high variance in performance, it was clear that an ensemble of systems was necessary [11]. In this work, an ensemble of 31 logistic regression models reduced the system's variance, and the median of their outputs was taken as the response for each test example. The authors' teams have adopted this median approach since 2007, in the PAKDD Data Mining Competition [12] and in the NN3 Time Series Forecasting Competition [13].

For training each of the 31 models, 50% of the examples in the modeling data set were randomly sampled without replacement. These parameters were chosen by a linear experimental design [14], with the ensemble size taking the values 31, 51 and 101 and the sampling percentage taking the values 70%, 60% and 50%.

5 Experimental Metrics, Results and Interpretation

5.1 Performance Metrics

As there was no criterion available yet for defining the decision threshold along the continuous output of the logistic regression ensemble, the performance assessment was carried out using two metrics for the whole decision domain (the score range): the maximum Kolmogorov-Smirnov distance (Max_KS) [7] and the Area Under the ROC Curve (AUC_ROC) [8].
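Before turning to the metrics in detail, the ensemble procedure of Section 4.2 (31 logistic regressions, each fit on a random 50% sample drawn without replacement, combined by the median of their scores) can be sketched as follows. This is a simplified stand-in, not the authors' implementation: the fitting routine is plain batch gradient descent rather than a statistical package, and the function names, learning rate, epoch count and toy data are all assumptions.

```python
import math
import random
from statistics import median


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    """Score of one example; w = [bias, w1, ..., wd]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, epochs=100, lr=0.5):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def fit_ensemble(X, y, n_models=31, fraction=0.5, seed=0):
    """Train n_models, each on a random sample drawn without replacement."""
    rng = random.Random(seed)
    size = int(fraction * len(X))
    models = []
    for _ in range(n_models):
        idx = rng.sample(range(len(X)), size)  # sampling without replacement
        models.append(fit_logistic([X[i] for i in idx], [y[i] for i in idx]))
    return models


def ensemble_score(models, x):
    """Median of the individual model scores, as described in the text."""
    return median(predict(w, x) for w in models)


# Toy data (invented): one centered feature; larger values are 'bad' (1).
X = [[(i - 100) / 100] for i in range(200)]
y = [1 if i >= 150 else 0 for i in range(200)]
models = fit_ensemble(X, y)
low, high = ensemble_score(models, [-0.9]), ensemble_score(models, [0.9])
print(low, high)
```

With an odd number of models, the median is always the output of one actual model, and it is less sensitive than the mean to a few models trained on unrepresentative samples.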
The AUC_ROC metric is widely accepted for performance assessment of binary classification based on a continuous output. The Max_KS enjoys similarly wide acceptance within the business application domain. The Kolmogorov-Smirnov statistic was originally a non-parametric tool for measuring the adherence of cumulative distribution functions (CDFs) [7]. In binary decision systems, by contrast, the maximum KS distance is applied to assess the lack of adherence between the data sets of the 2 classes, with the score as the independent variable. The Kolmogorov-Smirnov curve is the difference between the CDFs of the data sets of the two classes. The higher the curve, the better the system, and the point of maximum value is particularly important in performance evaluation.

Another widely used tool is the Receiver Operating Characteristic curve (ROC curve) [8], whose plot represents the compromise between the true positive and the false positive example classifications based on a continuous output, along all its possible decision threshold values (the score). The closer the ROC curve is to the upper left corner (optimum point), the better the decision system is. The focus is on assessing the performance throughout the whole X-axis range by calculating the area under the ROC curve (AUC) [8]. The bigger the area, the closer the system is to the optimum decision, which corresponds to an AUC_ROC equal to one.

5.2 Results and Interpretation

Performance was assessed on the testing set, which consisted of the out-of-sample data with 4,059 examples reserved for this purpose only. Fig. 3 shows the Kolmogorov-Smirnov curve with its Max_KS=0.341. Fig. 4 shows the ROC curve with its AUC_ROC=0.699.

Fig. 3. Performance assessment by the Kolmogorov-Smirnov metric with Max_KS=0.341.

Fig. 4. Performance assessment by the Area Under the ROC Curve metric with AUC_ROC=0.699.
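For reference, the two metrics of Section 5.1 can be computed from a list of scores and binary labels as in the sketch below. It is a generic illustration, not the code used to obtain the results above: the function names and the toy data are assumptions, the target class ('bad') is encoded as 1, and the AUC is computed by direct pairwise comparison, which is quadratic in the number of examples and only meant for small samples.

```python
def max_ks(scores, labels):
    """Maximum distance between the score CDFs of the two classes."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    seen_pos = seen_neg = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        # Consume every example tied at this score before comparing the CDFs.
        current = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                seen_pos += 1
            else:
                seen_neg += 1
            i += 1
        best = max(best, abs(seen_neg / n_neg - seen_pos / n_pos))
    return best


def auc_roc(scores, labels):
    """Area under the ROC curve, computed as the probability that a random
    positive example scores higher than a random negative one (ties count 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy scores (invented); label 1 marks the target class.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
print(max_ks(scores, labels))   # 0.75
print(auc_roc(scores, labels))  # 0.875
```

In the toy example, the largest gap between the two CDFs occurs at the score 0.5, where all negative examples but only one of the four positive examples have been accumulated.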

The curves are quite noisy, probably because of the small amount of data in the testing set. There are only around 400 examples of the target class in this data set, so their CDF is a very noisy curve (top plot in Fig. 3), whereas the CDF of the non-target class ('good') is a smooth curve. Despite the noise, the performance curves are consistent and show an improvement which will be useful for CHESF, particularly considering that the testing set represents a worst case approximation.

6 Concluding Remarks

This paper has presented a domain driven data mining approach to the problem of Operational Function unavailability in the electrical power grid of one of the biggest power companies in Brazil, CHESF.

Unlike statistical approaches, this work has modeled the unavailability as a data mining binary decision problem with behavioral input variables. These variables were created by sliding windows of different sizes, timed by the planned maintenance events, which were labeled as 'bad' when an unplanned unavailability occurred before the next planned maintenance. An important advantage of this approach compared to the statistical ones is that it does not impose any constraint on the data distributions to be modeled. The only approximation made was to consider the cost of a planned unavailability negligible compared to that of an unplanned one: around 5% of its value.

It should be emphasized that there is a big difference between the concepts of approach and technique. This becomes clear when the statistical technique of logistic regression is used within a domain driven data mining approach that models the whole problem as a sequence of rare events consolidated in RFI variables, which capture sequential information in terms of Recency, Frequency and Impact.

The median of an ensemble of bagged logistic regression models provided the unavailability risk estimating score, and the model coefficients made explicit the most relevant variables for each suggested decision.

Results of the experiments carried out on an out-of-sample test set have shown that the approach is viable for risk estimation. It attained Max_KS=0.341 and AUC_ROC=0.699 in a worst case scenario.

After the validation of this approach, the testing data set was included in the modeling data set and the system was re-trained with the same procedure. The system has now been put in operation and its performance will be monitored for the next six months, during which CHESF will carry out pro-active maintenance based on the system predictions. Both the quality of the solution and the availability of the power grid can lead to redesigning maintenance periods.

Several refinements still have to be made, particularly those concerning the revenue losses caused by the penalties for power grid unavailability. This refinement can be made by considering the losses either in the modeling process or in the post-processing stage, along with the risk estimating score produced by the decision support system. Also, the variable selection process should include multivariate techniques such as the variance inflation factor (VIF) [15].

References

1. ANEEL. Normative Resolution no. 270 of June 2007.
2. CHESF. Companhia Hidro Elétrica do São Francisco.
3. West, D.: Neural network credit scoring models. Computers and Operations Research, 27 (2000)
4. Jiang, T., Tuzhilin, A.: Improving Personalization Solutions through Optimal Segmentation of Customer Bases. IEEE Trans. Knowledge and Data Eng., (21) 3, pp. 1-16 (2009)
5. Cao, L.: Introduction to Domain Driven Data Mining, in Data Mining for Business Applications (eds. Cao L, et al.) (2008)
6. Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco, CA (2006)
7. Conover, W. J.: Practical Nonparametric Statistics, 3rd edition, John Wiley & Sons, NY, USA (1999)
8. Provost, F., Fawcett, T.: Robust Classification for Imprecise Environments. Machine Learning J., (42) 3 (2001)
9. Adya, M., Collopy, F.: How Effective are Neural Networks at Forecasting and Prediction? A Review and Evaluation. J. of Forecasting, Vol. 17 (1998)
10. Hilbe, J. M.: Logistic Regression Models. Chapman & Hall / CRC Press (2009)
11. Breiman, L.: Bagging predictors. Machine Learning, (24) 2 (1996)
12. Adeodato, P. J. L., Vasconcelos, G. C., Arnaud, A. L., Cunha, R. C. L. V., Monteiro, D. S. M., Oliveira Neto, R.: The Power of Sampling and Stacking for the PAKDD-2007 Cross-Selling Problem. Int. J. of Data Warehousing and Mining (IJDWM), 4 (2008)
13. Adeodato, P. J. L., Vasconcelos, G. C., Arnaud, A. L., Cunha, R. C. L. V., Monteiro, D. S. M. P.: MLP ensembles improve long term prediction accuracy over single networks. Int. J. of Forecasting (2010) (to appear)
14. Jain, R.: The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurements, Simulation and Modeling. New York: John Wiley & Sons
15. Kutner, M., Nachtsheim, C., Neter, J.: Applied Linear Regression Models, 4th edition, McGraw-Hill / Irwin (2004)


More information

3 Ways to Improve Your Targeted Marketing with Analytics

3 Ways to Improve Your Targeted Marketing with Analytics 3 Ways to Improve Your Targeted Marketing with Analytics Introduction Targeted marketing is a simple concept, but a key element in a marketing strategy. The goal is to identify the potential customers

More information

Report for PAKDD 2007 Data Mining Competition

Report for PAKDD 2007 Data Mining Competition Report for PAKDD 2007 Data Mining Competition Li Guoliang School of Computing, National University of Singapore April, 2007 Abstract The task in PAKDD 2007 data mining competition is a cross-selling business

More information

Credit Risk Models Cross-Validation Is There Any Added Value?

Credit Risk Models Cross-Validation Is There Any Added Value? Credit Risk Models Cross-Validation Is There Any Added Value? Croatian Quants Day Zagreb, June 6, 2014 Vili Krainz vili.krainz@rba.hr The views expressed during this presentation are solely those of the

More information

MODELING THE EXPERT. An Introduction to Logistic Regression The Analytics Edge

MODELING THE EXPERT. An Introduction to Logistic Regression The Analytics Edge MODELING THE EXPERT An Introduction to Logistic Regression 15.071 The Analytics Edge Ask the Experts! Critical decisions are often made by people with expert knowledge Healthcare Quality Assessment Good

More information

Data Science in a pricing process

Data Science in a pricing process Data Science in a pricing process Michaël Casalinuovo Consultant, ADDACTIS Software michael.casalinuovo@addactis.com Contents Nowadays, we live in a continuously changing market environment, Pricing has

More information

Visual tolerance analysis for engineering optimization

Visual tolerance analysis for engineering optimization Int. J. Metrol. Qual. Eng. 4, 53 6 (03) c EDP Sciences 04 DOI: 0.05/ijmqe/03056 Visual tolerance analysis for engineering optimization W. Zhou Wei,M.Moore, and F. Kussener 3, SAS Institute Co., Ltd. No.

More information

Examination of Cross Validation techniques and the biases they reduce.

Examination of Cross Validation techniques and the biases they reduce. Examination of Cross Validation techniques and the biases they reduce. Dr. Jon Starkweather, Research and Statistical Support consultant. The current article continues from last month s brief examples

More information

Bank Card Usage Prediction Exploiting Geolocation Information

Bank Card Usage Prediction Exploiting Geolocation Information Bank Card Usage Prediction Exploiting Geolocation Information Martin Wistuba, Nghia Duong-Trung, Nicolas Schilling, and Lars Schmidt-Thieme Information Systems and Machine Learning Lab University of Hildesheim

More information

Effective CRM Using. Predictive Analytics. Antonios Chorianopoulos

Effective CRM Using. Predictive Analytics. Antonios Chorianopoulos Effective CRM Using Predictive Analytics Antonios Chorianopoulos WlLEY Contents Preface Acknowledgments xiii xv 1 An overview of data mining: The applications, the methodology, the algorithms, and the

More information

Long-term dynamics of CA1 hippocampal place codes

Long-term dynamics of CA1 hippocampal place codes Long-term dynamics of CA1 hippocampal place codes Yaniv Ziv, Laurie D. Burns, Eric D. Cocker, Elizabeth O. Hamel, Kunal K. Ghosh, Lacey J. Kitch, Abbas El Gamal, and Mark J. Schnitzer Supplementary Fig.

More information

Application of Decision Trees in Mining High-Value Credit Card Customers

Application of Decision Trees in Mining High-Value Credit Card Customers Application of Decision Trees in Mining High-Value Credit Card Customers Jian Wang Bo Yuan Wenhuang Liu Graduate School at Shenzhen, Tsinghua University, Shenzhen 8, P.R. China E-mail: gregret24@gmail.com,

More information

AN EXPERIENCE IN DETERMINING A COST VERSUS QUALITY OF SERVICE CHARACTERISTIC IN ORDER TO DEFINE OPTIMAL INVESTMENT LEVEL

AN EXPERIENCE IN DETERMINING A COST VERSUS QUALITY OF SERVICE CHARACTERISTIC IN ORDER TO DEFINE OPTIMAL INVESTMENT LEVEL AN EXPERIENCE IN DETERMINING A COST VERSUS QUALITY OF SERVICE CHARACTERISTIC IN ORDER TO DEFINE OPTIMAL INVESTMENT LEVEL Ivo Ordonha Cyrillo Marcelo Aparecido Pelegrini Gabriel Quiroga Sinapsis Inovação

More information

A Short-Term Bus Load Forecasting System

A Short-Term Bus Load Forecasting System 2 th International Conference on Hybrid Intelligent Systems A Short-Term Bus Load Forecasting System Ricardo Menezes Salgado Institute of Exact Sciences Federal University of Alfenase Alfenas-MG, Brazil

More information

Discriminant Analysis Applications and Software Support

Discriminant Analysis Applications and Software Support Mirko Savić Dejan Brcanov Stojanka Dakić Discriminant Analysis Applications and Stware Support Article Info:, Vol. 3 (2008), No. 1, pp. 029-033 Received 12 Januar 2008 Accepted 24 April 2008 UDC 311.42:004

More information

Thus, there are two points to keep in mind when analyzing risk:

Thus, there are two points to keep in mind when analyzing risk: One-Minute Spotlight WHAT IS RISK? Uncertainty about a situation can often indicate risk, which is the possibility of loss, damage, or any other undesirable event. Most people desire low risk, which would

More information

A Statistical Comparison Of Accelerated Concrete Testing Methods

A Statistical Comparison Of Accelerated Concrete Testing Methods Journal of Applied Mathematics & Decision Sciences, 1(2), 89-1 (1997) Reprints available directly from the Editor. Printed in New Zealand. A Statistical Comparison Of Accelerated Concrete Testing Methods

More information

Churn Prediction for Game Industry Based on Cohort Classification Ensemble

Churn Prediction for Game Industry Based on Cohort Classification Ensemble Churn Prediction for Game Industry Based on Cohort Classification Ensemble Evgenii Tsymbalov 1,2 1 National Research University Higher School of Economics, Moscow, Russia 2 Webgames, Moscow, Russia etsymbalov@gmail.com

More information

CORPORATE FINANCIAL DISTRESS PREDICTION OF SLOVAK COMPANIES: Z-SCORE MODELS VS. ALTERNATIVES

CORPORATE FINANCIAL DISTRESS PREDICTION OF SLOVAK COMPANIES: Z-SCORE MODELS VS. ALTERNATIVES CORPORATE FINANCIAL DISTRESS PREDICTION OF SLOVAK COMPANIES: Z-SCORE MODELS VS. ALTERNATIVES PAVOL KRÁL, MILOŠ FLEISCHER, MÁRIA STACHOVÁ, GABRIELA NEDELOVÁ Matej Bel Univeristy in Banská Bystrica, Faculty

More information

Application of Machine Learning to Financial Trading

Application of Machine Learning to Financial Trading Application of Machine Learning to Financial Trading January 2, 2015 Some slides borrowed from: Andrew Moore s lectures, Yaser Abu Mustafa s lectures About Us Our Goal : To use advanced mathematical and

More information

Asian Economic and Financial Review COMPUTER SIMULATION AND PLANNING OF THE COMPANY PROFITABILITY. Meri Boshkoska. Milco Prisaganec.

Asian Economic and Financial Review COMPUTER SIMULATION AND PLANNING OF THE COMPANY PROFITABILITY. Meri Boshkoska. Milco Prisaganec. Asian Economic and Financial Review journal homepage: http://aessweb.com/journal-detail.php?id=5002 COMPUTER SIMULATION AND PLANNING OF THE COMPANY PROFITABILITY Meri Boshkoska Faculty of Administration

More information

Comparison of Efficient Seasonal Indexes

Comparison of Efficient Seasonal Indexes JOURNAL OF APPLIED MATHEMATICS AND DECISION SCIENCES, 8(2), 87 105 Copyright c 2004, Lawrence Erlbaum Associates, Inc. Comparison of Efficient Seasonal Indexes PETER T. ITTIG Management Science and Information

More information

An Assessment of Deforestation Models for Reducing Emissions from Deforestation & forest Degradation (REDD)

An Assessment of Deforestation Models for Reducing Emissions from Deforestation & forest Degradation (REDD) An Assessment of Deforestation Models for Reducing Emissions from Deforestation & forest Degradation (REDD) A Case Study of Chiquitanía, Bolivia Oh Seok Kim Department of Geography, University of Southern

More information

Justifying Simulation. Why use simulation? Accurate Depiction of Reality. Insightful system evaluations

Justifying Simulation. Why use simulation? Accurate Depiction of Reality. Insightful system evaluations Why use simulation? Accurate Depiction of Reality Anyone can perform a simple analysis manually. However, as the complexity of the analysis increases, so does the need to employ computer-based tools. While

More information

A PARTIAL RELATIONSHIP BETWEEN COSTS AND QUALITY AS A BASIS FOR SETTING REGULATION PARAMETERS OF SUPPLY CONTINUITY

A PARTIAL RELATIONSHIP BETWEEN COSTS AND QUALITY AS A BASIS FOR SETTING REGULATION PARAMETERS OF SUPPLY CONTINUITY A PARTIAL RELATIONSHIP BETWEEN COSTS AND QUALITY AS A BASIS FOR SETTING REGULATION PARAMETERS OF SUPPLY CONTINUITY Petr SKALA Václav DĚTŘICH Jan ŠEFRÁNEK EGÚ Brno, a.s. Czech Republic EGÚ Brno, a.s. Czech

More information

Describing DSTs Analytics techniques

Describing DSTs Analytics techniques Describing DSTs Analytics techniques This document presents more detailed notes on the DST process and Analytics techniques 23/03/2015 1 SEAMS Copyright The contents of this document are subject to copyright

More information

PREDICTION OF CRM USING REGRESSION MODELLING

PREDICTION OF CRM USING REGRESSION MODELLING PREDICTION OF CRM USING REGRESSION MODELLING Aroushi Sharma #1, Ayush Gandhi #2, Anupam Kumar #3 #1, 2 Students, Dept. of Computer Science, MAIT, GGSIP University, Delhi, INDIA #3 Assisstant Prof., Dept.

More information

{saharonr, lastgift>35

{saharonr, lastgift>35 KDD-Cup 99 : Knowledge Discovery In a Charitable Organization s Donor Database Saharon Rosset and Aron Inger Amdocs (Israel) Ltd. 8 Hapnina St. Raanana, Israel, 43000 {saharonr, aroni}@amdocs.com 1. INTRODUCTION

More information

CHAPTER 6 DYNAMIC SERVICE LEVEL AGREEMENT FOR GRID RESOURCE ALLOCATION

CHAPTER 6 DYNAMIC SERVICE LEVEL AGREEMENT FOR GRID RESOURCE ALLOCATION 158 CHAPTER 6 DYNAMIC SERVICE LEVEL AGREEMENT FOR GRID RESOURCE ALLOCATION 6.1 INTRODUCTION In a dynamic and heterogeneous Grid environment providing guaranteed quality of service for user s job is fundamentally

More information

REVIEW OF POWER SYSTEM EXPANSION PLANNING IN VIETNAM

REVIEW OF POWER SYSTEM EXPANSION PLANNING IN VIETNAM Public Disclosure Authorized Public Disclosure Authorized Public Disclosure Authorized Public Disclosure Authorized REVIEW OF POWER SYSTEM EXPANSION PLANNING IN VIETNAM Tasks 1 and 2 Report Prepared for

More information

An empirical machine learning method for predicting potential fire control locations for pre-fire planning and operational fire management

An empirical machine learning method for predicting potential fire control locations for pre-fire planning and operational fire management International Journal of Wildland Fire 2017, 26, 587 597 IAWF 2017 Supplementary material An empirical machine learning method for predicting potential fire control locations for pre-fire planning and

More information

Logistic Regression and Decision Trees

Logistic Regression and Decision Trees Logistic Regression and Decision Trees Reminder: Regression We want to find a hypothesis that explains the behavior of a continuous y y = B0 + B1x1 + + Bpxp+ ε Source Regression for binary outcomes Regression

More information

Professor Dr. Gholamreza Nakhaeizadeh. Professor Dr. Gholamreza Nakhaeizadeh

Professor Dr. Gholamreza Nakhaeizadeh. Professor Dr. Gholamreza Nakhaeizadeh Statistic Methods in in Mining Business Understanding Understanding Preparation Deployment Modelling Evaluation Mining Process (( Part 3) 3) Professor Dr. Gholamreza Nakhaeizadeh Professor Dr. Gholamreza

More information

Modeling occupancy in single person offices

Modeling occupancy in single person offices Energy and Buildings 37 (2005) 121 126 www.elsevier.com/locate/enbuild Modeling occupancy in single person offices Danni Wang a, *, Clifford C. Federspiel a, Francis Rubinstein b a Center for Environmental

More information

Managing Data to Maximize Smart Grid Benefits

Managing Data to Maximize Smart Grid Benefits Managing Data to Maximize Smart Grid Benefits CONCLUSIONS PAPER Insights from a webinar hosted by Electric Light & Power Originally broadcast in November 2011 Featuring: Chet Geschickter, Senior Analyst

More information

Advanced analytics at your hands

Advanced analytics at your hands 2.4 Advanced analytics at your hands Today, most organizations are stuck at lower-value descriptive analytics. But more sophisticated analysis can bring great business value. TARGET APPLICATIONS Business

More information

Online Appendix Appendix 1: Data Sources

Online Appendix Appendix 1: Data Sources Development Effects of Electrification: Evidence from the Topographic Placement of Hydropower Plants in Brazil By Molly Lipscomb, A. Mushfiq Mobarak and Tania Barham Online Appendix Appendix 1: Data Sources

More information

HEMCHANDRACHARYA NORTH GUJARAT UNIVERSITY, PATAN C B C S : B.Sc. PROGRAMME. S101 :: Statistical Methods - I

HEMCHANDRACHARYA NORTH GUJARAT UNIVERSITY, PATAN C B C S : B.Sc. PROGRAMME. S101 :: Statistical Methods - I S101 :: Statistical Methods - I First Paper No. S 101 Course Name Statistical Methods - 1 Effective From June 2012 Unit Content Weitage Credit No. 1 Classification and Presentation of Data 1. Concept of

More information

Near-Balanced Incomplete Block Designs with An Application to Poster Competitions

Near-Balanced Incomplete Block Designs with An Application to Poster Competitions Near-Balanced Incomplete Block Designs with An Application to Poster Competitions arxiv:1806.00034v1 [stat.ap] 31 May 2018 Xiaoyue Niu and James L. Rosenberger Department of Statistics, The Pennsylvania

More information

POST GRADUATE PROGRAM IN DATA SCIENCE & MACHINE LEARNING (PGPDM)

POST GRADUATE PROGRAM IN DATA SCIENCE & MACHINE LEARNING (PGPDM) OUTLINE FOR THE POST GRADUATE PROGRAM IN DATA SCIENCE & MACHINE LEARNING (PGPDM) Module Subject Topics Learning outcomes Delivered by Exploratory & Visualization Framework Exploratory Data Collection and

More information

THERMOECONOMIC ANALYSIS OF ELECTRICITY COGENERATION FROM SUGARCANE ORIGIN.

THERMOECONOMIC ANALYSIS OF ELECTRICITY COGENERATION FROM SUGARCANE ORIGIN. Third Biomass Conference of the Americas Montreal, August 24-29, 1997, Elsevier Science Ltd. Vol. II - PP 1631-1640 THERMOECONOMIC ANALYSIS OF ELECTRICITY COGENERATION FROM SUGARCANE ORIGIN. Suani T. Coelho,

More information

Load Forecasting: Methods & Techniques. Dr. Chandrasekhar Reddy Atla

Load Forecasting: Methods & Techniques. Dr. Chandrasekhar Reddy Atla Load Forecasting: Methods & Techniques Dr. Chandrasekhar Reddy Atla About PRDC Power system Operation Power System Studies, a Time-horizon Perspective 1 year 10 years Power System Planning 1 week 1 year

More information

KDD Challenge Orange Labs R&D

KDD Challenge Orange Labs R&D KDD Challenge 2009 Orange Labs R&D Vincent Lemaire, Research & Development 03/19/2009, presentation to reading group http://perso.rd.francetelecom.fr/lemaire/ contents Oranges Labs CRM at Orange Problems

More information

Machine Learning Approaches for Flow Shop Scheduling Problems with Alternative Resources, Sequence-dependent Setup Times and Blocking

Machine Learning Approaches for Flow Shop Scheduling Problems with Alternative Resources, Sequence-dependent Setup Times and Blocking Machine Learning Approaches for Flow Shop Scheduling Problems with Alternative Resources, Sequence-dependent Setup Times and Blocking Frank Benda 1, Roland Braune 2, Karl F. Doerner 2 Richard F. Hartl

More information

THE IMPROVEMENTS TO PRESENT LOAD CURVE AND NETWORK CALCULATION

THE IMPROVEMENTS TO PRESENT LOAD CURVE AND NETWORK CALCULATION 1 THE IMPROVEMENTS TO PRESENT LOAD CURVE AND NETWORK CALCULATION Contents 1 Introduction... 2 2 Temperature effects on electricity consumption... 2 2.1 Data... 2 2.2 Preliminary estimation for delay of

More information

Dallas J. Elgin, Ph.D. IMPAQ International Randi Walters, Ph.D. Casey Family Programs APPAM Fall Research Conference

Dallas J. Elgin, Ph.D. IMPAQ International Randi Walters, Ph.D. Casey Family Programs APPAM Fall Research Conference Utilizing Predictive Modeling to Improve Policy through Improved Targeting of Agency Resources: A Case Study on Placement Instability among Foster Children Dallas J. Elgin, Ph.D. IMPAQ International Randi

More information

Strength in numbers? Modelling the impact of businesses on each other

Strength in numbers? Modelling the impact of businesses on each other Strength in numbers? Modelling the impact of businesses on each other Amir Abbas Sadeghian amirabs@stanford.edu Hakan Inan inanh@stanford.edu Andres Nötzli noetzli@stanford.edu. INTRODUCTION In many cities,

More information

A Comparative Study of Filter-based Feature Ranking Techniques

A Comparative Study of Filter-based Feature Ranking Techniques Western Kentucky University From the SelectedWorks of Dr. Huanjing Wang August, 2010 A Comparative Study of Filter-based Feature Ranking Techniques Huanjing Wang, Western Kentucky University Taghi M. Khoshgoftaar,

More information

Risk Analysis Overview

Risk Analysis Overview Risk Analysis Overview What Is Risk? Uncertainty about a situation can often indicate risk, which is the possibility of loss, damage, or any other undesirable event. Most people desire low risk, which

More information

Using Previous Knowledge for Stock Market Prediction Based on Fundamentalist Analysis with Fuzzy-Neural Networks

Using Previous Knowledge for Stock Market Prediction Based on Fundamentalist Analysis with Fuzzy-Neural Networks Using Previous Knowledge for Stock Market Prediction Based on Fundamentalist Analysis with Fuzzy-Neural Networks RENATO DE C. T. RAPOSO, ADRIANO J. DE O. CRUZ AND SUELI MENDES Núcleo de Computação Eletrônica,

More information

Top-down Forecasting Using a CRM Database Gino Rooney Tom Bauer

Top-down Forecasting Using a CRM Database Gino Rooney Tom Bauer Top-down Forecasting Using a CRM Database Gino Rooney Tom Bauer Abstract More often than not sales forecasting in modern companies is poorly implemented despite the wealth of data that is readily available

More information

Stock Price Forecasting Using Exogenous Time Series and Combined Neural Networks

Stock Price Forecasting Using Exogenous Time Series and Combined Neural Networks Stock Price Forecasting Using Exogenous Time Series and Combined eural etworks Manoel C. Amorim eto, Victor M. O. Alves, Gustavo Tavares, Lenildo Aragão Junior, George D. C. Cavalcanti and Tsang Ing Ren

More information

CHAPTER 5 RESULTS AND ANALYSIS

CHAPTER 5 RESULTS AND ANALYSIS CHAPTER 5 RESULTS AND ANALYSIS This chapter exhibits an extensive data analysis and the results of the statistical testing. Data analysis is done using factor analysis, regression analysis, reliability

More information

Calculation of Demand Curve Parameters

Calculation of Demand Curve Parameters Calculation of Demand Curve Parameters Rationale 4.1 Resource adequacy standard 4.1.1 The resource adequacy standard announced by the Government of Alberta prescribes a minimum level of reliability as

More information

RESULT AND DISCUSSION

RESULT AND DISCUSSION 4 Figure 3 shows ROC curve. It plots the probability of false positive (1-specificity) against true positive (sensitivity). The area under the ROC curve (AUR), which ranges from to 1, provides measure

More information

Binary Classification Modeling Final Deliverable. Using Logistic Regression to Build Credit Scores. Dagny Taggart

Binary Classification Modeling Final Deliverable. Using Logistic Regression to Build Credit Scores. Dagny Taggart Binary Classification Modeling Final Deliverable Using Logistic Regression to Build Credit Scores Dagny Taggart Supervised by Jennifer Lewis Priestley, Ph.D. Kennesaw State University Submitted 4/24/2015

More information

Chapter 5 RESULTS AND DISCUSSION

Chapter 5 RESULTS AND DISCUSSION Chapter 5 RESULTS AND DISCUSSION 5.0 Introduction This chapter outlines the results of the data analysis and discussion from the questionnaire survey. The detailed results are described in the following

More information

End-to-end electrical solutions for the biggest challenges involved with connecting wind energy to the grid. Solutions for Wind Energy Integration

End-to-end electrical solutions for the biggest challenges involved with connecting wind energy to the grid. Solutions for Wind Energy Integration End-to-end electrical solutions for the biggest challenges involved with connecting wind energy to the grid Solutions for Wind Energy Integration There are lots of challenges involved in harnessing the

More information

: Building Logistics, Information Flow, Information Technology.

: Building Logistics, Information Flow, Information Technology. Logistics Information flow Optimization SOFIA VILLAGARCIA, FRED BORGES DA SILVA and FRANCISCO CARDOSO Department of Civil Construction Engineering, Escola Politécnica - Universidade de São Paulo, Av. Almeida

More information

Data Mining. Implementation & Applications. Jean-Paul Isson. Sr. Director Global BI & Predictive Analytics Monster Worldwide

Data Mining. Implementation & Applications. Jean-Paul Isson. Sr. Director Global BI & Predictive Analytics Monster Worldwide Data Mining Implementation & Applications Jean-Paul Isson Sr. Director Global BI & Predictive Analytics Monster Worldwide Mar-2009 Agenda Data Mining & BI Vision Implementation : Success Criteria Knowledge

More information

A Web-based Framework of Project Performance and Control System

A Web-based Framework of Project Performance and Control System A Web-based Framework of Project Performance and Control System Jui-Sheng Chou* National Taiwan University of Science and Technology, Department of Construction Engineering Taipei, Taiwan jschou@mail.ntust.edu.tw

More information

Preface to the third edition Preface to the first edition Acknowledgments

Preface to the third edition Preface to the first edition Acknowledgments Contents Foreword Preface to the third edition Preface to the first edition Acknowledgments Part I PRELIMINARIES XXI XXIII XXVII XXIX CHAPTER 1 Introduction 3 1.1 What Is Business Analytics?................

More information

Paper Performing Machine Learning Techniques in a Contextual Marketing Scenario. Francisco Capetillo, Telefonica Chile

Paper Performing Machine Learning Techniques in a Contextual Marketing Scenario. Francisco Capetillo, Telefonica Chile Paper 2904-2018 Performing Machine Learning Techniques in a Contextual Marketing Scenario Francisco Capetillo, Telefonica Chile Abstract Although information for identifying high-potential customers is

More information

Load Modifying Resources. Capacity Instruments affecting Resource Availability and Need

Load Modifying Resources. Capacity Instruments affecting Resource Availability and Need Capacity Instruments affecting Resource Availability and Need May 25, 2018 Purpose Statement Review the participation and historic performance of Load Modifying Resources (LMRs) in MISO s Capacity and

More information

2014 Grid of the Future Symposium

2014 Grid of the Future Symposium 21, rue d Artois, F-75008 PARIS CIGRE US National Committee http : //www.cigre.org 2014 Grid of the Future Symposium Concepts and Practice Using Stochastic Programs for Determining Reserve Requirements

More information

On Distribution Asset Management: Development of Replacement Strategies

On Distribution Asset Management: Development of Replacement Strategies IEEE PES PowerAfrica 27 Conference and Exposition Johannesburg, South Africa, 16-2 July 27 On Distribution Asset Management: Development of Replacement Strategies 1 Miroslav Begovic, Fellow, IEEE, 1 Joshua

More information

A Systematic Approach to Performance Evaluation

A Systematic Approach to Performance Evaluation A Systematic Approach to Performance evaluation is the process of determining how well an existing or future computer system meets a set of alternative performance objectives. Arbitrarily selecting performance

More information

What is DSC 410/510? DSC 410/510 Multivariate Statistical Methods. What is Multivariate Analysis? Computing. Some Quotes.

What is DSC 410/510? DSC 410/510 Multivariate Statistical Methods. What is Multivariate Analysis? Computing. Some Quotes. What is DSC 410/510? DSC 410/510 Multivariate Statistical Methods Introduction Applications-oriented oriented introduction to multivariate statistical methods for MBAs and upper-level business undergraduates

More information

KnowledgeSTUDIO. Advanced Modeling for Better Decisions. Data Preparation, Data Profiling and Exploration

KnowledgeSTUDIO. Advanced Modeling for Better Decisions. Data Preparation, Data Profiling and Exploration KnowledgeSTUDIO Advanced Modeling for Better Decisions Companies that compete with analytics are looking for advanced analytical technologies that accelerate decision making and identify opportunities

More information

Insights from the Wikipedia Contest

Insights from the Wikipedia Contest Insights from the Wikipedia Contest Kalpit V Desai, Roopesh Ranjan Abstract The Wikimedia Foundation has recently observed that newly joining editors on Wikipedia are increasingly failing to integrate

More information

COMPARISON OF LOGISTIC REGRESSION MODEL AND MARS CLASSIFICATION RESULTS ON BINARY RESPONSE FOR TEKNISI AHLI BBPLK SERANG TRAINING GRADUATES STATUS

COMPARISON OF LOGISTIC REGRESSION MODEL AND MARS CLASSIFICATION RESULTS ON BINARY RESPONSE FOR TEKNISI AHLI BBPLK SERANG TRAINING GRADUATES STATUS International Journal of Humanities, Religion and Social Science ISSN : 2548-5725 Volume 2, Issue 1 2017 www.doarj.org COMPARISON OF LOGISTIC REGRESSION MODEL AND MARS CLASSIFICATION RESULTS ON BINARY

More information

* Project over a ten year time horizon,

* Project over a ten year time horizon, FORECASTING RESIDENTIAL, COMMERCIAL, AND INDUSTRIAL GAS DEMAND Jeffrey P. Brand, Wisconsin Gas Company James D. Funk, Wisconsin Gas Company ABSTRACT This paper shows how the Wisconsin Gas Company used

More information

Different Instrumental Methods Which Can Be Used in New EIS: Theory and Practical Approach

Different Instrumental Methods Which Can Be Used in New EIS: Theory and Practical Approach Different Instrumental Methods Which Can Be Used in New EIS: Theory and Practical Approach Roman Veynberg and Victor Romanov Plekhanov Russian Economic University, Stremjannyj per., 36, 117997 Moscow,

More information

The Heterogeneity Principle in Evaluation Measures for Automatic Summarization

The Heterogeneity Principle in Evaluation Measures for Automatic Summarization The Heterogeneity Principle in Evaluation Measures for Automatic Summarization Enrique Amigó Julio Gonzalo Felisa Verdejo UNED, Madrid {enrique,julio,felisa}@lsi.uned.es Abstract The development of summarization

More information

Enhanced Cost Sensitive Boosting Network for Software Defect Prediction

Enhanced Cost Sensitive Boosting Network for Software Defect Prediction Enhanced Cost Sensitive Boosting Network for Software Defect Prediction Sreelekshmy. P M.Tech, Department of Computer Science and Engineering, Lourdes Matha College of Science & Technology, Kerala,India

More information

Predicting and Explaining Price-Spikes in Real-Time Electricity Markets

Predicting and Explaining Price-Spikes in Real-Time Electricity Markets Predicting and Explaining Price-Spikes in Real-Time Electricity Markets Christian Brown #1, Gregory Von Wald #2 # Energy Resources Engineering Department, Stanford University 367 Panama St, Stanford, CA

More information

Technical Note OPTIMIZATION OF THE PARAMETERS OF FEEDWATER CONTROL SYSTEM FOR OPR1000 NUCLEAR POWER PLANTS

Technical Note OPTIMIZATION OF THE PARAMETERS OF FEEDWATER CONTROL SYSTEM FOR OPR1000 NUCLEAR POWER PLANTS Technical Note OPTIMIZATION OF THE PARAMETERS OF FEEDWATER CONTROL SYSTEM FOR OPR1000 NUCLEAR POWER PLANTS UNG SOO KIM *, IN HO SONG, JONG JOO SOHN and EUN KEE KIM Safety Analysis Department, KEPCO Engineering

More information

Statistics and Data Analysis

Statistics and Data Analysis Selecting the Appropriate Outlier Treatment for Common Industry Applications Kunal Tiwari Krishna Mehta Nitin Jain Ramandeep Tiwari Gaurav Kanda Inductis Inc. 571 Central Avenue #105 New Providence, NJ

More information