by Xindong Wu, Kui Yu, Hao Wang, Wei Ding
1 Online Streaming Feature Selection by Xindong Wu, Kui Yu, Hao Wang, Wei Ding
2 Outline
1. Background and Motivation
2. Related Work
3. Notations and Definitions
4. Our Framework for Streaming Feature Selection
5. Online Streaming Feature Selection Algorithms
6. Experimental Results
3 1. Background and Motivation
Traditional feature selection assumes that all features are available and presented to a learner before feature selection takes place.
Streaming feature selection: features are generated dynamically and arrive one at a time, while the number of observations stays constant.
Example 1: Texture-based image segmentation assigns a label to each pixel in a training image according to its texture type. An image might easily contain tens of thousands of labeled pixels, so generating those features is computationally expensive and collecting them can take a long time (Perkins & Theiler 2003).
Example 2: The feature set size is unknown, or even infinite.
4 Challenge with Streaming Features
Do we develop a new way to integrate each new feature as it arrives and begin the computation immediately, or
do we spend a long time waiting for all features to be generated and then apply existing algorithms?
5 Contributions
1. A step further on feature relevance, with an explicit definition of feature redundancy between a feature and a target class;
2. A novel framework based on feature relevance to manage streaming feature selection;
3. Two new online streaming feature selection algorithms, with comparative studies.
6 2. Related Work
Perkins and Theiler (2003): grafting, an algorithm for streaming feature selection based on a stagewise gradient descent approach. Grafting needs the value of the tuning parameter λ to be determined in advance.
Zhou et al. (2005; 2006): two algorithms for streaming feature selection based on streamwise regression, Information-investing and Alpha-investing. Both need prior knowledge about the structure of the feature space to heuristically control the choice of candidate features.
7 3. Notations and Definitions
Let V be the full set of features, $X_i$ denote the i-th input feature, and $X_{\setminus i}$ represent all input features excluding $X_i$.
Definition 1 (Conditional independence): Two features X and Y are conditionally independent given the set of features Z if and only if $P(X \mid Y, Z) = P(X \mid Z)$, denoted as $Ind(X, Y \mid Z)$. Accordingly, conditional dependence is denoted as $Dep(X, Y \mid Z)$.
Definition 2 (Strong relevance): $X_i$ is strongly relevant to a target T if $P(T \mid X_{\setminus i}) \neq P(T \mid X_{\setminus i}, X_i)$.
Definition 3 (Weak relevance): $X_i$ is weakly relevant to a target T if $X_i$ is not strongly relevant and $\exists S \subset X_{\setminus i}: P(T \mid S) \neq P(T \mid S, X_i)$.
Definition 4 (Irrelevance): $X_i$ is irrelevant to a target T if it is neither strongly nor weakly relevant, that is, if $\forall S \subseteq X_{\setminus i}: P(T \mid S) = P(T \mid S, X_i)$.
8 Notations and Definitions (2)
Definition 5 (Markov blanket): Given a feature $X_i$ and assuming $M_i \subseteq V$, $M_i$ is a Markov blanket for $X_i$ if and only if $P(V - M_i - \{X_i\}, T \mid X_i, M_i) = P(V - M_i - \{X_i\}, T \mid M_i)$.
Definition 6 (Redundant feature-1): A feature is redundant and should be removed from V (the current set of features) if and only if it is weakly relevant and has a Markov blanket $M_i$ within V.
Rewriting Definition 6 gives:
Definition 7 (Redundant feature-2): Given a candidate Markov blanket of a target feature T, denoted as CMB(T), and a feature $X \in CMB(T)$, X is redundant to T if and only if $\exists S \subseteq CMB(T): P(T \mid X, S) = P(T \mid S)$.
9 4. A Framework for Streaming Feature Selection
1. Initialization: best candidate feature set BCF = {}, the target feature T.
2. Online relevance analysis:
(1) Generate a new feature X.
(2) Determine whether X is irrelevant to T or not.
a. If X is irrelevant to T, it is disregarded;
b. Otherwise, X is added to BCF.
3. Online redundancy analysis: identify redundant features in the current subset BCF online and remove them according to Definition 7.
4. Alternate Steps 2 and 3 until the stopping criteria are satisfied.
5. Output BCF.
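The five framework steps can be sketched as a small loop. This is only an illustrative skeleton, not the authors' implementation: the function name `stream_select`, the two predicate arguments, and the toy predicates in the usage lines are hypothetical, and exhaustion of the feature stream stands in for the unspecified stopping criteria.

```python
from typing import Callable, Hashable, Iterable, List

Feature = Hashable


def stream_select(
    feature_stream: Iterable[Feature],
    is_irrelevant: Callable[[Feature], bool],
    is_redundant: Callable[[Feature, List[Feature]], bool],
) -> List[Feature]:
    """Skeleton of the framework: relevance analysis, then redundancy analysis."""
    bcf: List[Feature] = []            # Step 1: BCF starts empty
    for x in feature_stream:           # Step 2(1): a new feature arrives
        if is_irrelevant(x):           # Step 2(2)a: irrelevant features are dropped
            continue
        bcf.append(x)                  # Step 2(2)b: otherwise X joins BCF
        for y in list(bcf):            # Step 3: re-check every member of BCF
            others = [f for f in bcf if f != y]
            if is_redundant(y, others):
                bcf.remove(y)
    return bcf                         # Step 5: stream exhausted (assumed stopping rule)


# Toy predicates (hypothetical): names starting with "noise" are irrelevant,
# and "copy_of_X" is redundant whenever X is already selected.
def toy_irrelevant(f: str) -> bool:
    return f.startswith("noise")


def toy_redundant(f: str, others: List[str]) -> bool:
    return f.startswith("copy_of_") and f[len("copy_of_"):] in others


selected = stream_select(["a", "noise1", "copy_of_a", "b"], toy_irrelevant, toy_redundant)
```

Passing the two tests in as callables keeps the skeleton independent of any particular statistical test; in the slides' setting they would be backed by conditional independence tests.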
10 5. Online Streaming Feature Selection Algorithms
OSFS: Online Streaming Feature Selection algorithm
Fast-OSFS: a fast version of OSFS
11 OSFS: Online Streaming Feature Selection
OSFS finds an optimal subset using a two-phase scheme: online relevance analysis (steps 4-12) and online redundancy analysis (steps 13-21). (See the pseudo-code of OSFS on the next page.)
Relevance analysis: discovers strongly and weakly relevant features and adds them to BCF. When a new feature arrives, OSFS assesses whether it is irrelevant to the class label C; if so, it is discarded, otherwise it is added to BCF.
Redundancy analysis: if a new feature enters BCF, this phase dynamically eliminates redundant features within BCF. If there exists a subset within BCF that makes a feature Y and C conditionally independent, Y is removed from BCF.
OSFS alternates the two phases until some stopping criteria are satisfied.
12 The pseudo-code of OSFS [pseudo-code figure not included in the transcription]
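Since the pseudo-code figure is missing here, the following is a rough Python sketch of the two phases as the previous slide describes them. It is not a reproduction of the authors' pseudo-code and its line numbers do not correspond to the "steps" cited there. The names `osfs`, `has_separating_subset`, and `toy_ind` are hypothetical, and the conditional independence test is assumed to be supplied as a callable `ind(x, cond)` that returns True when x is independent of the class given the conditioning set `cond`.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Hashable, Iterable, List

Feature = Hashable
IndTest = Callable[[Feature, FrozenSet[Feature]], bool]


def has_separating_subset(y: Feature, others: List[Feature], ind: IndTest, k: int) -> bool:
    """True if some subset of `others` with at most k members makes y independent of C."""
    for size in range(0, min(k, len(others)) + 1):
        for subset in combinations(others, size):
            if ind(y, frozenset(subset)):
                return True
    return False


def osfs(stream: Iterable[Feature], ind: IndTest, k: int = 3) -> List[Feature]:
    bcf: List[Feature] = []
    for x in stream:
        # Relevance analysis: discard x if it is independent of C outright.
        if ind(x, frozenset()):
            continue
        bcf.append(x)
        # Redundancy analysis: re-examine every feature currently in BCF.
        for y in list(bcf):
            others = [f for f in bcf if f != y]
            if has_separating_subset(y, others, ind, k):
                bcf.remove(y)
    return bcf


# Toy independence oracle (hypothetical): "noise" never matters, and
# "a_noisy" carries no extra information about C once "a" is known.
def toy_ind(x: str, cond: FrozenSet[str]) -> bool:
    if x == "noise":
        return True
    if x == "a_noisy":
        return "a" in cond
    return False


result = osfs(["a_noisy", "noise", "a", "b"], toy_ind, k=2)
```

In this toy run "a_noisy" is first accepted, then removed once "a" arrives and separates it from the class, which is the behaviour the redundancy phase is meant to capture.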
13 Time complexity of OSFS
The cost depends on the number of independence tests.
At time t, assuming $|V|$ features have arrived, the worst-case complexity is $O(|V|\,|BCF|\,k^{|BCF|})$, where k is the maximum allowable size to which a conditioning set may grow.
Assuming $SF \subseteq V$ and $|SF| \ll |V|$, where SF contains all strongly relevant features, the average time complexity is $O(|SF|\,|BCF|\,k^{|BCF|})$ at time t.
14 Time complexity of OSFS
The most time-consuming part is the redundancy analysis phase. When a new feature enters BCF, redundancy analysis re-examines each feature within BCF with respect to its relevance to C.
To further improve selection efficiency, Fast-OSFS is presented on the next page.
15 The Fast-OSFS Algorithm
16 The Fast-OSFS Algorithm
The key difference is that Fast-OSFS divides the redundancy analysis phase into two phases: inner-redundancy analysis and outer-redundancy analysis.
Fast-OSFS only alternates the relevance analysis and the inner-redundancy analysis phases.
In inner-redundancy analysis, Fast-OSFS only re-examines the feature just added to BCF.
In outer-redundancy analysis, it re-examines each feature of BCF, and only once the process of generating features has stopped.
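The split into inner and outer redundancy analysis might look roughly like the sketch below. It follows only the description on this slide, not the authors' pseudo-code; the names `fast_osfs`, `separable`, and `toy_ind` are hypothetical, and `ind(x, cond)` is assumed to return True when x is independent of the class given `cond`.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Hashable, Iterable, List

Feature = Hashable
IndTest = Callable[[Feature, FrozenSet[Feature]], bool]


def separable(y: Feature, others: List[Feature], ind: IndTest, k: int) -> bool:
    """True if a subset of `others` with at most k members separates y from C."""
    for size in range(0, min(k, len(others)) + 1):
        for subset in combinations(others, size):
            if ind(y, frozenset(subset)):
                return True
    return False


def fast_osfs(stream: Iterable[Feature], ind: IndTest, k: int = 3) -> List[Feature]:
    bcf: List[Feature] = []
    for x in stream:
        # Relevance analysis on the arriving feature.
        if ind(x, frozenset()):
            continue
        # Inner-redundancy analysis: only the new feature is examined.
        if separable(x, bcf, ind, k):
            continue
        bcf.append(x)
    # Outer-redundancy analysis: runs once, after feature generation has stopped.
    for y in list(bcf):
        others = [f for f in bcf if f != y]
        if separable(y, others, ind, k):
            bcf.remove(y)
    return bcf


# Toy oracle (hypothetical): "a_noisy" is uninformative once "a" is known.
def toy_ind(x: str, cond: FrozenSet[str]) -> bool:
    if x == "noise":
        return True
    if x == "a_noisy":
        return "a" in cond
    return False


late_fix = fast_osfs(["a_noisy", "noise", "a", "b"], toy_ind, k=2)
early_fix = fast_osfs(["a", "a_noisy"], toy_ind, k=2)
```

When "a_noisy" arrives before "a", it survives until the outer pass at the end; when it arrives after "a", the inner check rejects it immediately. Older members of BCF are never re-examined while the stream is running, which is where the saving comes from.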
17 Time complexity of Fast-OSFS
The worst-case complexity is $O(|V|\,k^{|BCF|} + |BCF|\,k^{|BCF|})$.
The average is $O(|SF|\,k^{|BCF|} + |BCF|\,k^{|BCF|})$ at time t.
18 6. Experimental Results
Data sets: 8 UCI benchmark databases and 10 challenge databases.
Classifiers: three classifiers, k-NN, J48 and Random Forest (Spider 2010), were used, and the best accuracy was taken as the result.
Grafting and Alpha-investing were run using their original implementations.
The tuning parameter λ for Grafting was selected using cross-validation.
The parameters of Alpha-investing were left at their default settings, $W_0 = 0.5$ and $\alpha_\Delta = 0.5$.
The conditional independence tests in our implementation are $G^2$ tests, and the parameter alpha is the statistical significance level.
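For readers unfamiliar with the $G^2$ test, here is a minimal pure-Python sketch of the statistic $G^2 = 2 \sum O \ln(O/E)$ for discrete samples, summed over the strata of a conditioning variable. It is an illustration under stated assumptions, not the implementation used in the experiments: the function name `g2_statistic` is hypothetical, and the degrees of freedom are computed from the observed strata only, which is one of several conventions.

```python
import math
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def g2_statistic(
    x: Sequence[Hashable],
    y: Sequence[Hashable],
    z: Optional[Sequence[Hashable]] = None,
) -> Tuple[float, int]:
    """Return (G2, degrees of freedom) for testing X independent of Y given Z."""
    n = len(x)
    if z is None:
        z = [()] * n                      # a single stratum: unconditional test
    n_xyz = Counter(zip(x, y, z))         # observed joint counts
    n_xz = Counter(zip(x, z))
    n_yz = Counter(zip(y, z))
    n_z = Counter(z)
    g2 = 0.0
    for (xv, yv, zv), observed in n_xyz.items():
        # Expected count under independence within stratum zv.
        expected = n_xz[(xv, zv)] * n_yz[(yv, zv)] / n_z[zv]
        g2 += 2.0 * observed * math.log(observed / expected)
    # Assumption: one block of (|X|-1)(|Y|-1) degrees of freedom per observed stratum.
    df = (len(set(x)) - 1) * (len(set(y)) - 1) * len(n_z)
    return g2, df


# Perfectly dependent toy sample: G2 = 8 ln 2 with one degree of freedom.
g2_dep, df_dep = g2_statistic([0, 0, 1, 1], [0, 0, 1, 1])
# The same pair conditioned on a copy of X: no dependence is left within a stratum.
g2_cond, df_cond = g2_statistic([0, 0, 1, 1], [0, 0, 1, 1], z=[0, 0, 1, 1])
```

To turn the statistic into a decision, it would be compared against a chi-square distribution with `df` degrees of freedom (for instance with `scipy.stats.chi2.sf`), and independence would be accepted when the resulting p-value exceeds the significance level alpha.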
19 Results on UCI Benchmark Data Sets
20 The compactness and predictive accuracy of 4 algorithms (alpha=0.01)
The win/tie/loss counts of our methods vs. other methods:
                  OSFS    Fast-OSFS
Grafting          5/1/2   4/0/4
Alpha-investing   7/0/1   5/1/2
Note: Alpha-investing selects all features on the Wdbc data.
21 The compactness and predictive accuracy of 4 algorithms (alpha=0.05)
The win/tie/loss counts of our methods vs. other methods:
                  OSFS    Fast-OSFS
Grafting          3/2/3   4/1/3
Alpha-investing   7/0/1   8/0/0
Note: Alpha-investing selects all features on the Wdbc data.
22 [Figures: OSFS performance with different alpha values; Fast-OSFS performance with different alpha values]
23 A performance analysis with different alpha values
When alpha is raised to 0.05, our two algorithms tend to select more features, but their accuracy moves in different directions: OSFS degrades a little while Fast-OSFS improves a little.
With alpha set to 0.01 or raised to 0.05, the two algorithms have similar performance in our experiments.
24 Results on Challenge Data Sets
25 The compactness and prediction accuracy (%) of four algorithms (alpha=0.01)
Alpha-investing failed to select any features.
Grafting fails to select any features on the Dorothea and Breastcancer data because of an out-of-memory problem.
The win/tie/loss counts of our methods vs. other methods:
                  OSFS    Fast-OSFS
Grafting          8/0/2   7/0/3
Alpha-investing   8/0/2   6/0/4
26 Running Time Analysis
The time reported is the normalized time: the running time of OSFS on a data set divided by the corresponding running time of Fast-OSFS.
A normalized running time greater than one implies that OSFS is slower than Fast-OSFS on the same learning task.
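The normalized time is a plain ratio, as in the small sketch below. The function name is hypothetical and the timings in the usage line are made-up placeholders, not measurements from the experiments.

```python
def normalized_time(osfs_seconds: float, fast_osfs_seconds: float) -> float:
    """Running time of OSFS divided by the running time of Fast-OSFS."""
    if fast_osfs_seconds <= 0:
        raise ValueError("Fast-OSFS running time must be positive")
    return osfs_seconds / fast_osfs_seconds


# Placeholder timings: a ratio above one means OSFS was the slower of the two.
ratio = normalized_time(30.0, 12.0)
osfs_is_slower = ratio > 1.0
```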
27 Running Time Analysis
On the UCI data sets, Fast-OSFS is at least twice as fast as OSFS.
Since the running time of both Fast-OSFS and OSFS is less than one second on most of these data sets, we only report the running times longer than ten seconds, on five data sets, in Figure 8 (left: alpha=0.01; right: alpha=0.05).
28 Discussions: Grafting & Alpha-Investing
Grafting: on low-dimensional data sets it is competitive with our methods; on high-dimensional data sets it is inferior to our methods. Its main drawback is that the tuning parameter λ must be chosen in advance.
Alpha-investing: our algorithms outperform Alpha-investing on most of the 18 data sets. With prior knowledge of the structure of the candidate features, Alpha-investing could achieve good performance. Given such prior knowledge, our framework can also handle the task well.
29 Discussions: OSFS vs. Fast-OSFS
Compactness: Fast-OSFS is competitive with OSFS.
Predictive accuracy (our empirical findings): OSFS outperforms Fast-OSFS on data sets with a very small sample-to-variable ratio; Fast-OSFS is superior to OSFS on data sets with a large number of samples.
30 Discussions: False Positives
To control false positives, we use two strategies: multiple comparisons and the parameter k.
The parameter k is the maximum allowable size to which a conditioning set may grow, and it is a key parameter.
In online redundancy analysis, multiple statistical comparisons filter out redundant features: subsets of BCF are enumerated to perform multiple tests, and the size of the largest subset is k.
Under the assumption that all independence tests are reliable, with a suitable value of k, false positives will be well controlled.
Accordingly, the experimental results show that our algorithms exhibit little sensitivity to false-positive features.
31 Conclusion
We have proposed a novel framework with two new algorithms to deal with streaming feature selection.
Compared with two state-of-the-art algorithms, Grafting and Alpha-investing, our algorithms have demonstrated more compactness and better accuracy in supervised learning on databases that contain many irrelevant and redundant features.
32 Future work
In our experiments, we simulated a feature set of unknown but finite size.
Explore how to dynamically assess the predictive accuracy with a feature set of infinite size, when reaching a certain threshold.
Study the impact of stopping criteria on the OSFS and Fast-OSFS algorithms.
Apply online streaming feature selection to real Mars crater data, where craters are represented by thousands of texture-based features that call for efficient feature selection.
More informationA COMPUTATIONALLY EFFICIENT AND SCALABLE SHELF LIFE ESTIMATION MODEL FOR WIRELESS TEMPERATURE SENSORS IN THE SUPPLY CHAIN
A COMPUTATIONALLY EFFICIENT AND SCALABLE SHELF LIFE ESTIMATION MODEL FOR WIRELESS TEMPERATURE SENSORS IN THE SUPPLY CHAIN Ismail Uysal (a), Jean-Pierre Emond (b), Gisele Bennett (c) (a,b) College of Technology
More informationCase studies in Data Mining & Knowledge Discovery
Case studies in Data Mining & Knowledge Discovery Knowledge Discovery is a process Data Mining is just a step of a (potentially) complex sequence of tasks KDD Process Data Mining & Knowledge Discovery
More informationOptimal Dynamic Pricing of Perishable Products Considering Quantity Discount Policy
Journal of Information & Computational Science 10:14 (013) 4495 4504 September 0, 013 Available at http://www.joics.com Optimal Dynamic Pricing of Perishable Products Considering Quantity Discount Policy
More informationPredicting ratings of peer-generated content with personalized metrics
Predicting ratings of peer-generated content with personalized metrics Project report Tyler Casey tyler.casey09@gmail.com Marius Lazer mlazer@stanford.edu [Group #40] Ashish Mathew amathew9@stanford.edu
More informationA Simple EOQ-like Solution to an Inventory System with Compound Poisson and Deterministic Demand
A Simple EOQ-like Solution to an Inventory System with Compound Poisson and Deterministic Demand Katy S. Azoury San Francisco State University, San Francisco, California, USA Julia Miyaoka* San Francisco
More informationA Novel Splice Site Prediction Method using Support Vector Machine
Journal of Computational Information Systems 9: 20 (2013) 8053 8060 Available at http://www.jofcis.com A Novel Splice Site Prediction Method using Support Vector Machine Dan WEI 1,2, Huiling ZHANG 2, Yanjie
More informationFraud Detection for MCC Manipulation
2016 International Conference on Informatics, Management Engineering and Industrial Application (IMEIA 2016) ISBN: 978-1-60595-345-8 Fraud Detection for MCC Manipulation Hong-feng CHAI 1, Xin LIU 2, Yan-jun
More informationSingle Machine Scheduling with Interfering Job Sets
Multidisciplinary International Conference on Scheduling : Theory and Applications (MISTA 009) 0- August 009, Dublin, Ireland MISTA 009 Single Machine Scheduling with Interfering Job Sets Ketan Khowala,
More informationOptimal Demand Response Using Device Based Reinforcement Learning
Optimal Demand Response Using Device Based Reinforcement Learning Zheng Wen 1 Joint Work with Hamid Reza Maei 1 and Dan O Neill 1 1 Department of Electrical Engineering Stanford University zhengwen@stanford.edu
More informationThe Mystery Behind Project Management Metrics. Reed Shell Blue Hippo Consulting
The Mystery Behind Project Management Metrics Reed Shell Blue Hippo Consulting Presentation Take-Aways Two Tools for gathering and producing metrics 10 Step Process Goal/Question/Metric Deliverable Exercises
More informationComparison of Different Independent Component Analysis Algorithms for Sales Forecasting
International Journal of Humanities Management Sciences IJHMS Volume 2, Issue 1 2014 ISSN 2320 4044 Online Comparison of Different Independent Component Analysis Algorithms for Sales Forecasting Wensheng
More informationWhat about streaming data?
What about streaming data? 1 The Stream Model Data enters at a rapid rate from one or more input ports Such data are called stream tuples The system cannot store the entire (infinite) stream Distribution
More informationIMPOSSIBILITY OF CONSENSUS
IMPOSSIBILITY OF CONSENSUS Fall 2012 Ken Birman Consensus a classic problem Consensus abstraction underlies many distributed systems and protocols N processes They start execution with inputs {0,1} Asynchronous,
More informationFixed vs. Self-Adaptive Crossover-First Differential Evolution
Applied Mathematical Sciences, Vol. 10, 2016, no. 32, 1603-1610 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2016.6377 Fixed vs. Self-Adaptive Crossover-First Differential Evolution Jason
More informationA NOTE ON METHODS OF ESTIMATING REGIONAL INPUT-OUTPUT TABLES: CAN THE FLQ IMPROVE THE RAS ALGORITHM?
A NOTE ON METHODS OF ESTIMATING REGIONAL INPUT-OUTPUT TABLES: CAN THE FLQ IMPROVE THE RAS ALGORITHM? Steven Brand July 2012 Author contact details: Plymouth University Drake s Circus Plymouth PL4 8AA sbrand@plymouth.ac.uk
More informationALLOCATING SHARED RESOURCES OPTIMALLY FOR CALL CENTER OPERATIONS AND KNOWLEDGE MANAGEMENT ACTIVITIES
ALLOCATING SHARED RESOURCES OPTIMALLY FOR CALL CENTER OPERATIONS AND KNOWLEDGE MANAGEMENT ACTIVITIES Research-in-Progress Abhijeet Ghoshal Alok Gupta University of Minnesota University of Minnesota 321,
More informationInformation Effects on Performance of Two-tier Service Systems with Strategic Customers. Zhe George Zhang
Information Effects on Performance of Two-tier Service Systems with Strategic Customers Zhe George Zhang Department of Decision Sciences, Western Washington University, Bellingham, WA98225, USA & Beedie
More informationCHAPTER 4 PROPOSED HYBRID INTELLIGENT APPROCH FOR MULTIPROCESSOR SCHEDULING
79 CHAPTER 4 PROPOSED HYBRID INTELLIGENT APPROCH FOR MULTIPROCESSOR SCHEDULING The present chapter proposes a hybrid intelligent approach (IPSO-AIS) using Improved Particle Swarm Optimization (IPSO) with
More informationCross Validation and MLP Architecture Selection
Cross Validation and MLP Architecture Selection Tim Andersen and Tony Martinez tim@axon.cs.byu.edu, martinez@cs.byu.edu Computer Science Department, Brigham Young University Abstract The performance of
More informationChapter 5 Evaluating Classification & Predictive Performance
Chapter 5 Evaluating Classification & Predictive Performance Data Mining for Business Intelligence Shmueli, Patel & Bruce Galit Shmueli and Peter Bruce 2010 Why Evaluate? Multiple methods are available
More information