by Xindong Wu, Kui Yu, Hao Wang, Wei Ding


1 Online Streaming Feature Selection, by Xindong Wu, Kui Yu, Hao Wang, Wei Ding

2 Outline
1. Background and Motivation
2. Related Work
3. Notations and Definitions
4. Our Framework for Streaming Feature Selection
5. Online Streaming Feature Selection Algorithms
6. Experimental Results

3 1. Background and Motivation
- Traditional feature selection assumes that all features are available and presented to the learner before feature selection takes place.
- Streaming feature selection: features are generated dynamically and arrive one at a time, while the number of observations stays constant.
- Example 1: Texture-based image segmentation assigns a label to each pixel in a training image according to its texture type. An image can easily contain tens of thousands of labeled pixels, so generating the features is computationally expensive and collecting them can take a long time (Perkins & Theiler 2003).
- Example 2: The size of the feature set is unknown, or even infinite.

4 Challenge with Streaming Features
- Do we develop a new way to integrate each new feature as it arrives and begin the computation right away, or
- do we spend a long time waiting for all features to be generated and then adopt existing algorithms?

5 Contributions
1. One step further on feature relevance: the redundancy of a feature with respect to a target class is defined explicitly.
2. A novel framework based on feature relevance to manage streaming feature selection.
3. Two new online streaming feature selection algorithms, evaluated in comparative studies.

6 2. Related Work
- Perkins and Theiler (2003): grafting, an algorithm for streaming feature selection based on a stagewise gradient descent approach.
  - Grafting needs the value of the tuning parameter λ to be determined in advance.
- Zhou et al. (2005; 2006): Information-investing and Alpha-investing, two algorithms for streaming feature selection based on streamwise regression.
  - Both need prior knowledge about the structure of the feature space to heuristically control the choice of candidate features.

7 3. Notations and Definitions
Let $V$ be the full set of features, let $X_i$ denote the $i$-th input feature, and let $X_{\setminus i}$ represent all input features excluding $X_i$.
- Definition 1 (Conditional independence): Two features $X$ and $Y$ are conditionally independent given the set of features $Z$ if and only if $P(X \mid Y, Z) = P(X \mid Z)$, denoted $\mathrm{Ind}(X, Y \mid Z)$. Accordingly, conditional dependence is denoted $\mathrm{Dep}(X, Y \mid Z)$.
- Definition 2 (Strong relevance): $X_i$ is strongly relevant to a target $T$ if $P(T \mid X_{\setminus i}) \neq P(T \mid X_{\setminus i}, X_i)$.
- Definition 3 (Weak relevance): $X_i$ is weakly relevant to a target $T$ if $X_i$ is not strongly relevant and $\exists S \subset X_{\setminus i} : P(T \mid S) \neq P(T \mid S, X_i)$.
- Definition 4 (Irrelevance): $X_i$ is irrelevant to a target $T$ if it is neither strongly nor weakly relevant, that is, if $\forall S \subseteq X_{\setminus i} : P(T \mid S) = P(T \mid S, X_i)$.

8 Notations and Definitions (2)
- Definition 5 (Markov blanket): Given a feature $X_i$ and a set $M_i \subset V$, $M_i$ is a Markov blanket for $X_i$ if and only if $P(V - M_i - \{X_i\}, T \mid X_i, M_i) = P(V - M_i - \{X_i\}, T \mid M_i)$.
- Definition 6 (Redundant feature-1): A feature is redundant and should be removed from $V$ (the current set of features) if and only if it is weakly relevant and has a Markov blanket $M_i$ within $V$.
- Rewriting Definition 6 gives Definition 7 (Redundant feature-2): Given a candidate Markov blanket of a target feature $T$, denoted $CMB(T)$, and a feature $X \in CMB(T)$, $X$ is redundant to $T$ if and only if $\exists S \subseteq CMB(T) : P(T \mid X, S) = P(T \mid S)$.
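As a small worked example of Definitions 1 and 7, the sketch below builds a toy joint distribution over three binary variables in which S influences both X and T. The variable layout and every probability are made up for illustration; none of them come from the slides. In this toy model X is marginally dependent on T, yet conditioning on S makes the two independent, which is the situation Definition 7 labels as X being redundant to T.

```python
from itertools import product

# Hypothetical toy model: S is a common cause of X and T (X <- S -> T).
P_S = {0: 0.5, 1: 0.5}
P_X1_GIVEN_S = {0: 0.2, 1: 0.9}   # P(X=1 | S=s), made-up numbers
P_T1_GIVEN_S = {0: 0.1, 1: 0.7}   # P(T=1 | S=s), made-up numbers


def build_joint():
    """Return the joint table {(x, t, s): probability}."""
    joint = {}
    for x, t, s in product((0, 1), repeat=3):
        px = P_X1_GIVEN_S[s] if x == 1 else 1.0 - P_X1_GIVEN_S[s]
        pt = P_T1_GIVEN_S[s] if t == 1 else 1.0 - P_T1_GIVEN_S[s]
        joint[(x, t, s)] = P_S[s] * px * pt
    return joint


def prob(joint, x=None, t=None, s=None):
    """Marginal probability of the assignments that are not None."""
    want = (x, t, s)
    return sum(p for key, p in joint.items()
               if all(w is None or w == k for w, k in zip(want, key)))


def t_independent_of_x_given_s(joint, tol=1e-12):
    """Check P(T | X, S) == P(T | S) for every value combination."""
    for x, t, s in product((0, 1), repeat=3):
        lhs = prob(joint, x=x, t=t, s=s) / prob(joint, x=x, s=s)
        rhs = prob(joint, t=t, s=s) / prob(joint, s=s)
        if abs(lhs - rhs) > tol:
            return False
    return True


def t_independent_of_x(joint, tol=1e-12):
    """Check P(T, X) == P(T) P(X), i.e. independence given the empty set."""
    for x, t in product((0, 1), repeat=2):
        if abs(prob(joint, x=x, t=t) - prob(joint, x=x) * prob(joint, t=t)) > tol:
            return False
    return True


joint = build_joint()
print("Ind(T, X | S):", t_independent_of_x_given_s(joint))
print("Ind(T, X | {}):", t_independent_of_x(joint))
```

In practice the true distribution is unknown, so such checks are replaced by statistical independence tests on data.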

9 4. A Framework for Streaming Feature Selection
1. Initialization: best candidate feature set BCF = {}, target feature T.
2. Online relevance analysis:
   (1) Generate a new feature X.
   (2) Determine whether X is irrelevant to T.
       a. If X is irrelevant to T, it is discarded;
       b. otherwise, X is added to BCF.
3. Online redundancy analysis: identify redundant features in the current subset BCF online and remove them according to Definition 7.
4. Alternate Steps 2 and 3 until the stopping criteria are satisfied.
5. Output BCF.
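The five steps of the framework can be sketched as a small driver loop. This is a minimal illustration and not the authors' implementation: `is_irrelevant` and `is_redundant` are hypothetical callbacks standing in for the relevance test and for Definition 7, and the stopping criterion is assumed to be simply the end of the feature stream.

```python
from typing import Callable, Iterable, List


def stream_select(
    feature_stream: Iterable[str],
    is_irrelevant: Callable[[str], bool],
    is_redundant: Callable[[str, List[str]], bool],
) -> List[str]:
    """Generic relevance/redundancy loop over features that arrive one at a time."""
    bcf: List[str] = []                      # Step 1: BCF = {}
    for x in feature_stream:                 # Step 2(1): a new feature arrives
        if is_irrelevant(x):                 # Step 2(2)a: discard irrelevant features
            continue
        bcf.append(x)                        # Step 2(2)b: keep the candidate
        for y in list(bcf):                  # Step 3: redundancy analysis on BCF
            others = [f for f in bcf if f != y]
            if is_redundant(y, others):
                bcf.remove(y)
    return bcf                               # Step 5 (Step 4 is the loop itself)


# Toy usage with made-up rules: "noise" is irrelevant, and "a_copy" is
# redundant whenever "a" is already among the other selected features.
result = stream_select(
    ["a", "noise", "a_copy", "b"],
    is_irrelevant=lambda f: f == "noise",
    is_redundant=lambda f, others: f == "a_copy" and "a" in others,
)
print(result)
```

Passing the two tests in as callbacks keeps the loop independent of how relevance and redundancy are actually decided.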

10 5. Online Streaming Feature Selection Algorithms
- OSFS: Online Streaming Feature Selection algorithm
- Fast-OSFS: a fast version of OSFS

11 OSFS: Online Streaming Feature Selection
OSFS finds an optimal subset using a two-phase scheme: online relevance analysis (steps 4-12) and online redundancy analysis (steps 13-21) (see the pseudo-code of OSFS on the next page).
- Relevance analysis discovers strongly and weakly relevant features and adds them to BCF. When a new feature arrives, OSFS assesses whether it is irrelevant to the class label C; if so, it is discarded, otherwise it is added to BCF.
- Redundancy analysis: whenever a new feature enters BCF, this phase dynamically eliminates redundant features within BCF. If there exists a subset of BCF that makes a feature Y and C conditionally independent, Y is removed from BCF.
- OSFS alternates the two phases until the stopping criteria are satisfied.
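The two phases described on this slide can be sketched as follows. This is a hedged illustration built only from the description above, not a reproduction of the authors' pseudo-code: `ind` is a hypothetical independence oracle, the relevance check is assumed to be a test given the empty conditioning set, the cap `k` on conditioning-set size is borrowed from the complexity discussion, and the end of the stream serves as the stopping criterion.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List

# Hypothetical oracle type: ind(f, s) is True when feature f and the class
# label C are judged conditionally independent given the feature set s.
IndTest = Callable[[str, FrozenSet[str]], bool]


def separated(y: str, bcf: List[str], ind: IndTest, k: int) -> bool:
    """True if some subset of BCF minus {y}, of size at most k, separates y from C."""
    others = [f for f in bcf if f != y]
    for size in range(min(k, len(others)) + 1):
        for subset in combinations(others, size):
            if ind(y, frozenset(subset)):
                return True
    return False


def osfs(stream: Iterable[str], ind: IndTest, k: int = 3) -> List[str]:
    bcf: List[str] = []
    for x in stream:
        # Relevance analysis (assumed here to be a test given the empty set).
        if ind(x, frozenset()):
            continue
        bcf.append(x)
        # Redundancy analysis: re-examine every member of BCF.
        for y in list(bcf):
            if separated(y, bcf, ind, k):
                bcf.remove(y)
    return bcf


def toy_ind(f: str, s: FrozenSet[str]) -> bool:
    """Made-up ground truth: N is noise, D only matters through A."""
    if f == "N":
        return True
    if f == "D":
        return "A" in s
    return False


selected = osfs(["D", "N", "A", "B"], toy_ind)
print(selected)
```

In the toy run, D is accepted at first because nothing in BCF explains it away, and it is removed once A arrives and separates it from C.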

12 The pseudo-code of OSFS

13 Time complexity of OSFS
- The time complexity depends on the number of independence tests.
- At time t, assuming $|V|$ features are arriving, the worst-case complexity is $O(|V| \, |BCF| \, k^{|BCF|})$, where $k$ is the maximum size to which a conditioning set may grow.
- Assuming $SF \subset V$ and $|SF| \ll |V|$, where $SF$ contains all strongly relevant features, the average time complexity at time t is $O(|SF| \, |BCF| \, k^{|BCF|})$.
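To get a feel for why the conditioning-set size k matters, the snippet below counts how many conditioning subsets an exhaustive search would try for a single feature when it draws subsets of size at most k from the other members of BCF. This is an illustrative count under that assumption, not a derivation of the bound stated on the slide, and the function name is hypothetical.

```python
from math import comb


def max_tests_per_feature(bcf_size: int, k: int) -> int:
    """Number of conditioning subsets of size 0..k drawn from the other BCF members."""
    others = max(bcf_size - 1, 0)
    return sum(comb(others, j) for j in range(min(k, others) + 1))


# How the count grows with |BCF| for a fixed cap k = 3.
for size in (5, 10, 20, 40):
    print(size, max_tests_per_feature(size, 3))
```

The count grows quickly with the size of BCF, which is why keeping BCF small and capping k both reduce the number of independence tests.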

14 Time complexity of OSFS
- The most time-consuming part is the redundancy analysis phase.
- When a new feature enters BCF, redundancy analysis re-examines each feature within BCF with respect to its relevance to C.
- To further improve selection efficiency, Fast-OSFS is introduced on the next page.

15 The Fast-OSFS Algorithm

16 The Fast-OSFS Algorithm
- The key difference is that Fast-OSFS divides the redundancy analysis phase into two phases: inner-redundancy analysis and outer-redundancy analysis.
- Fast-OSFS alternates only the relevance analysis and the inner-redundancy analysis phases.
- In inner-redundancy analysis, Fast-OSFS re-examines only the feature just added to BCF.
- In outer-redundancy analysis, it re-examines each feature of BCF, and it does so only once the process of generating features has stopped.
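The split into inner and outer redundancy analysis can be sketched as below. As before, this is a reading of the slide's description rather than the authors' pseudo-code: `ind` is a hypothetical independence oracle, the relevance check is assumed to be a test given the empty set, `k` caps the conditioning-set size, and the end of the stream is taken as the point where feature generation stops.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List

# Hypothetical oracle type: ind(f, s) is True when feature f and the class
# label C are judged conditionally independent given the feature set s.
IndTest = Callable[[str, FrozenSet[str]], bool]


def separated(y: str, bcf: List[str], ind: IndTest, k: int) -> bool:
    """True if some subset of BCF minus {y}, of size at most k, separates y from C."""
    others = [f for f in bcf if f != y]
    return any(
        ind(y, frozenset(subset))
        for size in range(min(k, len(others)) + 1)
        for subset in combinations(others, size)
    )


def fast_osfs(stream: Iterable[str], ind: IndTest, k: int = 3) -> List[str]:
    bcf: List[str] = []
    for x in stream:
        if ind(x, frozenset()):            # relevance analysis
            continue
        bcf.append(x)
        if separated(x, bcf, ind, k):      # inner redundancy: only the new feature
            bcf.remove(x)
    for y in list(bcf):                    # outer redundancy: once, after the stream ends
        if separated(y, bcf, ind, k):
            bcf.remove(y)
    return bcf


def toy_ind(f: str, s: FrozenSet[str]) -> bool:
    """Made-up ground truth: N is noise, D only matters through A."""
    if f == "N":
        return True
    if f == "D":
        return "A" in s
    return False


print(fast_osfs(["D", "N", "A", "B"], toy_ind))
```

In this toy run D stays in BCF while features are still arriving, because only the newest feature is re-examined, and it is removed in the final outer pass.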

17 Time complexity of Fast-OSFS
- The worst-case complexity is $O(|V| \, k^{|BCF|} + |BCF| \, k^{|BCF|})$.
- The average complexity at time t is $O(|SF| \, k^{|BCF|} + |BCF| \, k^{|BCF|})$.

18 6. Experimental Results
- Data sets: 8 UCI benchmark databases and 10 challenge databases.
- Three classifiers were used: k-NN, J48 and Random Forest (Spider 2010); the best accuracy among them is reported as the result.
- Grafting and Alpha-investing were run using their original implementations.
- The tuning parameter λ for Grafting was selected using cross-validation.
- The parameters of Alpha-investing were left at their default settings, $W_0 = 0.5$ and $\alpha_\Delta = 0.5$.
- The conditional independence tests in our implementation are $G^2$ tests, and the parameter alpha is the statistical significance level.
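For readers unfamiliar with the G² test, here is a minimal standard-library sketch of a conditional G² independence test on discrete data. It is not the authors' implementation. The function names are hypothetical, the degrees-of-freedom rule (one block of (|X|-1)(|Y|-1) per observed configuration of the conditioning variables) is one common convention and an assumption here, and the chi-square tail probability uses the textbook closed forms for integer degrees of freedom.

```python
import math
from collections import Counter


def g2_statistic(x, y, z=None):
    """Return (G2, dof) for testing X independent of Y given Z on discrete samples.

    x and y are equal-length sequences of hashable values; z is an optional
    sequence of hashable conditioning configurations (for example tuples).
    """
    n = len(x)
    if z is None:
        z = [()] * n
    n_xyz = Counter(zip(x, y, z))
    n_xz = Counter(zip(x, z))
    n_yz = Counter(zip(y, z))
    n_z = Counter(z)
    g2 = 0.0
    for (xv, yv, zv), observed in n_xyz.items():
        expected = n_xz[(xv, zv)] * n_yz[(yv, zv)] / n_z[zv]
        g2 += 2.0 * observed * math.log(observed / expected)
    # Assumed convention: (|X|-1)(|Y|-1) per observed Z configuration.
    dof = (len(set(x)) - 1) * (len(set(y)) - 1) * len(n_z)
    return g2, max(dof, 1)


def chi2_sf(stat, dof):
    """Upper-tail probability of a chi-square variable with integer dof."""
    if stat <= 0:
        return 1.0
    half = stat / 2.0
    if dof % 2 == 0:
        term, total = 1.0, 1.0
        for r in range(1, dof // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    term, total = 1.0, 0.0            # term = stat**j / (1*3*...*(2j+1))
    for j in range((dof - 1) // 2):
        if j > 0:
            term *= stat / (2 * j + 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 * stat / math.pi) * math.exp(-half) * total)


def independent(x, y, z=None, alpha=0.01):
    """Accept independence when the p-value is at least the significance level."""
    g2, dof = g2_statistic(x, y, z)
    return chi2_sf(g2, dof) >= alpha


# Toy data: y copies x (dependent), w is balanced against x (independent).
xs = [0, 0, 1, 1] * 25
ys = list(xs)
ws = [0, 1, 0, 1] * 25
print(independent(xs, ys), independent(xs, ws))
```

A larger alpha makes the test reject independence more readily, so more features are retained as relevant.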

19 Results on UCI Benchmark Data Sets

20 The win/tie/loss counts of our methods vs. other methods

                   OSFS     Fast-OSFS
Grafting           5/1/2    4/0/4
Alpha-investing    7/0/1    5/1/2

Note: Alpha-investing selects all features on the Wdbc data.
The compactness and predictive accuracy of 4 algorithms (alpha=0.01)

21 The win/tie/loss counts of our methods vs. other methods

                   OSFS     Fast-OSFS
Grafting           3/2/3    4/1/3
Alpha-investing    7/0/1    8/0/0

Note: Alpha-investing selects all features on the Wdbc data.
The compactness and predictive accuracy of 4 algorithms (alpha=0.05)

22 OSFS performance with different alpha values
Fast-OSFS performance with different alpha values

23 A performance analysis with different alpha values
- When alpha is raised to 0.05, our two algorithms tend to select more features, but their accuracy changes in different directions: OSFS degrades a little while Fast-OSFS improves a little.
- For alpha from 0.01 up to 0.05, the two algorithms have similar performance in our experiments.

24 Results on Challenge Data Sets

25 The compactness and prediction accuracy (%) of four algorithms (alpha=0.01)
- Alpha-investing failed to select any features.
- Grafting fails to select any features on the Dorothea and Breast-cancer data because of an out-of-memory problem.

The win/tie/loss counts of our methods vs. other methods

                   OSFS     Fast-OSFS
Grafting           8/0/2    7/0/3
Alpha-investing    8/0/2    6/0/4

26 Running Time Analysis
- The time reported is the normalized time: the running time of OSFS on a data set divided by the corresponding running time of Fast-OSFS.
- A normalized running time greater than one implies that OSFS is slower than Fast-OSFS on the same learning task.

27 Running Time Analysis
- On the UCI data sets, Fast-OSFS is at least twice as fast as OSFS.
- Since the running time of both Fast-OSFS and OSFS is less than one second on most of these data sets, we report only the running times longer than ten seconds, on five data sets, in Figure 8 (left: alpha=0.01; right: alpha=0.05).

28 Discussions: Grafting & Alpha-Investing
- Grafting: on low-dimensional data sets it is competitive with our methods; on high-dimensional data sets it is inferior to our methods. Its main drawback is that the tuning parameter λ must be chosen in advance.
- Alpha-investing: our algorithms outperform Alpha-investing on most of the 18 data sets. With prior knowledge of the structure of the candidate features, Alpha-investing could achieve good performance. Given such prior knowledge, our framework can also deal with the task well.

29 Discussions: OSFS vs. Fast-OSFS
- Compactness: Fast-OSFS is competitive with OSFS.
- Predictive accuracy (our empirical findings):
  - OSFS outperforms Fast-OSFS on data sets with a very small sample-to-variable ratio.
  - Fast-OSFS is superior to OSFS on data sets with a large sample.

30 Discussions: False Negatives
- To control false positives, there are two strategies: multiple comparisons and the parameter k.
- The parameter k is the maximum size to which a conditioning set may grow, and it is a key parameter.
- In online redundancy analysis, multiple statistical comparisons filter out redundant features: the subsets of BCF are enumerated to perform multiple tests, and the size of the largest subset is k.
- Under the assumption that all independence tests are reliable, with a suitable value of k the false positives will be well controlled.
- Accordingly, the experimental results show that our algorithms exhibit little sensitivity to false positive features.

31 Conclusion
- We have proposed a novel framework with two new algorithms to deal with streaming feature selection.
- Compared with two state-of-the-art algorithms, Grafting and Alpha-investing, our algorithms have demonstrated more compact feature subsets and better accuracy in supervised learning on databases that contain many irrelevant and redundant features.

32 Future work
- In our experiments, we simulated a feature set with an unknown but finite size.
- Explore how to dynamically assess the predictive accuracy for a feature set of infinite size, when reaching a certain threshold.
- Study the impact of stopping criteria on the OSFS and Fast-OSFS algorithms.
- Apply online streaming feature selection to real Mars crater data, where craters are represented by thousands of texture-based features that call for efficient feature selection.

A Comparative Study of Filter-based Feature Ranking Techniques

A Comparative Study of Filter-based Feature Ranking Techniques Western Kentucky University From the SelectedWorks of Dr. Huanjing Wang August, 2010 A Comparative Study of Filter-based Feature Ranking Techniques Huanjing Wang, Western Kentucky University Taghi M. Khoshgoftaar,

More information

Predicting prokaryotic incubation times from genomic features Maeva Fincker - Final report

Predicting prokaryotic incubation times from genomic features Maeva Fincker - Final report Predicting prokaryotic incubation times from genomic features Maeva Fincker - mfincker@stanford.edu Final report Introduction We have barely scratched the surface when it comes to microbial diversity.

More information

Dynamic Cloud Resource Reservation via Cloud Brokerage

Dynamic Cloud Resource Reservation via Cloud Brokerage Dynamic Cloud Resource Reservation via Cloud Brokerage Wei Wang*, Di Niu +, Baochun Li*, Ben Liang* * Department of Electrical and Computer Engineering, University of Toronto + Department of Electrical

More information

User Profiling in an Ego Network: Co-profiling Attributes and Relationships

User Profiling in an Ego Network: Co-profiling Attributes and Relationships User Profiling in an Ego Network: Co-profiling Attributes and Relationships Rui Li, Chi Wang, Kevin Chen-Chuan Chang, {ruili1, chiwang1, kcchang}@illinois.edu Department of Computer Science, University

More information

Genomic Selection with Linear Models and Rank Aggregation

Genomic Selection with Linear Models and Rank Aggregation Genomic Selection with Linear Models and Rank Aggregation m.scutari@ucl.ac.uk Genetics Institute March 5th, 2012 Genomic Selection Genomic Selection Genomic Selection: an Overview Genomic selection (GS)

More information

Stochastic Fractal Search Algorithm for 3D Protein Structure Prediction Chuan SUN 1, Zi-qi WEI 2, Chang-jun ZHOU 1,* and Bin WANG 1

Stochastic Fractal Search Algorithm for 3D Protein Structure Prediction Chuan SUN 1, Zi-qi WEI 2, Chang-jun ZHOU 1,* and Bin WANG 1 206 International Conference on Artificial Intelligence and Computer Science (AICS 206 ISBN: 978--60595-4-0 Stochastic Fractal Search Algorithm for 3D Protein Structure Prediction Chuan SUN, Zi-qi WEI

More information

Genetic Algorithm with Upgrading Operator

Genetic Algorithm with Upgrading Operator Genetic Algorithm with Upgrading Operator NIDAPAN SUREERATTANAN Computer Science and Information Management, School of Advanced Technologies, Asian Institute of Technology, P.O. Box 4, Klong Luang, Pathumthani

More information

Progress Report: Predicting Which Recommended Content Users Click Stanley Jacob, Lingjie Kong

Progress Report: Predicting Which Recommended Content Users Click Stanley Jacob, Lingjie Kong Progress Report: Predicting Which Recommended Content Users Click Stanley Jacob, Lingjie Kong Machine learning models can be used to predict which recommended content users will click on a given website.

More information

Anti-Money Laundering Solution Deep Dive WHITE PAPER

Anti-Money Laundering Solution Deep Dive WHITE PAPER Anti-Money Laundering Solution Deep Dive An AI-Driven Approach to AML Anti-Money Laundering (AML) is a particularly challenging area of regulation for banks even more so for large, geographically diverse

More information

Proactive Data Mining Using Decision Trees

Proactive Data Mining Using Decision Trees 2012 IEEE 27-th Convention of Electrical and Electronics Engineers in Israel Proactive Data Mining Using Decision Trees Haim Dahan and Oded Maimon Dept. of Industrial Engineering Tel-Aviv University Tel

More information

Dynamic Vehicle Routing and Dispatching

Dynamic Vehicle Routing and Dispatching Dynamic Vehicle Routing and Dispatching Jean-Yves Potvin Département d informatique et recherche opérationnelle and Centre interuniversitaire de recherche sur les réseaux d entreprise, la logistique et

More information

A Study of Financial Distress Prediction based on Discernibility Matrix and ANN Xin-Zhong BAO 1,a,*, Xiu-Zhuan MENG 1, Hong-Yu FU 1

A Study of Financial Distress Prediction based on Discernibility Matrix and ANN Xin-Zhong BAO 1,a,*, Xiu-Zhuan MENG 1, Hong-Yu FU 1 International Conference on Management Science and Management Innovation (MSMI 2014) A Study of Financial Distress Prediction based on Discernibility Matrix and ANN Xin-Zhong BAO 1,a,*, Xiu-Zhuan MENG

More information

Fault Detection of Large Amounts of Photovoltaic Systems

Fault Detection of Large Amounts of Photovoltaic Systems Fault Detection of Large Amounts of Photovoltaic Systems Patrick Traxler Software Competence Center Hagenberg, Austria patrick.traxler@scch.at Abstract. We study a model-based approach to detect sustainable

More information

Disentangling Prognostic and Predictive Biomarkers Through Mutual Information

Disentangling Prognostic and Predictive Biomarkers Through Mutual Information Informatics for Health: Connected Citizen-Led Wellness and Population Health R. Randell et al. (Eds.) 2017 European Federation for Medical Informatics (EFMI) and IOS Press. This article is published online

More information

Localization Site Prediction for Membrane Proteins by Integrating Rule and SVM Classification

Localization Site Prediction for Membrane Proteins by Integrating Rule and SVM Classification Localization Site Prediction for Membrane Proteins by Integrating Rule and SVM Classification Senqiang Zhou Ke Wang School of Computing Science Simon Fraser University {szhoua@cs.sfu.ca, wangk@cs.sfu.ca}

More information

PROFITABLE ITEMSET MINING USING WEIGHTS

PROFITABLE ITEMSET MINING USING WEIGHTS PROFITABLE ITEMSET MINING USING WEIGHTS T.Lakshmi Surekha 1, Ch.Srilekha 2, G.Madhuri 3, Ch.Sujitha 4, G.Kusumanjali 5 1Assistant Professor, Department of IT, VR Siddhartha Engineering College, Andhra

More information

CSE 255 Lecture 3. Data Mining and Predictive Analytics. Supervised learning Classification

CSE 255 Lecture 3. Data Mining and Predictive Analytics. Supervised learning Classification CSE 255 Lecture 3 Data Mining and Predictive Analytics Supervised learning Classification Last week Last week we started looking at supervised learning problems Last week We studied linear regression,

More information

Whetstone An Accessible, Platform-Independent Method for Training Spiking Deep Neural Networks for Neuromorphic Processors

Whetstone An Accessible, Platform-Independent Method for Training Spiking Deep Neural Networks for Neuromorphic Processors Whetstone An Accessible, Platform-Independent Method for Training Spiking Deep Neural Networks for Neuromorphic Processors W i l l i a m M. S e v e r a *, C r a i g M. V i n e y a r d, R y a n D e l l

More information

Dealing with Predictive-but-Unpredictable Attributes in Noisy Data Sources

Dealing with Predictive-but-Unpredictable Attributes in Noisy Data Sources Dealing with Predictive-but-Unpredictable Attributes in Noisy Data Sources Ying Yang, Xindong Wu, and Xingquan Zhu Department of Computer Science, University of Vermont, Burlington VT 05405, USA {yyang,

More information

Dealing with Predictive-but-Unpredictable Attributes in Noisy Data Sources

Dealing with Predictive-but-Unpredictable Attributes in Noisy Data Sources Dealing with Predictive-but-Unpredictable Attributes in Noisy Data Sources Ying Yang, Xindong Wu, and Xingquan Zhu Department of Computer Science, University of Vermont, Burlington VT 05405, USA {yyang,xwu,xqzhu}@cs.uvm.edu

More information

CHAPTER 8 APPLICATION OF CLUSTERING TO CUSTOMER RELATIONSHIP MANAGEMENT

CHAPTER 8 APPLICATION OF CLUSTERING TO CUSTOMER RELATIONSHIP MANAGEMENT CHAPTER 8 APPLICATION OF CLUSTERING TO CUSTOMER RELATIONSHIP MANAGEMENT 8.1 Introduction Customer Relationship Management (CRM) is a process that manages the interactions between a company and its customers.

More information

Inferring Social Ties across Heterogeneous Networks

Inferring Social Ties across Heterogeneous Networks Inferring Social Ties across Heterogeneous Networks CS 6001 Complex Network Structures HARISH ANANDAN Introduction Social Ties Information carrying connections between people It can be: Strong, weak or

More information

Introduction to Reinforcement Learning. CS : Deep Reinforcement Learning Sergey Levine

Introduction to Reinforcement Learning. CS : Deep Reinforcement Learning Sergey Levine Introduction to Reinforcement Learning CS 294-112: Deep Reinforcement Learning Sergey Levine Class Notes 1. Homework 1 is due next Wednesday! Remember that Monday is a holiday, so no office hours 2. Remember

More information

Intro Logistic Regression Gradient Descent + SGD

Intro Logistic Regression Gradient Descent + SGD Case Study 1: Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade March 29, 2016 1 Ad Placement

More information

Predicting Credit Card Customer Loyalty Using Artificial Neural Networks

Predicting Credit Card Customer Loyalty Using Artificial Neural Networks Predicting Credit Card Customer Loyalty Using Artificial Neural Networks Tao Zhang Bo Yuan Wenhuang Liu Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, P.R. China E-Mail: kirasz06@gmail.com,

More information

Mining the Situation: Spatiotemporal Traffic Prediction With Big Data

Mining the Situation: Spatiotemporal Traffic Prediction With Big Data 702 IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. 9, NO. 4, JUNE 2015 Mining the Situation: Spatiotemporal Traffic Prediction With Big Data Jie Xu, Dingxiong Deng, Ugur Demiryurek, Cyrus Shahabi,

More information

Automatic Facial Expression Recognition

Automatic Facial Expression Recognition Automatic Facial Expression Recognition Huchuan Lu, Pei Wu, Hui Lin, Deli Yang School of Electronic and Information Engineering, Dalian University of Technology Dalian, Liaoning Province, China lhchuan@dlut.edu.cn

More information

Combinational Collaborative Filtering: An Approach For Personalised, Contextually Relevant Product Recommendation Baskets

Combinational Collaborative Filtering: An Approach For Personalised, Contextually Relevant Product Recommendation Baskets Combinational Collaborative Filtering: An Approach For Personalised, Contextually Relevant Product Recommendation Baskets Research Project - Jai Chopra (338852) Dr Wei Wang (Supervisor) Dr Yifang Sun (Assessor)

More information

User Behavior Recovery via Hidden Markov Models Analysis

User Behavior Recovery via Hidden Markov Models Analysis User Behavior Recovery via Hidden Markov Models Analysis Alina Maor, Doron Shaked Hewlett Packard Labs HPE-2016-62 Keyword(s): Hidden Markov Model; predictive analytics; classification; statistical methods;

More information

Predict Commercial Promoted Contents Will Be Clicked By User

Predict Commercial Promoted Contents Will Be Clicked By User Predict Commercial Promoted Contents Will Be Clicked By User Gary(Xinran) Guo garyguo@stanford.edu SUNetID: garyguo Stanford University 1. Introduction As e-commerce, social media grows rapidly, advertisements

More information

CS6716 Pattern Recognition

CS6716 Pattern Recognition CS6716 Pattern Recognition Aaron Bobick School of Interactive Computing Administrivia Shray says the problem set is close to done Today chapter 15 of the Hastie book. Very few slides brought to you by

More information

Near-Balanced Incomplete Block Designs with An Application to Poster Competitions

Near-Balanced Incomplete Block Designs with An Application to Poster Competitions Near-Balanced Incomplete Block Designs with An Application to Poster Competitions arxiv:1806.00034v1 [stat.ap] 31 May 2018 Xiaoyue Niu and James L. Rosenberger Department of Statistics, The Pennsylvania

More information

A HYBRID MODERN AND CLASSICAL ALGORITHM FOR INDONESIAN ELECTRICITY DEMAND FORECASTING

A HYBRID MODERN AND CLASSICAL ALGORITHM FOR INDONESIAN ELECTRICITY DEMAND FORECASTING A HYBRID MODERN AND CLASSICAL ALGORITHM FOR INDONESIAN ELECTRICITY DEMAND FORECASTING Wahab Musa Department of Electrical Engineering, Universitas Negeri Gorontalo, Kota Gorontalo, Indonesia E-Mail: wmusa@ung.ac.id

More information

Preface to the third edition Preface to the first edition Acknowledgments

Preface to the third edition Preface to the first edition Acknowledgments Contents Foreword Preface to the third edition Preface to the first edition Acknowledgments Part I PRELIMINARIES XXI XXIII XXVII XXIX CHAPTER 1 Introduction 3 1.1 What Is Business Analytics?................

More information

Filter-Wrapper based Feature Ranking Technique for Dynamic Software Quality Attributes

Filter-Wrapper based Feature Ranking Technique for Dynamic Software Quality Attributes Filter-Wrapper based Feature Ranking Technique for Dynamic Software Quality Attributes Siti Sakira Kamaruddin 1, Jamaiah Yahaya 2 Aziz Deraman 3, and Ruzita Ahmad 4 1 Universiti Utara Malaysia, Malaysia,

More information

COMBINED-OBJECTIVE OPTIMIZATION IN IDENTICAL PARALLEL MACHINE SCHEDULING PROBLEM USING PSO

COMBINED-OBJECTIVE OPTIMIZATION IN IDENTICAL PARALLEL MACHINE SCHEDULING PROBLEM USING PSO COMBINED-OBJECTIVE OPTIMIZATION IN IDENTICAL PARALLEL MACHINE SCHEDULING PROBLEM USING PSO Bathrinath S. 1, Saravanasankar S. 1 and Ponnambalam SG. 2 1 Department of Mechanical Engineering, Kalasalingam

More information

Accurate Campaign Targeting Using Classification Algorithms

Accurate Campaign Targeting Using Classification Algorithms Accurate Campaign Targeting Using Classification Algorithms Jieming Wei Sharon Zhang Introduction Many organizations prospect for loyal supporters and donors by sending direct mail appeals. This is an

More information

Steering Information Diffusion Dynamically against User Attention Limitation

Steering Information Diffusion Dynamically against User Attention Limitation Steering Information Diffusion Dynamically against User Attention Limitation Shuyang Lin, Qingbo Hu, Fengjiao Wang, Philip S.Yu Department of Computer Science University of Illinois at Chicago Chicago,

More information

arxiv: v3 [cs.cv] 29 Aug 2017

arxiv: v3 [cs.cv] 29 Aug 2017 Accurate Pulmonary Nodule Detection in Computed Tomography Images Using Deep Convolutional Neural Networks arxiv:1706.04303v3 [cs.cv] 29 Aug 2017 Jia Ding, Aoxue Li, Zhiqiang Hu and Liwei Wang School of

More information

WE consider the general ranking problem, where a computer

WE consider the general ranking problem, where a computer 5140 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 54, NO. 11, NOVEMBER 2008 Statistical Analysis of Bayes Optimal Subset Ranking David Cossock and Tong Zhang Abstract The ranking problem has become increasingly

More information

Water Treatment Plant Decision Making Using Rough Multiple Classifier Systems

Water Treatment Plant Decision Making Using Rough Multiple Classifier Systems Available online at www.sciencedirect.com Procedia Environmental Sciences 11 (2011) 1419 1423 Water Treatment Plant Decision Making Using Rough Multiple Classifier Systems Lin Feng 1, 2 1 College of Computer

More information

SINCE the introduction of cellular automata (CA) in [1],

SINCE the introduction of cellular automata (CA) in [1], IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION 1 On Routine Evolution of Complex Cellular Automata Michal Bidlo Abstract The paper deals with a special technique, called conditionally matching rules, for

More information

GA-SVM WRAPPER APPROACH FOR GENE RANKING AND CLASSIFICATION USING EXPRESSIONS OF VERY FEW GENES

GA-SVM WRAPPER APPROACH FOR GENE RANKING AND CLASSIFICATION USING EXPRESSIONS OF VERY FEW GENES GA-SVM WRAPPER APPROACH FOR GENE RANKING AND CLASSIFICATION USING EXPRESSIONS OF VERY FEW GENES N.REVATHY 1, Dr.R.BALASUBRAMANIAN 2 1 Assistant Professor, Department of Computer Applications, Karpagam

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 3/5/18 Jure Leskovec, Stanford C246: Mining Massive Datasets 2 High dim. data Graph data Infinite data Machine

More information

Prediction of Personalized Rating by Combining Bandwagon Effect and Social Group Opinion: using Hadoop-Spark Framework

Prediction of Personalized Rating by Combining Bandwagon Effect and Social Group Opinion: using Hadoop-Spark Framework Prediction of Personalized Rating by Combining Bandwagon Effect and Social Group Opinion: using Hadoop-Spark Framework Lu Sun 1, Kiejin Park 2 and Limei Peng 1 1 Department of Industrial Engineering, Ajou

More information

Time Optimal Profit Maximization in a Social Network

Time Optimal Profit Maximization in a Social Network Time Optimal Profit Maximization in a Social Network Yong Liu, Wei Zhang Hei Long Jiang University China liuyong001@hlju.edu.cn 149275442@qq.com ABSTRACT: Influence maximization aims to seek k nodes from

More information

Correcting Sampling Bias in Structural Genomics through Iterative Selection of Underrepresented Targets

Correcting Sampling Bias in Structural Genomics through Iterative Selection of Underrepresented Targets Correcting Sampling Bias in Structural Genomics through Iterative Selection of Underrepresented Targets Kang Peng Slobodan Vucetic Zoran Obradovic Abstract In this study we proposed an iterative procedure

More information

Boundedly Rational Consumers

Boundedly Rational Consumers Boundedly Rational Consumers Marco VALENTE 1 1 LEM, S. Anna School of Advanced Studies, Pisa University of L Aquila Background Mainstream: consumers behaviour is represented by maximisation of a utility

More information

A Semi-automated Peer-review System Bradly Alicea Orthogonal Research

A Semi-automated Peer-review System Bradly Alicea Orthogonal Research A Semi-automated Peer-review System Bradly Alicea bradly.alicea@ieee.org Orthogonal Research Abstract A semi-supervised model of peer review is introduced that is intended to overcome the bias and incompleteness

More information

An Implementation of genetic algorithm based feature selection approach over medical datasets

An Implementation of genetic algorithm based feature selection approach over medical datasets An Implementation of genetic algorithm based feature selection approach over medical s Dr. A. Shaik Abdul Khadir #1, K. Mohamed Amanullah #2 #1 Research Department of Computer Science, KhadirMohideen College,

More information

2 Analysts general forecast effort as determinant of earnings forecast

2 Analysts general forecast effort as determinant of earnings forecast 2 Analysts general forecast effort as determinant of earnings forecast accuracy In this chapter, I introduce a new variable to measure the forecast effort an analyst devotes when making earnings forecasts.

More information

CFSSP: Chou and Fasman Secondary Structure Prediction server

CFSSP: Chou and Fasman Secondary Structure Prediction server Wide Spectrum, Vol. 1, No. 9, (2013) pp 15-19 CFSSP: Chou and Fasman Secondary Structure Prediction server T. Ashok Kumar Department of Bioinformatics, Noorul Islam College of Arts and Science, Kumaracoil

More information
