Hybrid Intelligent Systems for DNA Microarray Data Analysis

1 Hybrid Intelligent Systems for DNA Microarray Data Analysis. November 27, 2007. Sung-Bae Cho, Computer Science Department, Yonsei University, Soft Computing Lab.

2 What do I think with Bioinformatics? [Diagram labels: biological objects, cause, function, black box; expression data; modeling, clustering, classification; disease identification, predict cancer (classify disease), drug design (personal medicine), identify risk factors; optimal features and classifiers; ensemble approach.] S.-B. Cho, Soft Computing Lab

3 Acknowledgements: bioinformatics team members (including OBs) C.-H. Park, K.-J. Kim, J.-H. Hong, H.-S. Park, S.-H. Yoo, H.-H. Won, J. Ryu, and H.-J. Kwon.

4 Outline: overview of DNA microarray technology; classification (comprehensive comparisons, ensemble approaches).

5 DNA Microarray Technology

6 Data Mining in Biological Data
Facts: cells in the human body (the count is missing in the source); 3×10^9 letters of DNA code in every cell of the human body; only 0.2% differ between humans; human DNA is 98% identical to that of chimpanzees; 97% of human DNA has no known function.
Bioinformatics: solving problems arising from biology using methodology from computer science, such as drug design, identification of risk factors, and personal medicine.
Related topics: classification, clustering, gene modeling, gene identification.

7 New Paradigm in Biology: one-gene analysis is very slow and gives a local analysis; with microarray technology, thousands-of-genes analysis is very fast and gives a global analysis. This needs computational methods, namely machine learning.

8 Overview (DNA Microarray)
DNA microarray: a chip or slide that has been printed with a large number of DNA spots.
DNA microarray technology: enables the simultaneous analysis of thousands of gene expression levels for genetic and genomic research and for diagnostics.
Gene: a sequence of DNA that includes genetic information.
Two major techniques: the hybridization method (cDNA microarray, oligonucleotide microarray) and the sequencing method (serial analysis of gene expression, SAGE).

9 Data Acquisition (DNA Microarray): [Figure: microarray images (colors) for sample 1, sample 2 and sample 3 are accumulated and converted into a gene expression data matrix (numbers) indexed by genes and samples; each entry is $\log_2(\mathrm{Int}(Cy5)/\mathrm{Int}(Cy3))$.] Microarray data consist of a large number of genes in a small number of samples.
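The log-ratio on this slide can be sketched in a few lines. This is a minimal illustration, assuming two equally shaped intensity matrices (one per dye); the function names and the intensity values are hypothetical and not taken from the talk.

```python
import math

def log_ratio(cy5_intensity, cy3_intensity):
    """Return log2(Int(Cy5) / Int(Cy3)) for one spot."""
    return math.log2(cy5_intensity / cy3_intensity)

def expression_matrix(cy5, cy3):
    """Build a matrix of log2 ratios from two intensity matrices of equal shape."""
    return [[log_ratio(r, g) for r, g in zip(row5, row3)]
            for row5, row3 in zip(cy5, cy3)]

# Hypothetical intensities: 2 genes (rows) x 3 samples (columns).
cy5 = [[200.0, 100.0, 50.0], [80.0, 160.0, 40.0]]
cy3 = [[100.0, 100.0, 100.0], [80.0, 40.0, 160.0]]
matrix = expression_matrix(cy5, cy3)
```

A positive entry means the Cy5 channel is brighter than the Cy3 channel for that spot, a negative entry the opposite, and zero means equal intensity.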

10 Example (DNA Microarray): a part of the Leukemia dataset, before log transformation (Golub et al., 1999). The sample columns are AML, AML, ALL, AML, AML, ALL; the expression values are not preserved in this transcript. Gene accession numbers and descriptions, paired in the order listed:
AC000066_at: GB DEF = BAC clone RG293F11 from 7q21-7q22, complete sequence
AC000099_at: Metabotropic glutamate receptor 8 mRNA
AC000115_cds1_at: WUGSC:H_GS188P18.1a gene extracted from Human BAC clone GS188P18
AC002045_xpt1_at: A-589H1.1 from Homo sapiens Chromosome 16 BAC clone CIT987-SKA-589H1 ~complete genomic sequence, complete sequence. /ntype=dna /annot=mrna
AC002073_cds1_at: WUGSC:DJ515N1.2 gene extracted from Human PAC clone DJ515N1 from 22q11.2-q22
AC002077_at: GUANINE NUCLEOTIDE-BINDING PROTEIN G(T), ALPHA-1 SUBUNIT
AC002086_at: GB DEF = PAC clone DJ525N14 from Xq23, complete sequence
AC002115_cds1_at: COX6B gene (COXG) extracted from Human DNA from overlapping chromosome 19 cosmids R31396, F25451, and R31076 containing COX6B and UPKA, genomic sequence
AC002115_cds3_at: F25451_3 gene extracted from Human DNA from overlapping chromosome 19 cosmids R31396, F25451, and R31076 containing COX6B and UPKA, genomic sequence
AC002115_cds4_at: UPKA gene extracted from Human DNA from overlapping chromosome 19 cosmids R31396, F25451, and R31076 containing COX6B and UPKA, genomic sequence

11 Two Types of Data (DNA Microarray)
Single time point in different states. States: disease or tumor type. Goal: classifying samples using informative genes. Can be used for gene identification. Involves feature selection/extraction. A classification problem.
Monitoring each gene at multiple times (time series data). Goal: identifying functionally related genes. Can be used for gene regulatory networks. A clustering problem.

12 Challenges (DNA Microarray)
Noise: microarray data contain a high level of noise due to experimental procedures; the labeling of cDNA and the scanning of the slides frequently show non-linear characteristics.
Sparseness: microarray data are sparse; several thousand genes are monitored, while the number of samples is often restricted to hundreds or fewer.
High redundancy: many genes are highly correlated, which leads to redundancy in the data; adding coexpressed genes to the classification system does not increase the information available to it.

13 Classification: comprehensive comparisons; ensemble approaches.

14 Motivation
Many researchers have studied cancer classification using gene expression profiles and have tried to propose the optimal classification technique for it.
A thorough evaluation of the possible methods for analyzing gene expression data is needed.
There are several microarray datasets: leukemia, colon cancer, lymphoma, breast cancer, NCI60, and ovarian cancer.
Three datasets are used in our study: the leukemia, colon cancer, and lymphoma datasets.

15 Classification Scheme: [Diagram: DNA microarray data → feature selection → selected features → classification → class 1 or class 2.]

16 Overview (Feature Selection)
Selecting informative features appropriate to a specific goal; also called variable selection or gene selection.
Microarray data consist of a large number of genes in a small number of samples, and not all genes are needed for classification.
It is essential to select genes that are highly related to particular classes; these are called informative genes (Golub et al., 1999).
Many selection/extraction techniques are based on measures: correlation-based measures, similarity-based measures, information theory-based measures, and principal component analysis.

17 Top 50 Genes Selected (Feature Selection): [Figure: top 50 genes of the Leukemia dataset selected by Pearson's correlation (PC); axes: gene and sample, with ALL and AML sample groups.]

18 Rank-based Selection (Feature Selection): a representative feature selection method, in which genes are selected according to the order of their significance. [Table: gene number and significance for Gene 1 to Gene 4; values not preserved.] Selecting order: Gene 3, Gene 2, Gene 4, Gene 1. How can we calculate the significance?
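Rank-based selection amounts to sorting genes by a significance score and keeping the top of the list. The sketch below assumes the scores are already computed; the numeric values are hypothetical (the slide's values are not preserved) and were chosen only so that the order matches the selecting order given on the slide.

```python
def select_top_genes(significance, k):
    """Return the names of the k most significant genes, best first."""
    ranked = sorted(significance, key=significance.get, reverse=True)
    return ranked[:k]

# Hypothetical significance values for four genes.
significance = {"Gene 1": 0.12, "Gene 2": 0.67, "Gene 3": 0.91, "Gene 4": 0.35}
selecting_order = select_top_genes(significance, 4)
top_two = select_top_genes(significance, 2)
```

Any of the measures in this section (PC, SC, ED, CC, IG, MI, SN) can supply the significance; for a distance such as ED, smaller is better, so the sign would have to be flipped before ranking.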

19 Correlation Measures (Feature Selection): measure how much each gene is correlated with the class pattern, represented by the ideal gene vector $g_{ideal} = (0, 0, 0, \ldots, 1, 1, 1)$ over class 1 and class 2 samples. Pearson correlation coefficient (PC): parametric. Spearman correlation coefficient (SC): non-parametric. [Figure: scatter plots of feature 1 against feature 2 showing negative correlation, positive correlation, and no correlation.]

20 Similarity Measures (Feature Selection): calculate the geometrical similarity between the ideal gene vector and each gene vector. Euclidean distance (ED): geometric distance $d$. Cosine coefficient (CC): difference of direction $\theta$.

21 Information Theoretic Measures (Feature Selection): measure feature goodness based on the frequency with which the feature satisfies condition Q (whether the gene is induced or not), using the frequency, or the mean and standard deviation, of the data to calculate the significance of genes. Measures: information gain (IG), mutual information (MI), signal to noise ratio (SN). [Figure: two class distributions with means $\mu_1$, $\mu_2$ and standard deviations $\sigma_1$, $\sigma_2$.]

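22 Mathematical Definitions (Feature Selection)
Here X and Y are the two vectors being compared, N is their length, and $D_x$, $D_y$ are their rank vectors.
Pearson's correlation coefficient (PC):
\[ r_{pearson} = \frac{\sum XY - \frac{\sum X \sum Y}{N}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}} \]
Spearman's correlation coefficient (SC):
\[ r_{spearman} = 1 - \frac{6 \sum (D_x - D_y)^2}{N (N^2 - 1)} \]
Euclidean distance (ED):
\[ r_{euclidean} = \sqrt{\sum (X - Y)^2} \]
Cosine coefficient (CC):
\[ r_{cosine} = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} \]
Signal to noise ratio (SN):
\[ SN(g) = \frac{\mu_1(g) - \mu_2(g)}{\sigma_1(g) + \sigma_2(g)} \]
Information gain (IG) and mutual information (MI): defined on the slide for a gene g and a class c in terms of the counts A, B, C and D; the expressions are not recoverable from this transcript.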
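The correlation, similarity and signal-to-noise measures can be sketched with the standard textbook formulas. This is an illustration under assumptions, not the lab's implementation: the population standard deviation is used for SN, tied ranks are averaged for SC, and the example gene is hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (sxy - sx * sy / n) / math.sqrt((sxx - sx * sx / n) * (syy - sy * sy / n))

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            result[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return result

def spearman(x, y):
    """Rank-difference formula; it is exact only when there are no ties."""
    n = len(x)
    dx, dy = ranks(x), ranks(y)
    return 1 - 6 * sum((a - b) ** 2 for a, b in zip(dx, dy)) / (n * (n * n - 1))

def euclidean(x, y):
    """Euclidean distance (smaller means more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine(x, y):
    """Cosine of the angle between the two vectors."""
    return sum(a * b for a, b in zip(x, y)) / math.sqrt(
        sum(a * a for a in x) * sum(b * b for b in y))

def signal_to_noise(gene, labels):
    """(mu1 - mu2) / (sigma1 + sigma2); population std is an assumption."""
    groups = [[v for v, lab in zip(gene, labels) if lab == c] for c in (0, 1)]
    means = [sum(g) / len(g) for g in groups]
    sds = [math.sqrt(sum((v - m) ** 2 for v in g) / len(g))
           for g, m in zip(groups, means)]
    return (means[0] - means[1]) / (sds[0] + sds[1])

# Hypothetical gene compared with the ideal vector (0, 0, 0, 1, 1, 1).
ideal = [0, 0, 0, 1, 1, 1]
gene = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = {
    "PC": pearson(gene, ideal),
    "ED": euclidean(gene, ideal),
    "CC": cosine(gene, ideal),
    "SN": signal_to_noise(gene, ideal),
}
```

Because the ideal vector contains many tied values, the rank-difference form of SC is only an approximation when a gene is compared against it.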

23 Principal Component Analysis (Feature Selection)
Widely used for dimensionality reduction. Given N vectors in k dimensions, find c (≤ k) orthogonal vectors that best represent the data. The original data set is reduced to one consisting of N vectors on c principal components (reduced dimensions). Each vector is a linear combination of the c principal components. Principal components are directions ordered by variance: the first principal component is the direction of maximum variance, the second is that of the next highest variance, and so on.
\[ t_{ij} = \sum_{k=1}^{n} p_{ik} m_{kj} \]
where n is the number of significant principal components, $p_{ik}$ is the score of sample i on component k, and $m_{kj}$ is the loading on component k of variable j.
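The sum on this slide is a matrix product of scores and loadings. The sketch below only evaluates that product; it does not compute the principal components themselves, and the score and loading values are hypothetical.

```python
def reconstruct(scores, loadings):
    """t[i][j] = sum over k of scores[i][k] * loadings[k][j]."""
    n_components = len(loadings)
    n_variables = len(loadings[0])
    return [[sum(row[k] * loadings[k][j] for k in range(n_components))
             for j in range(n_variables)]
            for row in scores]

# Hypothetical example: 2 samples, 2 components, 3 variables.
scores = [[2.0, 0.5], [-1.0, 1.0]]               # p_ik
loadings = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]   # m_kj
t = reconstruct(scores, loadings)
```

Keeping only the first few components (a small n) is what turns this into dimensionality reduction: each sample is then described by n scores instead of one value per gene.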

24 Overview (Classifier)
Supervised learning. Reliable and precise classification is essential for successful cancer treatment. Current methods for classifying human malignancies rely on a variety of morphological, clinical and molecular variables. Uncertainties in diagnosis remain, and it is likely that existing classes are heterogeneous. Molecular variations among tumors can be characterized by monitoring gene expression (microarray). Hope: microarrays will lead to more reliable tumor classification, and therefore to more appropriate treatments and better outcomes. [Figure: a decision boundary separating class 1 from class 2.]

25 Classifiers (Classifier): multilayer perceptron; k-nearest neighbor; support vector machine; decision tree; structure adaptive self-organizing map.

26 Multilayer Perceptron (Classifier): the weights are updated recursively to minimize the errors that occur at each layer with respect to the desired output. The update is local for the synaptic weights and biases, and it is efficient for computing all the partial derivatives of the cost function with respect to these free parameters. [Figure: network with inputs $x_1, x_2, x_3, \ldots, x_N$, weights $w_{11}, w_{21}, \ldots, w_{KN}$ and outputs $o_1$, $o_2$, arranged as input layer, hidden layer and output layer.]

27 K-Nearest Neighbor (Classifier): one of the most common methods in memory-based induction. It decides the label of an unknown sample from its similarities to the k nearest known exemplars:
\[ P(X, c_j) = \sum_{d_i \in kNN} Sim(X, d_i)\, P(d_i, c_j) - b_j \]
where $Sim(X, d_i)$ is the Pearson's correlation similarity function, k is the number of neighbors, and $b_j$ is a bias term.
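The scoring rule can be sketched as follows. It assumes that P(d_i, c_j) is 1 when exemplar d_i belongs to class c_j and 0 otherwise, and that the bias term is subtracted from the summed similarities; the training vectors are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation, used here as the similarity Sim(X, d_i)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (sxy - sx * sy / n) / math.sqrt((sxx - sx * sx / n) * (syy - sy * sy / n))

def knn_scores(x, training, k, bias=None):
    """Score each class by the summed similarity of its members among the k nearest."""
    bias = bias or {}
    sims = sorted(((pearson(x, vec), label) for vec, label in training),
                  key=lambda s: s[0], reverse=True)
    neighbors = sims[:k]
    classes = sorted({label for _, label in training})
    return {c: sum(s for s, label in neighbors if label == c) - bias.get(c, 0.0)
            for c in classes}

def classify(x, training, k, bias=None):
    """Return the class with the highest score."""
    scores = knn_scores(x, training, k, bias)
    return max(scores, key=scores.get)

# Hypothetical expression profiles over four selected genes.
training = [
    ([1.0, 2.0, 3.0, 4.0], "ALL"),
    ([2.0, 4.0, 6.0, 9.0], "ALL"),
    ([4.0, 3.0, 2.0, 1.0], "AML"),
    ([9.0, 6.0, 4.0, 1.0], "AML"),
]
predicted = classify([1.0, 3.0, 5.0, 8.0], training, k=3)
```

Swapping `pearson` for a cosine similarity would give the KNN(C) variant that the later slides list next to KNN(P).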

28 Support Vector Machine (Classifier): introduced by Vapnik in 1995. It constructs a hyperplane as the decision surface in such a way that the margin of separation between positive and negative examples is maximized. Given a labeled set of M training samples $(X_i, y_i)$, where $X_i \in R^N$ and $y_i \in \{-1, 1\}$ is the associated label, the discriminant hyperplane is defined by
\[ f(X) = \sum_{i=1}^{M} y_i \alpha_i k(X, X_i) + b \]
Linear and RBF kernels are used.
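Evaluating the discriminant function for given coefficients is straightforward; training (finding the α values and b) is not shown. In this sketch the support vectors, α values, b, and the RBF width `gamma` are all hypothetical.

```python
import math

def linear_kernel(x, z):
    """k(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, gamma=0.5):
    """k(x, z) = exp(-gamma * ||x - z||^2); gamma is an assumed parameter."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def decision_function(x, support_vectors, labels, alphas, b, kernel):
    """f(X) = sum_i y_i * alpha_i * k(X, X_i) + b"""
    return sum(y * a * kernel(x, sv)
               for sv, y, a in zip(support_vectors, labels, alphas)) + b

# Hypothetical trained model with two support vectors.
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
labels = [1, -1]
alphas = [0.5, 0.5]
b = 0.0

f_linear = decision_function([2.0, 0.0], support_vectors, labels, alphas, b, linear_kernel)
f_rbf = decision_function([-1.0, -1.0], support_vectors, labels, alphas, b, rbf_kernel)
```

The sign of f(X) gives the predicted class: positive for the class labeled 1, negative for the class labeled -1.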

29 Decision Tree (Classifier): a graph (tree) based model used primarily for classification. A popular method for inductive inference and for approximating discrete-valued target functions. A learned tree is easy to convert into if-then rules. [Figure: example tree that tests P2 at threshold 0.03, P21 at threshold 0.2 and P32 at threshold 0.22, with leaves labeled tumor and normal.]

30 Structure Adaptive SOM (Classifier): a classifier based on the self-organizing map (SOM) with dynamic node splitting. It overcomes a shortcoming of the SOM: the structure of nodes does not have to be determined before training. [Figure: a map with nodes P0 to P4 and child nodes C0 to C3 created by splitting.]

31 Classification Performance (Comparisons): Lymphoma cancer dataset. [Table: results for MLP, SASOM, SVM (linear, RBF) and KNN (cosine, Pearson), with their average, for each feature selection method (PC, SC, ED, CC, IG, MI, SN) and their average; values not preserved.]

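32 Classification Performance (Comparisons): Colon cancer dataset. [Table: results for MLP, SASOM, SVM (linear, RBF), KNN (cosine, Pearson) and DT, with their average, for each feature selection method (PC, SC, ED, CC, IG, MI, SN) and their average; values not preserved.]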

33 Classification: comprehensive comparisons; ensemble approaches.

34 Overview (Ensemble Classifier)
Limitations of machine learning classifiers in solving practical problems: incomplete datasets, noise in data, and imperfection of the classification algorithm.
Solution: search for effective features of input patterns; utilize multiple features, which provides multiple pathways (more chances) to the optimal solution and improves classification performance; combine multiple classifiers, since combining several prospective models may produce better predictions.

35 Rationale (Ensemble Classifier): [Diagram: feature selection maps a high-dimensional, complex feature space to selected features F1, F2, F3; classification maps these to Φ1, Φ2, Φ3 in the solution space, where the optimal solution and the solution estimated by the ensemble are marked.]

36 Ensemble Approach (Ensemble Classifier)
A good ensemble includes base classifiers that are accurate (easy) and that make their errors in different parts of the problem domain (difficult).
Issues for ensemble classifiers: how to generate good base classifiers (from combinations of features and classifiers) and how to combine the base classifiers (majority voting, weighted voting, Borda count, and BKS, among others).
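Three of the combining rules named on this slide can be sketched in their usual textbook form. The predictions, weights and rankings below are hypothetical, and BKS is not covered.

```python
from collections import Counter, defaultdict

def majority_vote(predictions):
    """Return the class predicted by the most base classifiers."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, weights):
    """Each classifier's vote counts with its weight (for example, its validation accuracy)."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)

def borda_count(rankings):
    """Each classifier ranks all classes, best first; a class at position r
    in a ranking of n classes earns n - 1 - r points."""
    totals = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, label in enumerate(ranking):
            totals[label] += n - 1 - position
    return max(totals, key=totals.get)

# Hypothetical outputs of three base classifiers for one sample.
predictions = ["tumor", "normal", "tumor"]
by_majority = majority_vote(predictions)
by_weight = weighted_vote(predictions, [0.6, 0.9, 0.2])
by_borda = borda_count([["A", "B", "C"], ["B", "A", "C"], ["B", "C", "A"]])
```

The example shows how the rules can disagree: two weak classifiers outvote one strong classifier under majority voting, but not under weighted voting.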

37 Ensemble Generation (Ensemble Classifier)
Feature selection (m methods): Pearson correlation coefficients (PC), Spearman correlation coefficients (SC), cosine coefficients (CC), Euclidean distance (ED), information gain (IG), mutual information (MI), signal to noise ratio (SN), principal component analysis (PCA).
Classification (n methods): multilayer perceptron (MLP), k-nearest neighbor (KNN(C), KNN(P)), support vector machine (SVM(L), SVM(R)), structure adaptive self-organizing map (SASOM).
Combination: the mn feature-classifier pairs (pair 1, pair 2, ..., pair mn) give a huge number of available ensembles, $2^{mn}$.

38 Ensemble Strategies (Ensemble Classifier): mutually exclusive features; negatively correlated features; combinatorial ensemble; GA optimization; speciated GA optimization.

39 Overview (Mutually Exclusive Features): combining classifiers with mutually exclusive features, found through the analysis of the correlation of features. [Diagram: an input pattern yields feature a and feature b, which are mutually exclusive; each feature feeds MLP, KNN, SVM (linear) and SVM (RBF) classifiers, whose outputs go to a combining module.]

40 Classification Rates (Mutually Exclusive Features): Leukemia dataset. [Figure: recognition rate (%) for MLP, KNN, SVM (RBF), SVM (linear), KNN (cosine), SOM and DT.]

41 Correlation of Features (Mutually Exclusive Features): Pearson's correlation between features has been calculated, giving three representative cases of correlation. [Figure: three scatter plots with axis labels Euclidean distance, signal to noise ratio, cosine coefficient and Pearson's correlation: (a) negative correlation (coefficient: -0.52); (b) neutral (coefficient: -0.03); (c) positive correlation (coefficient: 0.80).]

42 Comparison of Accuracy (Mutually Exclusive Features): [Figure: recognition accuracy (%) of neural network and majority voting for case (a) negative correlation, case (b) neutral, case (c) positive correlation, and all features.]

43 Overview (Negatively Correlated Features)
Idea: with two ideal gene vectors, select features whose expression patterns are similar to one of the ideal gene vectors; train classifiers with the two feature sets and combine them.
Method: Sim(X, Y) is the similarity between vectors X and Y.
Ideal gene vector A: SGS I is the gene set whose expression pattern is similar to (1, 1, 1, ..., 0, 0, 0); SGS I = argmax{Sim(gene_i, Ideal Gene Vector A)}.
Ideal gene vector B: SGS II is the gene set whose expression pattern is similar to (0, 0, 0, ..., 1, 1, 1); SGS II = argmax{Sim(gene_i, Ideal Gene Vector B)}.
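Selecting the two gene sets can be sketched as two rankings against the two ideal vectors. This sketch assumes Pearson correlation as Sim and a top-k cut-off; the gene names and expression values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation, used here as Sim(X, Y)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (sxy - sx * sy / n) / math.sqrt((sxx - sx * sx / n) * (syy - sy * sy / n))

def select_gene_sets(expression, labels, k):
    """Return (SGS I, SGS II): the k genes most similar to ideal vector A
    (1 on class-0 samples) and to ideal vector B (1 on class-1 samples)."""
    ideal_a = [1 if lab == 0 else 0 for lab in labels]
    ideal_b = [1 - v for v in ideal_a]

    def top(ideal):
        return sorted(expression,
                      key=lambda g: pearson(expression[g], ideal),
                      reverse=True)[:k]

    return top(ideal_a), top(ideal_b)

# Hypothetical data: three genes measured on six samples (three per class).
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "gene_a": [9.0, 8.0, 9.0, 1.0, 2.0, 1.0],  # high in class 0
    "gene_b": [1.0, 2.0, 1.0, 8.0, 9.0, 9.0],  # high in class 1
    "gene_c": [5.0, 4.0, 5.0, 5.0, 4.0, 5.0],  # uninformative
}
sgs_1, sgs_2 = select_gene_sets(expression, labels, k=1)
```

One classifier would then be trained on the SGS I genes and another on the SGS II genes before their outputs are combined.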

44 Example (Negatively Correlated Features): Ideal Gene A = (1,1,1,1,1,1,0,0,0,0,0,0); Ideal Gene B = (0,0,0,0,0,0,1,1,1,1,1,1). [Figure: expression profiles of Gene 1, Gene 2, Gene 1' and Gene 2', illustrating negative correlation.]

45 Selected Features (Negatively Correlated Features): Leukemia dataset, Pearson correlation coefficients. [Figure: expression of the selected genes across ALL and AML samples, grouped as SGS II and SGS I.] Genes shown: gene_3320, gene_4847, gene_2020, gene_1745, gene_5039, gene_1834, gene_461, gene_4196, gene_3847, gene_2288, gene_1249, gene_6201, gene_2242, gene_3258, gene_1882, gene_2111, gene_2121, gene_6200, gene_6373, gene_6539, gene_2043, gene_2759, gene_6803, gene_1674, gene_2402, gene_5772, gene_2301, gene_6055, gene_387, gene_4167, gene_4230, gene_6990, gene_4328, gene_6281, gene_5593, gene_2543, gene_1306, gene_6064, gene_2050, gene_3386, gene_2441, gene_4289, gene_4389, gene_1928, gene_515, gene_2354, gene_6471, gene_6515, gene_149, gene_3070.

46 PCA 3D Plot (Negatively Correlated Features): 25 genes from SGS I and 25 genes from SGS II are selected by Pearson correlation coefficients, and 3 principal components are extracted. AML and ALL are well separated. [Figure: 3D plot with axes first PC, second PC and third PC; red: ALL, blue: AML.]

47 Comparison of Performance (Negatively Correlated Features): [Table: accuracy (%), sensitivity (%) and specificity (%) of MLP I, MLP II and MLP I + MLP II on the Leukemia, Colon and Lymphoma datasets; values not preserved.]

48 Overview (Combinatorial Ensemble)
In theory, a good ensemble should include base classifiers that are accurate and that make their errors in different parts of the problem domain.
In practice, it is easy to obtain weak classifiers whose accuracy is about 50%, but very difficult to get uncorrelated classifiers; a large number of classifiers does not guarantee good ensemble performance.
Approach: test ensembles combinatorially up to a promising number of members instead of testing all available ensembles.

49 Structure (Combinatorial Ensemble): [Diagram: from gene expression data, feature selection methods F1, F2, F3, ..., Fi and classifiers C1, C2, C3, ..., Cj form n feature-classifier sets F1C1, F1C2, ..., FiCj; combinatorial selection ($_nC_5$) chooses ensemble members, and an ensemble method turns their predictions into a class.]
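The combinatorial selection step can be sketched as an exhaustive search over all ensembles of one fixed size. This sketch assumes majority voting as the ensemble method; the pair names reuse the talk's abbreviations, but the predictions and true labels are hypothetical.

```python
from collections import Counter
from itertools import combinations

def majority_vote(votes):
    """Return the most frequent vote."""
    return Counter(votes).most_common(1)[0][0]

def ensemble_accuracy(members, predictions, truth):
    """Accuracy of the majority vote of the chosen feature-classifier pairs."""
    correct = 0
    for sample, actual in enumerate(truth):
        votes = [predictions[m][sample] for m in members]
        correct += majority_vote(votes) == actual
    return correct / len(truth)

def best_combination(predictions, truth, size):
    """Test every ensemble of the given size and return the most accurate one."""
    return max(combinations(sorted(predictions), size),
               key=lambda members: ensemble_accuracy(members, predictions, truth))

# Hypothetical validation predictions of four feature-classifier pairs.
truth = [1, 1, 0, 0, 1]
predictions = {
    "PC-MLP": [1, 1, 0, 1, 1],
    "SN-KNN": [1, 0, 0, 0, 1],
    "IG-SVM": [0, 1, 0, 0, 1],
    "ED-MLP": [0, 0, 1, 1, 0],
}
best = best_combination(predictions, truth, size=3)
best_accuracy = ensemble_accuracy(best, predictions, truth)
```

Each of the three selected pairs is wrong on a different sample, so their majority vote is correct everywhere; this is the "errors in different parts of the problem domain" property in miniature.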

50 Comparison of Accuracy (Combinatorial Ensemble): [Table: accuracy on the Leukemia, Colon and Lymphoma datasets for majority voting, weighted voting and Bayesian combination, by number of classifiers (including all); values not preserved.] Using all classifiers is less accurate, and using 7 is expensive.

51 Overview (GA Optimization)
Many ensembles can be built from several classifiers: their number increases exponentially with the number of classifiers, and 48 base feature-classifier pairs make $2^{48}$ ensembles.
Exhaustive search is very time-consuming, so a GA is used to find an optimal ensemble in a short time.
The ensemble is made from 48 base feature-classifier pairs, obtained from 8 feature selection methods and 6 classifiers.
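The size of the search space follows directly from the numbers on this slide. In the short calculation below, the evaluation speed is a hypothetical figure, included only to give a sense of scale.

```python
n_feature_selectors = 8
n_classifiers = 6
n_pairs = n_feature_selectors * n_classifiers   # 48 feature-classifier pairs
n_ensembles = 2 ** n_pairs                       # every subset of the pairs

# Hypothetical speed: one million ensembles evaluated per second.
evaluations_per_second = 1_000_000
years = n_ensembles / evaluations_per_second / (3600 * 24 * 365)
```

Here `n_ensembles` is 281,474,976,710,656, and at the assumed speed an exhaustive search would take several years, which is the motivation for a heuristic search such as a GA.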

52 Structure (GA Optimization): [Diagram: normalized gene expression profiles pass through feature selectors 1 to m and classifiers 1 to n to form feature-classifier pairs; GA searching with fitness evaluation selects an ensemble, which classifies a sample as cancer or normal.]

53 GA Chromosome (GA Optimization): a chromosome has 48 bits, one per feature-classifier pair (CC-MLP, ED-MLP, IG-MLP, MI-MLP, PC-MLP, PCA-MLP, SN-MLP, SC-MLP, ..., SC-SVM(RBF)). [Figure: genotype (chromosome), phenotype (feature-classifier pairs), result of each feature-classifier pair, and the ensemble result obtained by majority voting compared with the actual class.] Fitness of a chromosome ch:
\[ Fit(ch) = \frac{\#\text{ of correctly classified samples by } ch}{\#\text{ of total classified samples by } ch} \]
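Decoding a chromosome and computing its fitness can be sketched as below. It assumes that a 1 bit includes the corresponding feature-classifier pair in the ensemble and that the ensemble decides by majority voting; a 4-bit chromosome stands in for the 48-bit one, and the predictions are hypothetical.

```python
from collections import Counter

def decode(chromosome, pair_names):
    """Genotype to phenotype: keep the pairs whose bit is 1."""
    return [name for bit, name in zip(chromosome, pair_names) if bit == 1]

def fitness(chromosome, pair_names, predictions, truth):
    """Fraction of samples that the majority vote of the selected pairs classifies correctly."""
    members = decode(chromosome, pair_names)
    if not members:
        return 0.0  # an empty ensemble classifies nothing correctly
    correct = 0
    for sample, actual in enumerate(truth):
        votes = [predictions[m][sample] for m in members]
        correct += Counter(votes).most_common(1)[0][0] == actual
    return correct / len(truth)

# Hypothetical validation predictions for four pairs.
pair_names = ["ED-MLP", "IG-SVM", "PC-MLP", "SN-KNN"]
truth = [1, 1, 0, 0, 1]
predictions = {
    "ED-MLP": [0, 0, 1, 1, 0],
    "IG-SVM": [0, 1, 0, 0, 1],
    "PC-MLP": [1, 1, 0, 1, 1],
    "SN-KNN": [1, 0, 0, 0, 1],
}
fit_good = fitness([0, 1, 1, 1], pair_names, predictions, truth)
fit_single = fitness([0, 0, 1, 0], pair_names, predictions, truth)
```

The GA then only needs to manipulate bit strings; everything problem-specific is contained in this fitness function.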

54 Change of Average Fitness (GA Optimization): [Figure: average fitness against iteration.] Fitness increases until the number of iterations reaches 150 and is saturated after 150 iterations.

55 Leave-one-out Cross-validation (GA Optimization): [Figure: accuracy (%) on the Lymphoma and Colon datasets, with average and range, for training, validation (single), validation (ensemble) and test.] The optimal ensemble searched by GA outperforms.

56 Comparison of Accuracy (GA Optimization): [Figure: accuracy by experiment for the best single classifier, the ensemble of good classifiers, the best ensemble among 1 million random ensembles, the best ensemble among 1 million for simple GA and sharing, and the best ensemble among 1 million for crowding.] GA > best single classifier > ensemble of good classifiers.

57 Some Optimal Ensembles (GA Optimization)
Majority voting (feature-classifier pair: accuracy %): CC-KNN(P): 75.0; MI-KNN(C): 83.3; SN-KNN(C): 79.2; SC-SASOM: 62.5; IG-SVM(L): 91.7; ensemble: 100.
Weighted voting (feature-classifier pair: accuracy %): IG-KNN(C): 91.7; MI-KNN(C): 83.3; SN-KNN(C): 79.2; SN-KNN(P): 79.2; CC-SASOM: 54.2; IG-SASOM: 83.3; PC-SVM(R): 62.5; ensemble: 100.

58 Overview (Speciated GA Optimization)
Among all the $2^{mn}$ ensembles, a standard GA does not guarantee the optimal solution, because a GA usually converges to local optima.
There may be many optimal ensembles; their number is unknown, and a GA finds just one of them.
A speciated GA is therefore used instead of a standard GA, with fitness sharing or deterministic crowding.

59 Concept (Speciated GA Optimization): [Diagram: solution space Ω and observation space, with genetic drift; solutions searched by simple GA compared with solutions searched by speciated GA.]

60 Structure (Speciated GA Optimization): [Diagram: microarray data → preprocessing → gene expression data matrix → feature selection (PC, SC, ED, CC, IG, MI, SN, PCA, ...) and classifiers (MLP, KNN(C), KNN(P), SVM(L), SVM(R), SASOM, ...) → training of feature-classifier pairs FC1, FC2, ..., FC48 → ensemble maker → speciated GA searching with validation → optimal ensemble → evaluation of a new test instance as tumor or normal.]

61 Fitness Function (Speciated GA Optimization): fitness of a chromosome ch:
\[ Fitness(ch) = Acc(ch) - \alpha \cdot Num_1(ch) \]
where
\[ Acc(ch) = \frac{\#\text{ of correctly classified samples by } ch}{\#\text{ of total classified samples by } ch} \]
$Num_1(ch)$ is the number of 1 bits in chromosome ch (the shorter, the better), and α is a constant.
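The size penalty can be sketched as below. The operator between the accuracy term and the α term is not legible in the source; this sketch assumes the penalty is subtracted, which is consistent with "the shorter, the better". The value of α and the example chromosomes are hypothetical.

```python
def penalized_fitness(accuracy, chromosome, alpha):
    """Acc(ch) - alpha * Num1(ch), assuming the penalty is subtracted."""
    num_ones = sum(chromosome)  # number of pairs included in the ensemble
    return accuracy - alpha * num_ones

# Two hypothetical ensembles with the same accuracy but different sizes.
short = [1, 0, 1, 0, 0, 1, 0, 0]
long_ = [1, 1, 1, 1, 0, 1, 1, 1]
fit_short = penalized_fitness(1.0, short, alpha=0.01)
fit_long = penalized_fitness(1.0, long_, alpha=0.01)
```

With equal accuracy the smaller ensemble gets the higher fitness, so the search is pushed toward compact ensembles.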

62 Deterministic Crowding (Speciated GA Optimization)
Input: g, the number of generations to run; s, the population size
Output: P(g), the final population
P(0) ← initialize()
for t ← 1 to g do
    P(t) ← shuffle(P(t-1))
    for i ← 0 to s/2 - 1 do
        p1 ← a_{2i+1}(t)
        p2 ← a_{2i+2}(t)
        {c1, c2} ← recombination(p1, p2)
        c1' ← mutate(c1)
        c2' ← mutate(c2)
        if [d(p1, c1') + d(p2, c2')] ≤ [d(p1, c2') + d(p2, c1')] then
            if F(c1') > F(p1) then a_{2i+1}(t) ← c1' fi
            if F(c2') > F(p2) then a_{2i+2}(t) ← c2' fi
        else
            if F(c2') > F(p1) then a_{2i+1}(t) ← c2' fi
            if F(c1') > F(p2) then a_{2i+2}(t) ← c1' fi
        fi
    od
od
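A runnable sketch of deterministic crowding on bit-string chromosomes follows. The one-point crossover, bit-flip mutation, Hamming distance, mutation rate and seed are assumptions made for illustration; the talk does not specify its operators.

```python
import random

def hamming(a, b):
    """Distance d between two chromosomes: number of differing bits."""
    return sum(x != y for x, y in zip(a, b))

def crossover(p1, p2, rng):
    """One-point recombination of two parents."""
    point = rng.randrange(1, len(p1))
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def mutate(chromosome, rng, rate):
    """Flip each bit independently with probability rate."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]

def deterministic_crowding(fitness, n_bits, pop_size, generations, rate=0.02, seed=0):
    """Each child competes only against the parent it resembles most."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        rng.shuffle(pop)  # random pairing of parents
        for i in range(0, pop_size - 1, 2):
            p1, p2 = pop[i], pop[i + 1]
            c1, c2 = crossover(p1, p2, rng)
            c1, c2 = mutate(c1, rng, rate), mutate(c2, rng, rate)
            # Match each child with the closer parent.
            if hamming(p1, c1) + hamming(p2, c2) <= hamming(p1, c2) + hamming(p2, c1):
                contests = ((i, c1), (i + 1, c2))
            else:
                contests = ((i, c2), (i + 1, c1))
            for slot, child in contests:
                if fitness(child) > fitness(pop[slot]):
                    pop[slot] = child  # the child replaces its parent only if fitter
    return pop

# Toy fitness (number of 1 bits), used only to exercise the loop.
initial = deterministic_crowding(sum, n_bits=12, pop_size=10, generations=0)
final = deterministic_crowding(sum, n_bits=12, pop_size=10, generations=30)
```

Because a child can only replace the parent it is matched with, and only when it is strictly fitter, the best fitness in the population never decreases and distinct niches are not overwritten by one dominant solution.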

63 Fitness Sharing (Speciated GA Optimization): a strategy that maintains the diversity of chromosomes by lowering the fitness of individuals that are located close to each other. The shared fitness F'(i) is used instead of the original fitness F(i):
\[ F'(i) = \frac{F(i)}{m(i)}, \qquad m(i) = \sum_{j=1}^{\mu} sh(d(i, j)) \]
\[ sh(d) = \begin{cases} 1 - (d / \sigma_{share})^{\alpha} & \text{if } d < \sigma_{share} \\ 0 & \text{otherwise} \end{cases} \]
[Figure: fitness compared with shared fitness.]
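Fitness sharing can be sketched directly from its definitions. The Hamming distance, the sharing radius, α = 1, and the three chromosomes are assumptions made for illustration.

```python
def sharing(d, sigma_share, alpha=1.0):
    """sh(d) = 1 - (d / sigma_share)^alpha if d < sigma_share, else 0."""
    return 1.0 - (d / sigma_share) ** alpha if d < sigma_share else 0.0

def shared_fitness(population, fitness, distance, sigma_share, alpha=1.0):
    """F'(i) = F(i) / m(i), where m(i) sums sh(d(i, j)) over the whole population."""
    result = []
    for i in population:
        # The niche count includes the individual itself (d = 0, sh = 1),
        # so it is never zero.
        niche_count = sum(sharing(distance(i, j), sigma_share, alpha) for j in population)
        result.append(fitness(i) / niche_count)
    return result

def hamming(a, b):
    """Number of differing bits (assumed distance measure)."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical population: the first two chromosomes are close, the third is isolated.
population = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1]]
shared = shared_fitness(population, fitness=sum, distance=hamming, sigma_share=2.0)
```

The two neighboring chromosomes have their fitness divided by a niche count of 1.5, while the isolated one keeps its raw fitness, which is how sharing rewards individuals in sparsely populated regions.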

64 Comparison of Diversity (Speciated GA Optimization): the number of optimal ensembles found by each method on one dataset. [Figure: number of optimal ensembles per experiment for sGA, sharing and crowding.] Crowding finds far more optimal ensembles than sGA and sharing.

65 Change of Fitness and Accuracy (Speciated GA Optimization): [Figure: fitness and accuracy against iteration for simple GA, sharing and crowding.] Crowding does far better than sGA and sharing.

66 Search Efficiency (Speciated GA Optimization): [Figure: iterations for common GA, sharing and crowding.] Execution time per iteration: simple GA < crowding < sharing.

67 Conclusion: classification; comparisons of features and classifiers; exploration of ensemble approaches.
