Metalearning for gene expression data classification
Bruno F. de Souza and André de Carvalho, ICMC-University of São Paulo, São Carlos, Brazil
Carlos Soares, LIAAD-INESC Porto LA / Fac. de Economia, Universidade do Porto, Portugal

Abstract

Machine Learning techniques have been widely applied to the problem of class prediction in microarray data. Nevertheless, current approaches to selecting appropriate methods for this task are often unsatisfactory in several ways, motivating the development of tools to automate the process. In this context, the authors introduce the use of metalearning in the specific domain of gene expression classification. Experiments with the KNN-ranking method for algorithm recommendation, applied to 49 datasets, yielded successful results.

1. Introduction

With the completion of a number of genomic sequencing programs, a wealth of biological data has become available, offering an unprecedented set of opportunities to better understand the processes that govern living systems [7]. An area that could greatly benefit from post-genomic research is health care, through the identification of genetic factors that may be related to pathological states. In fact, the fight against cancer has already achieved promising advances, mainly due to the introduction of new large-scale gene expression analysis technologies, such as microarrays [21]. Microarrays are hybridization-based methods that allow the monitoring of the expression levels of thousands of genes simultaneously [23]. This enables the measurement of the levels of mRNA molecules inside a cell and, consequently, of the proteins being produced. Therefore, the role of the genes of a cell at a given moment, and under given circumstances, can be better understood by assessing their expression levels.
In order to acquire qualitatively interesting information from microarray experiments, one usually employs computational tools [28].

(The authors acknowledge the support from FAPESP, CNPq and FCT (project Triana - POCT/TRA/61001/ and Programa de Financiamento Plurianual de Unidades de I&D).)

Specifically, Machine Learning (ML) algorithms [17] have been widely adopted, mainly due to their ability to automatically extract patterns from data. One of the most promising ML tasks in this context is supervised classification [26]. Basically, it is used to identify the class membership of a sample based on its gene expression profile (e.g. normal versus cancerous tissue). One of the earliest applications of ML to the classification of microarray data was the successful design of a classifier system to distinguish patients with related types of leukemia [10]. Following this approach, various studies confirmed the applicability of ML in gene expression domains [9, 20, 16, 15, 25, 12]. An analysis of their results indicates that no single algorithm performs better than the others in all cases. This situation is related to the no-free-lunch theorem [22] and it emphasizes the need for a careful selection of the algorithm to be used on each specific problem. The current rules of thumb for algorithm selection rely on costly trial-and-error procedures or expert advice [5]. Neither approach may be satisfactory to the end user, typically a biologist or clinician, who intends to analyze microarray data more directly and cost-effectively. So, in practice, the choice of ML algorithm is largely determined by the user's familiarity with the algorithm rather than by the particularities of the data and of the algorithms themselves. This may lead to sub-optimal results, compromising the whole experimental setup. Therefore, a system that is able to automatically predict the performance of algorithms on new problems is highly desirable. One approach to this is metalearning [5].
The term generally refers to techniques that exploit expertise acquired in the process of applying ML algorithms in order to increase the quality of results obtained in future applications [5]. This work focuses on a particular perspective of metalearning, which is concerned with inducing metamodels that relate the characteristics of the problems with the performance of ML algorithms. The metamodels are then used to support algorithm selection for new problems. Metalearning has been successfully applied for algorithm recommendation on sets of diverse classification
problems [5]. However, it has never been tested on problems of a single, specific domain. Therefore, the goal of this work is to test whether it is possible to successfully apply metalearning to problems of a single domain. The domain chosen for this study is the classification of gene expression data. Not only is this an important ML application, as argued earlier, but it also has a number of idiosyncrasies, such as the morphology of the data. The metalearning algorithm used here is the KNN-ranking method [5]. This method is particularly useful for algorithm recommendation because it generates a ranking of the algorithms for a given dataset, based on the expected performance of those algorithms. This document is organized as follows. Section 2 presents an overview of the application of ML algorithms to the classification of gene expression data. It also provides quantitative arguments to justify the use of metalearning in this domain. In Section 3, the general architecture of the metalearning system employed is explained. Section 4 discusses the experimental results obtained. Finally, Section 5 draws the conclusions of this work and points out future research directions.

2. Gene expression data classification

Traditional methods for cancer classification rely essentially on the tumor's morphological appearance [10] and on the tissue of its origin [24]. However, there is no assurance that similar tumors will have the same clinical development and, therefore, will demand a similar course of action. To permit a deeper understanding of the tissues being analyzed and to try to achieve a finer distinction between cancers, Golub et al. [10] applied microarrays to define a genetic portrait of tissues from two types of acute leukemia, AML and ALL. With a simple weighted voting scheme, the authors were able to correctly classify most of the samples. That seminal work introduced the ML community to gene expression data classification.
In this context, tissue samples x_i are multidimensional observations described by m gene expression values x_{i,j} and have an associated class y_i (e.g. the presence or absence of a disease). The task of an ML technique is to learn a discriminant function f(x_i), induced from a training set S = {(x_1, y_1), ..., (x_n, y_n)}, such that it is able to relate x_i to the corresponding y_i and to exploit this relation to classify previously unseen tissue samples. An interesting point to note is the usual disproportion between the very large number of genes and the small number of tissue samples. Within this framework, many studies in the literature have proven its efficacy. Two of those studies are discussed next. Nutt et al. studied the feasibility of using gene expression data to classify high-grade gliomas [19]. They proposed an approach based on a k-Nearest Neighbors (KNN) classifier that was able to discriminate high-grade, nonclassic glial tumors objectively and reproducibly, outperforming the naive histopathology-based classification. The dataset used consists of 50 expression profiles obtained from Affymetrix high-density oligonucleotide microarrays containing probes for about genes. In an effort to improve the understanding of the molecular basis of Papillary Renal Cell Carcinoma (PRCC), Yang et al. [30] studied the gene expression profiles of 34 cases of PRCC from an Affymetrix array with probe sets. Using unsupervised analyses, they were able to identify two highly correlated, distinct molecular subclasses with morphological correlation. Through the application of a Prediction Analysis of Microarrays (PAM) classifier to the samples of the two subclasses, the authors were able to achieve very good cross-validation accuracy when considering a subset of genetic markers. A broader view on the matter is provided by recent articles.
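To make the supervised setup above concrete, the sketch below induces a simple discriminant function on synthetic expression-like data. The nearest-centroid rule used here is only an illustrative stand-in for the classifiers discussed in this paper, and all data and dimensions are invented:

```python
import numpy as np

# Synthetic stand-in for an expression matrix: n tissue samples x_i,
# each described by m gene expression values x_{i,j}, with a label y_i.
rng = np.random.default_rng(0)
n, m = 40, 500                          # few samples, many genes
X = rng.normal(size=(n, m))
y = np.repeat([0, 1], n // 2)           # e.g. normal (0) vs tumor (1)
X[y == 1, :20] += 2.0                   # class 1 shifted on 20 "marker" genes

idx = rng.permutation(n)
train, test = idx[:30], idx[30:]

# Induce f from S = {(x_1, y_1), ..., (x_n, y_n)}: here, a
# nearest-centroid rule that assigns each unseen sample to the class
# whose mean training profile is closest in Euclidean distance.
centroids = np.stack([X[train][y[train] == c].mean(axis=0) for c in (0, 1)])
dists = ((X[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
pred = dists.argmin(axis=1)
print("test accuracy:", (pred == y[test]).mean())
```

Note the deliberately "wide" shape of X, reflecting the gene/sample disproportion discussed above.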
Larrañaga et al. [14] present an extensive review of the application of machine learning methods in bioinformatics, with a section devoted to class prediction. They discuss issues such as how to assess and compare the performance of ML algorithms, the problem of feature selection and the most representative classification paradigms, with examples of their application. Asyali et al. [1] focused exclusively on gene expression data and provided a more critical survey of ML methods in this context, along with the implications of the findings. In their work, key aspects of microarray analysis, e.g. preprocessing and classifier design, were covered and examined. This is important since, according to the authors, there has been a dramatic increase in studies related to gene expression profile classification over recent years. Such interest raises the issue of which type of classifier should be applied, which, as argued next, should be carefully addressed. In current practice, the most used ML algorithms for gene expression classification are [1]: KNN, SVMs, Decision Trees (such as CART), PAM, Neural Networks, FLDA, DLDA and DQDA. Comparative studies, which inspect the performance of different algorithms over a range of problems, provide some support to users who need to decide which one to use. Dudoit et al. [9] compared the performance of LDA, DLDA, DQDA, the Weighted Vote Scheme, KNN, trees and tree-based ensembles on three microarray datasets. Their main conclusion is that simple methods such as DLDA and KNN perform very well in comparison to more sophisticated methods such as tree-based ensembles. Romualdi et al. [20] studied the performance of DLDA, trees, Neural Networks, SVMs, KNN and PAM on two datasets. They were unable to obtain evidence that any of those methods performs better than the others. Man et al. [16] included in their comparison KNN, PCA+LDA, PLS-DA, Neural Networks, random forests and SVM.
Based on experiments with 6 datasets, they concluded that PLS-DA and SVM presented the best results. In a very comprehensive study, using 21 classification methods (including most of the previous approaches) applied to 7 datasets, Lee et al. [15] claimed that no classifier is systematically better than the others. Statnikov et al. [25] compared 3 multi-class classification methods, namely multi-class SVMs, KNN and Neural Networks, on 11 datasets and concluded that SVMs outperformed their competitors. Finally, Huang et al. [12] compared the performance of 5 statistical methods (PLS, penalized PLS, LASSO, PAM and random forests) on 2 datasets and concluded that the algorithms obtain similar results. As a whole, the aforementioned studies suggest that there is no obvious winning algorithm. Although some methods do present a tendency to perform well, such as SVMs, none of them is the best on all datasets. In the present work, the authors further investigated this hypothesis. One possibility to achieve this is to analyze the relative performance of some ML algorithms on various microarray datasets. For each dataset, one can construct a ranking of algorithms based on estimates of their performance. Here it is assumed that better algorithms are ranked higher (i.e., they are assigned ranks closer to 1). The distribution of the ranks of an algorithm over the datasets gives an indication of how well it performs in comparison to the others. Figure 1 presents the distribution of ranks for the seven algorithms and 49 datasets used in this work (Section 4). Each bar indicates how many times a given algorithm was ranked in each of the 7 possible positions, represented by different levels of gray. The figure confirms the previous observation that there is no clear winner, although a few algorithms tend to perform well.

Figure 1. Distribution of rankings. (Non-unit rank intervals occur because the mean rank is assigned to tied algorithms.)
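The per-dataset rankings behind Figure 1 can be derived from estimated accuracies. A minimal sketch (the accuracy values are hypothetical), which assigns the mean rank to tied algorithms as noted in the figure's caption:

```python
import numpy as np

def average_ranks(accuracies):
    """Rank algorithms on one dataset (rank 1 = best); tied
    algorithms receive the mean of the ranks they span."""
    acc = np.asarray(accuracies, dtype=float)
    order = np.argsort(-acc)                 # best (highest accuracy) first
    ranks = np.empty(len(acc))
    ranks[order] = np.arange(1, len(acc) + 1)
    # replace ranks of tied accuracies by their mean rank
    for v in np.unique(acc):
        tied = acc == v
        ranks[tied] = ranks[tied].mean()
    return ranks

# hypothetical estimated accuracies of 4 algorithms on one dataset
print(average_ranks([0.91, 0.87, 0.91, 0.80]))  # ranks: 1.5, 3, 1.5, 4
```

Stacking such rankings over all datasets gives the rank distribution per algorithm that a plot like Figure 1 summarizes.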
3. Metalearning

As shown in the previous section, the data analyst must carefully select which algorithm to use on each problem in order to obtain satisfactory results. Running an algorithm on a dataset is time-consuming, especially when complex tasks with a large volume of data are involved, as is often the case in bioinformatics. Therefore, selecting the algorithm by trying out all alternatives is generally not a viable option. An alternative approach consists of using a learning algorithm to model the relation between the characteristics of learning problems (e.g., number of examples) and the relative performance of a set of algorithms [5]. Here, the authors refer to this approach as metalearning because one is learning about the performance of learning algorithms. Metalearning models can be used to predict the relative performance of the set of algorithms on a new dataset, based on the characteristics of that dataset and without actually running any of the algorithms. This approach involves three steps: (1) the generation of metadata; (2) the induction of a metalearning model by applying a learning algorithm to the metadata; and (3) the application of the metamodel to support the selection of the algorithms to be used on new datasets. Next, the authors summarize these steps; for a more thorough description, the reader is referred to [5] and references therein.

Metadata

In this context, metadata are data that describe the (relative) performance of the selected algorithms on a set of datasets which were already processed with those algorithms. They consist of a set of meta-examples, each one representing one dataset. Each meta-example consists of attributes and a target. Datasets for metalearning are usually obtained from repositories. The attributes, which are also known as metafeatures, are measures that characterize the datasets. These measures represent general properties of the data which are expected to affect the performance of the algorithms.
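As an illustration, a few such dataset-characterization measures can be computed directly. This is a toy sketch: the function name and the selection of measures are ours, not a fixed standard:

```python
import numpy as np

def basic_metafeatures(X, y):
    """Compute a few general, statistical and information-theoretic
    measures of a dataset (an illustrative subset only)."""
    n, m = X.shape
    _, counts = np.unique(y, return_counts=True)
    p = counts / n
    class_entropy = -(p * np.log2(p)).sum()
    corr = np.corrcoef(X, rowvar=False)              # m x m attribute correlations
    off_diag = np.abs(corr[~np.eye(m, dtype=bool)])
    return {
        "log_n_examples": np.log(n),
        "log_n_features": np.log(m),
        "norm_class_entropy": class_entropy / np.log2(len(counts)),
        "mean_abs_correlation": off_diag.mean(),
    }

# toy dataset: 30 examples, 8 attributes, 2 balanced classes
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 8))
y = np.repeat([0, 1], 15)
print(basic_metafeatures(X, y))
```

Each dataset is thus summarized by a fixed-length vector of metafeatures, which becomes the attribute part of its meta-example.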
A few examples of commonly used metafeatures are the number of examples, the proportion of symbolic attributes, the class entropy and the mean correlation between attributes. These are examples of what are usually referred to as general, statistical and information-theoretic metafeatures [5]. The target represents the relative performance of the algorithms on the dataset. Many metalearning approaches to the problem of algorithm recommendation handle it as a supervised classification task. The recommendation provided to the user consists of a single algorithm and the target variable is, thus, a nominal attribute containing the algorithm that achieved the best performance on the corresponding dataset. However, this is not the most adequate form of recommendation for this problem. It does not provide any further guidance when the user is not satisfied with the results obtained with the recommended algorithm. Although, as stated earlier, executing all the algorithms is not a viable strategy, it is often the case that the available computational resources are sufficient to run more than one of the available algorithms. If the recommendation indicates the order in which the algorithms should be executed, then the user can execute as many as possible, thus increasing the probability that a satisfactory result is obtained. Therefore, the problem of algorithm recommendation should be tackled as a ranking task [5], which is discussed in the following section.

KNN-ranking Method

The metadata, as presented in the previous section, consist of a set of meta-examples that are described by a set of metafeatures and by a target consisting of a ranking of the ML algorithms, which is referred to as the target ranking. This learning problem is similar to the problem of supervised classification. The difference is that, given a new example described by the values of the attributes, the objective in classification is to predict the class it belongs to, while the objective in ranking is to predict the order of the classes as applicable to that example. An algorithm that has previously been adapted for learning rankings and applied to the metalearning problem with successful results is the k-Nearest Neighbors (KNN) algorithm [5]. The difference between ranking and classification is only in the target. Therefore, any common distance function (e.g. the Euclidean distance considered here) can be used by KNN to measure the similarity between examples. After selecting k neighbors, the corresponding target rankings must be aggregated to generate a prediction. In classification, this is achieved by predicting the most frequent class among the selected examples. A simple approach is to aggregate the k target rankings with the Average Ranks (AR) method [5].
Let R_{i,j} be the rank of base-algorithm a_j (j = 1, ..., n) on dataset i, where n is the number of algorithms. The average rank for each a_j is:

\bar{R}_j = \frac{\sum_{i=1}^{k} R_{i,j}}{k}

The final ranking is obtained by ordering the average ranks and assigning ranks to the algorithms accordingly.

Evaluation and Application

The metamodel can then be used to support the data analyst in selecting the algorithm to use on a new dataset. To do this, it is first necessary to compute the metafeatures for the new dataset; the ranking of the algorithms can then be predicted using the KNN method. However, to convince data analysts to apply a metalearning approach in practice, it is necessary to produce evidence that it is able to generate accurate predictions. One approach is to use Leave-one-out Cross-Validation (LOOCV), which consists of iteratively, for each meta-example, computing the accuracy of the predicted ranking using a metamodel obtained from all the remaining meta-examples [5]. To measure ranking accuracy, the authors have used Spearman's Rank Correlation Coefficient, r_S, which is given by the expression:

r_S = 1 - \frac{6 \sum_{i=1}^{n} (R(X_i) - R(Y_i))^2}{n^3 - n}

where X and Y are two sets of n values and R(X_i) represents the rank of element i in the series X. The coefficient simply evaluates the monotonicity of two sets of values, i.e., whether their variations are related. A value of 1 represents perfect agreement, and -1 perfect disagreement (i.e., the rankings are inverted). A correlation of 0 means that the rankings are not related, which would be the expected score of the random ranking method [5]. To determine whether the accuracy of some particular recommended ranking can be regarded as high or not, a baseline method is required. In machine learning, simple prediction strategies are usually employed to set a baseline for more complex methods. For instance, a baseline commonly used in classification is the most frequent class in the dataset, referred to as the default class.
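Both the KNN-ranking prediction and its Spearman-based evaluation can be sketched compactly. The metadata below are toy values and all names are illustrative:

```python
import numpy as np

def knn_ranking(meta_X, target_ranks, x_new, k=3):
    """Predict a ranking for a new dataset: find the k datasets with the
    most similar metafeatures (Euclidean distance) and aggregate their
    target rankings with the Average Ranks method."""
    d = np.linalg.norm(meta_X - x_new, axis=1)
    neigh = np.argsort(d)[:k]
    avg = target_ranks[neigh].mean(axis=0)   # \bar{R}_j over the k neighbors
    # convert average ranks back into a ranking (1 = best;
    # ties in the averages are broken by position in this sketch)
    out = np.empty(len(avg))
    out[np.argsort(avg)] = np.arange(1, len(avg) + 1)
    return out

def spearman(rx, ry):
    """Spearman's rank correlation between two rankings of length n."""
    rx, ry = np.asarray(rx, float), np.asarray(ry, float)
    n = len(rx)
    return 1 - 6 * ((rx - ry) ** 2).sum() / (n ** 3 - n)

# toy metadata: 4 datasets x 2 metafeatures, rankings of 3 algorithms
meta_X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
target = np.array([[1, 2, 3], [1, 3, 2], [3, 2, 1], [2, 3, 1]])
pred = knn_ranking(meta_X, target, np.array([0.05, 0.0]), k=2)
print(pred, spearman(pred, [1, 2, 3]))
```

In an LOOCV evaluation, `knn_ranking` would be called once per meta-example, withholding that example from `meta_X` and comparing the prediction with its target ranking via `spearman`.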
The baseline is typically obtained by summarizing the values of the target variable for all the examples in the dataset. In ranking, a similar approach consists of applying the Average Ranks (AR) method to all the target rankings in the metadata. The ranking obtained is called the default ranking.

4. Experimental results

Datasets

The metadata employed in this work came from 49 publicly available microarray datasets. They are related to disease diagnosis: the task is mainly either discriminating between normal and tumor cases or between different types of tumor. They present very diverse characteristics concerning the number of examples, the number of genes and the number of classes. Due to space constraints, the datasets are not described here, but full descriptions can be retrieved from br\ bferes. Two preprocessing operations were performed. As some datasets presented missing values, imputation was done using the Least Square Adaptation method [2], following the recommendation of Brock et al. [6]. Additionally, all attributes were normalized to have mean 0 and variance 1. This is first done for the training data and then the test data are rescaled accordingly.

ML algorithms

Based on the comparative studies of ML algorithms for gene expression classification presented in Section 2, seven classifiers were selected, mainly according to two criteria, performance and training time: they are relatively fast to train and present adequate error rates on at least some datasets. The methods are: Diagonal Linear Discriminant Analysis (DLDA) [9], Diagonal Quadratic Discriminant Analysis (DQDA) [9], Prediction Analysis of Microarrays (PAM) [27], 3-Nearest Neighbors (3-NN) [8], Support Vector Machines (SVM) [29] with linear and RBF kernels, and Penalized Discriminant Analysis (PDA) [11]. An important issue within the metalearning framework is the estimation of the performance of the classifiers. In the context of gene expression data, this is the subject of ongoing research, with no widely accepted methodology [4, 13]. In the present work, the .632+ estimator is used, following the suggestion of Braga-Neto and Dougherty [4].

Metafeatures

Although the datasets used are different from traditional classification datasets, ten metafeatures originally developed for that kind of dataset in the StatLog project were used [5]. As some of those may not be suitably applicable due to the high dimensionality of the data, their calculation was preceded by a data reduction step. Here, Partial Least Squares (PLS) was employed, mainly due to its good results in a number of microarray studies (see [3] and references therein) and to its low computational cost. The number of PLS components considered was 3, which seems to be adequate for preserving discrimination power in the context of expression data [18]. The measures are given next. More information can be found in [5].

1. Log of the number of examples
2. Log of the number of features
3. Log of the number of classes
4. Mean absolute skewness
5. Mean kurtosis
6. Geometric mean ratio of the standard deviations of the individual populations to the pooled standard deviation
7. First canonical correlation
8. Proportion of total variation explained by the first canonical correlation
9. Normalized class entropy
10. Average absolute correlation between continuous attributes, per class

Results

The experiments conducted in this work followed the evaluation and application guidelines presented in Section 3. The main results obtained here are illustrated in Figure 2. The points represent the mean ranking accuracy over the 49 datasets according to the LOOCV approach, varying the number k of nearest neighbors from 1 to 20. The smallest values of k present the lowest performance (69.8% mean accuracy). Accuracy then increases with increasing k until it reaches a point of saturation, where it remains basically constant and then gracefully drops. Here, k = 4 and k = 5 give similar results (both roughly 78.3% mean accuracy). In any case, the KNN-ranking method clearly outperforms the default ranking (dashed line, 59.9% mean accuracy), generating rankings that are, on average, more correlated with the ideal rankings. This indicates that metalearning can be successfully applied to recommend algorithms for gene expression analysis.

Figure 2. Ranking accuracy of KNN.

Additionally, these results are somewhat different from the ones reported in algorithm recommendation experiments with general classification datasets [5]. In those experiments, the best results were achieved with a very small k (1 or 2) and the KNN-ranking method quickly became worse than the default ranking. This may be explained by the fact that, being from the same application domain, the datasets used here are more homogeneous. Therefore, the KNN algorithm has a smoother behavior with varying k.

5. Conclusions

This paper presented an empirical analysis of the performance of a metalearning method on the problem of recommending learning algorithms for gene expression classification. Metalearning has been successfully applied to general classification problems. However, it had never been applied to a restricted domain, such as gene expression data.
The results presented here show that it is possible to use metalearning to recommend classifiers for gene expression data. It was observed that the behavior of the metalearning algorithm is actually smoother than when it is applied to general classification problems. An approach based on the KNN algorithm was employed, mainly because it had previously been applied to other metalearning problems with successful results [5]. In the future, it is necessary to investigate whether better results can be obtained with different methods.
Here, a set of general metafeatures was used to characterize the datasets. However, gene expression classification datasets differ significantly from most other classification problems, namely in terms of their morphology. Therefore, it is expected that better metalearning models can be obtained using metafeatures that are specifically designed for this application domain. Additionally, it has been shown that, although ranking accuracy is an important criterion in the evaluation of metalearning systems during their development, the data analyst is ultimately interested in the quality of the results obtained by the selected classifiers [5]. In the case of gene expression analysis, data analysts are interested not only in the accuracy of the models but also in their interpretability. The authors plan to address these issues in future work.

References

[1] M. H. Asyali, D. Colak, O. Demirkaya, and M. S. Inan. Gene expression profile classification: A review. Current Bioinformatics, 1(1):55-73.
[2] T. H. Bø, B. Dysvik, and I. Jonassen. LSimpute: accurate estimation of missing values in microarray data with least squares methods. Nucleic Acids Research, 32(3):e34.
[3] A.-L. Boulesteix and K. Strimmer. Partial least squares: A versatile tool for the analysis of high-dimensional genomic data. Briefings in Bioinformatics, 8(1):32-44.
[4] U. M. Braga-Neto and E. R. Dougherty. Is cross-validation valid for small-sample microarray classification? Bioinformatics, 20(3).
[5] P. Brazdil, C. Soares, and J. da Costa. Ranking learning algorithms: Using IBL and meta-learning on accuracy and time results. Machine Learning, 50(3).
[6] G. N. Brock, J. R. Shaffer, R. E. Blakesley, M. J. Lotz, and G. C. Tseng. Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes. BMC Bioinformatics, 9(12).
[7] F. S. Collins, E. D. Green, A. E. Guttmacher, and M. S. Guyer. A vision for the future of genomics research. Nature, 422, April.
[8] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley-Interscience, 2nd edition.
[9] S. Dudoit, J. Fridlyand, and T. P. Speed. Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97(457):77-87.
[10] T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286, October.
[11] T. Hastie, A. Buja, and R. Tibshirani. Penalized discriminant analysis. Annals of Statistics, 23:73-102.
[12] X. Huang, W. Pan, S. Grindle, X. Han, Y. Chen, S. J. Park, L. W. Miller, and J. Hall. A comparative study of discriminating human heart failure etiology using gene expression profiles. BMC Bioinformatics, 6:205.
[13] W. Jiang and R. Simon. A comparison of bootstrap methods and an adjusted bootstrap approach for estimating the prediction error in microarray classification. Statistics in Medicine, 26.
[14] P. Larrañaga, B. Calvo, R. Santana, C. Bielza, J. Galdiano, I. Inza, J. A. Lozano, R. Armañanzas, G. Santafé, A. Pérez, and V. Robles. Machine learning in bioinformatics. Briefings in Bioinformatics, 7(1):86-112.
[15] J. W. Lee, J. B. Lee, M. Park, and S. H. Song. An extensive comparison of recent classification tools applied to microarray data. Computational Statistics & Data Analysis, 48(4).
[16] M. Z. Man, G. Dyson, K. Johnson, and B. Liao. Evaluating methods for classifying expression data. Journal of Biopharmaceutical Statistics, 14(4).
[17] R. S. Michalski, J. Carbonell, and T. M. Mitchell. Machine Learning: An Artificial Intelligence Approach. Morgan Kaufmann.
[18] D. V. Nguyen and D. M. Rocke. Multi-class cancer classification via partial least squares with gene expression profiles. Bioinformatics, 18(9).
[19] C. L. Nutt, D. R. Mani, R. A. Betensky, and P. Tamayo. Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Research, 63(7):1602-1607.
[20] C. Romualdi, S. Campanaro, D. Campagna, B. Celegato, N. Cannata, S. Toppo, G. Valle, and G. Lanfranchi. Pattern recognition in gene expression profiling using DNA array: a comparative study of different statistical methods applied to cancer classification. Human Molecular Genetics, 12(8).
[21] G. Russo, C. Zegar, and A. Giordano. Advantages and limitations of microarray technology in human cancer. Oncogene, 22(42), September.
[22] C. Schaffer. A conservation law for generalization performance. In ICML.
[23] M. Schena. DNA Microarrays: A Practical Approach. Practical Approach Series. Oxford University Press, Oxford, England, 1st edition.
[24] D. K. Slonim, P. Tamayo, J. P. Mesirov, T. R. Golub, and E. S. Lander. Class prediction and discovery using gene expression data. In RECOMB.
[25] A. Statnikov, C. F. Aliferis, I. Tsamardinos, D. Hardin, and S. Levy. A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics, 21(5).
[26] A. Tarca, R. Romero, and S. Draghici. Analysis of microarray experiments of gene expression profiling. American Journal of Obstetrics and Gynecology, 195(2):373-388, August.
[27] R. Tibshirani, T. Hastie, B. Narasimhan, and G. Chu. Diagnosis of multiple cancer types by shrunken centroids of gene expression. PNAS, 99(10), May.
[28] B. Tjaden and J. Cohen. A survey of computational methods used in microarray data interpretation. Applied Mycology and Biotechnology, Volume 6: Bioinformatics, 1-18.
[29] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag New York.
[30] X. J. Yang, M.-H. Tan, H. L. Kim, et al. A molecular classification of papillary renal cell carcinoma. Cancer Research, 65(13), July 2005.
More informationToday. Last time. Lecture 5: Discrimination (cont) Jane Fridlyand. Oct 13, 2005
Biological question Experimental design Microarray experiment Failed Lecture : Discrimination (cont) Quality Measurement Image analysis Preprocessing Jane Fridlyand Pass Normalization Sample/Condition
More informationBIOINF/BENG/BIMM/CHEM/CSE 184: Computational Molecular Biology. Lecture 2: Microarray analysis
BIOINF/BENG/BIMM/CHEM/CSE 184: Computational Molecular Biology Lecture 2: Microarray analysis Genome wide measurement of gene transcription using DNA microarray Bruce Alberts, et al., Molecular Biology
More informationGene Reduction for Cancer Classification using Cascaded Neural Network with Gene Masking
Gene Reduction for Cancer Classification using Cascaded Neural Network with Gene Masking Raneel Kumar, Krishnil Chand, Sunil Pranit Lal School of Computing, Information, and Mathematical Sciences University
More informationA Genetic Algorithm Approach to DNA Microarrays Analysis of Pancreatic Cancer
A Genetic Algorithm Approach to DNA Microarrays Analysis of Pancreatic Cancer Nicolae Teodor MELITA 1, Stefan HOLBAN 2 1 Politehnica University of Timisoara, Faculty of Automation and Computers, Bd. V.
More informationSurvival Outcome Prediction for Cancer Patients based on Gene Interaction Network Analysis and Expression Profile Classification
Survival Outcome Prediction for Cancer Patients based on Gene Interaction Network Analysis and Expression Profile Classification Final Project Report Alexander Herrmann Advised by Dr. Andrew Gentles December
More informationPREDICTING EMPLOYEE ATTRITION THROUGH DATA MINING
PREDICTING EMPLOYEE ATTRITION THROUGH DATA MINING Abbas Heiat, College of Business, Montana State University, Billings, MT 59102, aheiat@msubillings.edu ABSTRACT The purpose of this study is to investigate
More informationadvanced analysis of gene expression microarray data aidong zhang World Scientific State University of New York at Buffalo, USA
advanced analysis of gene expression microarray data aidong zhang State University of New York at Buffalo, USA World Scientific NEW JERSEY LONDON SINGAPORE BEIJING SHANGHAI HONG KONG TAIPEI CHENNAI Contents
More informationSELECTING GENES WITH DISSIMILAR DISCRIMINATION STRENGTH FOR SAMPLE CLASS PREDICTION
July 3, 26 6:34 WSPC/Trim Size: in x 8.5in for Proceedings eegs apbc27 SELECTING GENES WITH DISSIMILAR DISCRIMINATION STRENGTH FOR SAMPLE CLASS PREDICTION Zhipeng Cai, Randy Goebel, Mohammad R. Salavatipour,
More informationClassification Study on DNA Microarray with Feedforward Neural Network Trained by Singular Value Decomposition
Classification Study on DNA Microarray with Feedforward Neural Network Trained by Singular Value Decomposition Hieu Trung Huynh 1, Jung-Ja Kim 2 and Yonggwan Won 1 1 Department of Computer Engineering,
More informationBIOINFORMATICS THE MACHINE LEARNING APPROACH
88 Proceedings of the 4 th International Conference on Informatics and Information Technology BIOINFORMATICS THE MACHINE LEARNING APPROACH A. Madevska-Bogdanova Inst, Informatics, Fac. Natural Sc. and
More informationA Genetic Approach for Gene Selection on Microarray Expression Data
A Genetic Approach for Gene Selection on Microarray Expression Data Yong-Hyuk Kim 1, Su-Yeon Lee 2, and Byung-Ro Moon 1 1 School of Computer Science & Engineering, Seoul National University Shillim-dong,
More informationOur view on cdna chip analysis from engineering informatics standpoint
Our view on cdna chip analysis from engineering informatics standpoint Chonghun Han, Sungwoo Kwon Intelligent Process System Lab Department of Chemical Engineering Pohang University of Science and Technology
More informationHybrid Intelligent Systems for DNA Microarray Data Analysis
Hybrid Intelligent Systems for DNA Microarray Data Analysis November 27, 2007 Sung-Bae Cho Computer Science Department, Yonsei University Soft Computing Lab What do I think with Bioinformatics? Biological
More informationIdentification of biological themes in microarray data from a mouse heart development time series using GeneSifter
Identification of biological themes in microarray data from a mouse heart development time series using GeneSifter VizX Labs, LLC Seattle, WA 98119 Abstract Oligonucleotide microarrays were used to study
More informationAPPLICATION OF COMMITTEE k-nn CLASSIFIERS FOR GENE EXPRESSION PROFILE CLASSIFICATION. A Thesis. Presented to
APPLICATION OF COMMITTEE k-nn CLASSIFIERS FOR GENE EXPRESSION PROFILE CLASSIFICATION A Thesis Presented to The Graduate Faculty of The University of Akron In Partial Fulfillment of the Requirements for
More informationStatistical Applications in Genetics and Molecular Biology
Statistical Applications in Genetics and Molecular Biology Volume 5, Issue 1 2006 Article 16 Reader s reaction to Dimension Reduction for Classification with Gene Expression Microarray Data by Dai et al
More informationApplication of Decision Trees in Mining High-Value Credit Card Customers
Application of Decision Trees in Mining High-Value Credit Card Customers Jian Wang Bo Yuan Wenhuang Liu Graduate School at Shenzhen, Tsinghua University, Shenzhen 8, P.R. China E-mail: gregret24@gmail.com,
More informationTop-down Forecasting Using a CRM Database Gino Rooney Tom Bauer
Top-down Forecasting Using a CRM Database Gino Rooney Tom Bauer Abstract More often than not sales forecasting in modern companies is poorly implemented despite the wealth of data that is readily available
More informationBagged Ensembles of Support Vector Machines for Gene Expression Data Analysis
Bagged Ensembles of Support Vector Machines for Gene Expression Data Analysis Giorgio Valentini INFM, Istituto Nazionale di Fisica della Materia, DSI, Dip. di Scienze dell Informazione Università degli
More informationClassifying Gene Expression Data using an Evolutionary Algorithm
Classifying Gene Expression Data using an Evolutionary Algorithm Thanyaluk Jirapech-umpai E H U N I V E R S I T Y T O H F R G E D I N B U Master of Science School of Informatics University of Edinburgh
More informationLearning theory: SLT what is it? Parametric statistics small number of parameters appropriate to small amounts of data
Predictive Genomics, Biology, Medicine Learning theory: SLT what is it? Parametric statistics small number of parameters appropriate to small amounts of data Ex. Find mean m and standard deviation s for
More informationA STUDY ON STATISTICAL BASED FEATURE SELECTION METHODS FOR CLASSIFICATION OF GENE MICROARRAY DATASET
A STUDY ON STATISTICAL BASED FEATURE SELECTION METHODS FOR CLASSIFICATION OF GENE MICROARRAY DATASET 1 J.JEYACHIDRA, M.PUNITHAVALLI, 1 Research Scholar, Department of Computer Science and Applications,
More informationRevealing Predictive Gene Clusters with Supervised Algorithms
DSC 23 Working Papers (Draft Versions) http://www.ci.tuwien.ac.at/conferences/dsc-23/ Revealing Predictive Gene Clusters with Supervised Algorithms Marcel Dettling Seminar für Statistik ETH Zürich CH-892
More informationBioinformatics : Gene Expression Data Analysis
05.12.03 Bioinformatics : Gene Expression Data Analysis Aidong Zhang Professor Computer Science and Engineering What is Bioinformatics Broad Definition The study of how information technologies are used
More informationMethods for Multi-Category Cancer Diagnosis from Gene Expression Data: A Comprehensive Evaluation to Inform Decision Support System Development
Methods for Multi-Category Cancer Diagnosis from Gene Expression Data: A Comprehensive Evaluation to Inform Decision Support System Development Alexander Statnikov, Constantin F. Aliferis, Ioannis Tsamardinos
More informationDNA Gene Expression Classification with Ensemble Classifiers Optimized by Speciated Genetic Algorithm
DNA Gene Expression Classification with Ensemble Classifiers Optimized by Speciated Genetic Algorithm Kyung-Joong Kim and Sung-Bae Cho Department of Computer Science, Yonsei University, 134 Shinchon-dong,
More informationStatistical Machine Learning Methods for Bioinformatics VI. Support Vector Machine Applications in Bioinformatics
Statistical Machine Learning Methods for Bioinformatics VI. Support Vector Machine Applications in Bioinformatics Jianlin Cheng, PhD Computer Science Department and Informatics Institute University of
More informationAn Empirical Study of Univariate and GA-Based Feature Selection in Binary Classification with Microarray Data
An Empirical Study of Univariate and GA-Based Feature Selection in Binary Classification with Microarray Data Mike Lecocke and Kenneth Hess 2nd March 2005 Abstract Motivation: Feature subset selection
More informationROAD TO STATISTICAL BIOINFORMATICS CHALLENGE 1: MULTIPLE-COMPARISONS ISSUE
CHAPTER1 ROAD TO STATISTICAL BIOINFORMATICS Jae K. Lee Department of Public Health Science, University of Virginia, Charlottesville, Virginia, USA There has been a great explosion of biological data and
More informationMISSING DATA CLASSIFICATION OF CHRONIC KIDNEY DISEASE
MISSING DATA CLASSIFICATION OF CHRONIC KIDNEY DISEASE Wala Abedalkhader and Noora Abdulrahman Department of Engineering Systems and Management, Masdar Institute of Science and Technology, Abu Dhabi, United
More informationFeature Selection of Gene Expression Data for Cancer Classification: A Review
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 50 (2015 ) 52 57 2nd International Symposium on Big Data and Cloud Computing (ISBCC 15) Feature Selection of Gene Expression
More informationGene Expression Data Analysis
Gene Expression Data Analysis Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu BMIF 310, Fall 2009 Gene expression technologies (summary) Hybridization-based
More informationWeighted Top Score Pair Method for Gene Selection and Classification
Weighted Top Score Pair Method for Gene Selection and Classification Huaien Luo 1, Yuliansa Sudibyo 2,, Lance D. Miller 1, and R. Krishna Murthy Karuturi 1, 1 Genome Institute of Singapore, Singapore 2
More informationA Comparative Study of Microarray Data Analysis for Cancer Classification
A Comparative Study of Microarray Data Analysis for Cancer Classification Kshipra Chitode Research Student Government College of Engineering Aurangabad, India Meghana Nagori Asst. Professor, CSE Dept Government
More informationMachine Learning Models for Classification of Lung Cancer and Selection of Genomic Markers Using Array Gene Expression Data
Machine Learning Models for Classification of Lung Cancer and Selection of Genomic Markers Using Array Gene Expression Data C.F. Aliferis 1, I. Tsamardinos 1, P.P. Massion 2, A. Statnikov 1, N. Fananapazir
More informationAn Efficient and Effective Immune Based Classifier
Journal of Computer Science 7 (2): 148-153, 2011 ISSN 1549-3636 2011 Science Publications An Efficient and Effective Immune Based Classifier Shahram Golzari, Shyamala Doraisamy, Md Nasir Sulaiman and Nur
More informationBioinformatics for Biologists
Bioinformatics for Biologists Functional Genomics: Microarray Data Analysis Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Outline Introduction Working with microarray data Normalization Analysis
More informationCopyr i g ht 2012, SAS Ins titut e Inc. All rights res er ve d. ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT
ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT ANALYTICAL MODEL DEVELOPMENT AGENDA Enterprise Miner: Analytical Model Development The session looks at: - Supervised and Unsupervised Modelling - Classification
More informationPredicting Reddit Post Popularity Via Initial Commentary by Andrei Terentiev and Alanna Tempest
Predicting Reddit Post Popularity Via Initial Commentary by Andrei Terentiev and Alanna Tempest 1. Introduction Reddit is a social media website where users submit content to a public forum, and other
More informationAnalysis of a Proposed Universal Fingerprint Microarray
Analysis of a Proposed Universal Fingerprint Microarray Michael Doran, Raffaella Settimi, Daniela Raicu, Jacob Furst School of CTI, DePaul University, Chicago, IL Mathew Schipma, Darrell Chandler Bio-detection
More informationGene Selection in Cancer Classification using PSO/SVM and GA/SVM Hybrid Algorithms
Laboratoire d Informatique Fondamentale de Lille Gene Selection in Cancer Classification using PSO/SVM and GA/SVM Hybrid Algorithms Enrique Alba, José GarcíaNieto, Laetitia Jourdan and ElGhazali Talbi
More informationARTIFICIAL IMMUNE SYSTEM CLASSIFICATION OF MULTIPLE- CLASS PROBLEMS
1 ARTIFICIAL IMMUNE SYSTEM CLASSIFICATION OF MULTIPLE- CLASS PROBLEMS DONALD E. GOODMAN, JR. Mississippi State University Department of Psychology Mississippi State, Mississippi LOIS C. BOGGESS Mississippi
More informationA comparison of Multiple Biomarker Selection Algorithms for Early Screening of Ovarian Cancer
A comparison of Multiple Biomarker Selection Algorithms for Early Screening of Ovarian Cancer Yu-Seop Kim 1,3, Jong-Dae Kim 1,3, Min-Ki Jang 2,3, Chan-Young Park 1,3, and Hye-Jung Song 1,3 1 Dept. of Ubiquitous
More informationReliable classification of two-class cancer data using evolutionary algorithms
BioSystems 72 (23) 111 129 Reliable classification of two-class cancer data using evolutionary algorithms Kalyanmoy Deb, A. Raji Reddy Kanpur Genetic Algorithms Laboratory (KanGAL), Indian Institute of
More informationIntroduction to Bioinformatics. Fabian Hoti 6.10.
Introduction to Bioinformatics Fabian Hoti 6.10. Analysis of Microarray Data Introduction Different types of microarrays Experiment Design Data Normalization Feature selection/extraction Clustering Introduction
More informationClassification of DNA Sequences Using Convolutional Neural Network Approach
UTM Computing Proceedings Innovations in Computing Technology and Applications Volume 2 Year: 2017 ISBN: 978-967-0194-95-0 1 Classification of DNA Sequences Using Convolutional Neural Network Approach
More informationMicroarrays & Gene Expression Analysis
Microarrays & Gene Expression Analysis Contents DNA microarray technique Why measure gene expression Clustering algorithms Relation to Cancer SAGE SBH Sequencing By Hybridization DNA Microarrays 1. Developed
More informationMachine Learning Methods for Microarray Data Analysis
Harvard-MIT Division of Health Sciences and Technology HST.512: Genomic Medicine Prof. Marco F. Ramoni Machine Learning Methods for Microarray Data Analysis Marco F. Ramoni Children s Hospital Informatics
More informationLymphoma Cancer Classification Using Genetic Programming with SNR Features
Lymphoma Cancer Classification Using Genetic Programming with SNR Features JinHyuk Hong and SungBae Cho Dept. of Computer Science, Yonsei University, 134 Shinchondong, Sudaemoonku, Seoul 120749, Korea
More informationModeling gene expression data via positive Boolean functions
Modeling gene expression data via positive Boolean functions Francesca Ruffino 1, Marco Muselli 2, Giorgio Valentini 1 1 DSI, Dipartimento di Scienze dell Informazione, Università degli Studi di Milano,
More informationEstimating Cell Cycle Phase Distribution of Yeast from Time Series Gene Expression Data
2011 International Conference on Information and Electronics Engineering IPCSIT vol.6 (2011) (2011) IACSIT Press, Singapore Estimating Cell Cycle Phase Distribution of Yeast from Time Series Gene Expression
More informationMachine Learning in Computational Biology CSC 2431
Machine Learning in Computational Biology CSC 2431 Lecture 9: Combining biological datasets Instructor: Anna Goldenberg What kind of data integration is there? What kind of data integration is there? SNPs
More informationData Mining and Applications in Genomics
Data Mining and Applications in Genomics Lecture Notes in Electrical Engineering Volume 25 For other titles published in this series, go to www.springer.com/series/7818 Sio-Iong Ao Data Mining and Applications
More informationFinding Regularity in Protein Secondary Structures using a Cluster-based Genetic Algorithm
Finding Regularity in Protein Secondary Structures using a Cluster-based Genetic Algorithm Yen-Wei Chu 1,3, Chuen-Tsai Sun 3, Chung-Yuan Huang 2,3 1) Department of Information Management 2) Department
More informationMicroarray gene expression ranking with Z-score for Cancer Classification
Microarray gene expression ranking with Z-score for Cancer Classification M.Yasodha, Research Scholar Government Arts College, Coimbatore, Tamil Nadu, India Dr P Ponmuthuramalingam Head and Associate Professor
More informationThis place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.
G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY Methods or systems for genetic
More informationA Comparative Study of Feature Selection and Classification Methods for Gene Expression Data
A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data Thesis by Heba Abusamra In Partial Fulfillment of the Requirements For the Degree of Master of Science King
More informationMolecular Diagnosis Tumor classification by SVM and PAM
Molecular Diagnosis Tumor classification by SVM and PAM Florian Markowetz and Rainer Spang Practical DNA Microarray Analysis Berlin, Nov 2003 Max-Planck-Institute for Molecular Genetics Dept. Computational
More informationISSN: ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 7, Issue 11, May 2018
Application of Machine Learning to Immune Disease Prediction Kuan-Hui Lin, Yuh-Jyh Hu College of Computer Science, National Chiao Tung University, Hsinchu, Taiwan Abstract The intrusion of viruses, germs
More informationParticle Swarm Feature Selection for Microarray Leukemia Classification
2 (2017) 1-8 Progress in Energy and Environment Journal homepage: http://www.akademiabaru.com/progee.html ISSN: 2600-7762 Particle Swarm Feature Selection for Microarray Leukemia Classification Research
More informationAmit Kumar Nandanwar A.P. CSE Department, VNS College, Bhopal, India
Volume 6, Issue 4, April 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Support Classifier
More informationA Literature Review of Predicting Cancer Disease Using Modified ID3 Algorithm
A Literature Review of Predicting Cancer Disease Using Modified ID3 Algorithm Mr.A.Deivendran 1, Ms.K.Yemuna Rane M.Sc., M.Phil 2., 1 M.Phil Research Scholar, Dept of Computer Science, Kongunadu Arts and
More informationSAS Microarray Solution for the Analysis of Microarray Data. Susanne Schwenke, Schering AG Dr. Richardus Vonk, Schering AG
for the Analysis of Microarray Data Susanne Schwenke, Schering AG Dr. Richardus Vonk, Schering AG Overview Challenges in Microarray Data Analysis Software for Microarray Data Analysis SAS Scientific Discovery
More informationTitle: Genome-Wide Predictions of Transcription Factor Binding Events using Multi- Dimensional Genomic and Epigenomic Features Background
Title: Genome-Wide Predictions of Transcription Factor Binding Events using Multi- Dimensional Genomic and Epigenomic Features Team members: David Moskowitz and Emily Tsang Background Transcription factors
More informationImproving Credit Card Fraud Detection using a Meta- Classification Strategy
Improving Credit Card Fraud Detection using a Meta- Classification Strategy Joseph Pun, Yuri Lawryshyn Department of Applied Chemistry and Engineering, University of Toronto Toronto ABSTRACT One of the
More informationImmune Network based Ensembles
Immune Network based Ensembles Nicolás García-Pedrajas 1 and Colin Fyfe 2 1- Dept. of Computing and Numerical Analysis University of Córdoba (SPAIN) e-mail: npedrajas@uco.es 2- the Dept. of Computing University
More informationTumor Gene Characteristics Selection Method Based on Multi-Agent
Send Orders for Reprints to reprints@benthamscience.ae The Open Cybernetics & Systemics Journal, 2015, 9, 2513-2518 2513 Open Access Tumor Gene Characteristics Selection Method Based on Multi-Agent Yang
More informationGeneration of Comprehensible Hypotheses from Gene Expression Data
Generation of Comprehensible Hypotheses from Gene Expression Data Yuan Jiang, Ming Li, and Zhi-Hua Zhou National Laboratory for Novel Software Technology Nanjing University, Nanjing 210093, China {jiangy,lim,zhouzh}@lamda.nju.edu.cn
More informationAUTOMATIC CANCER DIAGNOSTIC DECISION SUPPORT SYSTEM FOR GENE EXPRESSION DOMAIN. Alexander Statnikov. Thesis. Submitted to the Faculty of the
AUTOMATIC CANCER DIAGNOSTIC DECISION SUPPORT SYSTEM FOR GENE EXPRESSION DOMAIN By Alexander Statnikov Thesis Submitted to the Faculty of the Graduate School of Vanderbilt University in partial fulfillment
More informationSupport Vector Machines (SVMs) for the classification of microarray data. Basel Computational Biology Conference, March 2004 Guido Steiner
Support Vector Machines (SVMs) for the classification of microarray data Basel Computational Biology Conference, March 2004 Guido Steiner Overview Classification problems in machine learning context Complications
More informationSOFTWARE DEVELOPMENT PRODUCTIVITY FACTORS IN PC PLATFORM
SOFTWARE DEVELOPMENT PRODUCTIVITY FACTORS IN PC PLATFORM Abbas Heiat, College of Business, Montana State University-Billings, Billings, MT 59101, 406-657-1627, aheiat@msubillings.edu ABSTRACT CRT and ANN
More informationA Gene Selection Algorithm using Bayesian Classification Approach
American Journal of Applied Sciences 9 (1): 127-131, 2012 ISSN 1546-9239 2012 Science Publications A Gene Selection Algorithm using Bayesian Classification Approach 1, 2 Alo Sharma and 2 Kuldip K. Paliwal
More information2. Materials and Methods
Identification of cancer-relevant Variations in a Novel Human Genome Sequence Robert Bruggner, Amir Ghazvinian 1, & Lekan Wang 1 CS229 Final Report, Fall 2009 1. Introduction Cancer affects people of all
More informationApplication of Emerging Patterns for Multi-source Bio-Data Classification and Analysis
Application of Emerging Patterns for Multi-source Bio-Data Classification and Analysis Hye-Sung Yoon 1, Sang-Ho Lee 1,andJuHanKim 2 1 Ewha Womans University, Department of Computer Science and Engineering,
More informationIMPROVED GENE SELECTION FOR CLASSIFICATION OF MICROARRAYS
IMPROVED GENE SELECTION FOR CLASSIFICATION OF MICROARRAYS J. JAEGER *, R. SENGUPTA *, W.L. RUZZO * * Department of Computer Science & Engineering University of Washington 114 Sieg Hall, Box 352350 Seattle,
More informationPerformance Analysis of Genetic Algorithm with knn and SVM for Feature Selection in Tumor Classification
Performance Analysis of Genetic Algorithm with knn and SVM for Feature Selection in Tumor Classification C. Gunavathi, K. Premalatha Abstract Tumor classification is a key area of research in the field
More informationDiscriminant models for high-throughput proteomics mass spectrometer data
Proteomics 2003, 3, 1699 1703 DOI 10.1002/pmic.200300518 1699 Short Communication Parul V. Purohit David M. Rocke Center for Image Processing and Integrated Computing, University of California, Davis,
More informationBiomedical Big Data and Precision Medicine
Biomedical Big Data and Precision Medicine Jie Yang Department of Mathematics, Statistics, and Computer Science University of Illinois at Chicago October 8, 2015 1 Explosion of Biomedical Data 2 Types
More informationarxiv: v1 [cs.ai] 5 Jun 2010
Rasch-based high-dimensionality data reduction and class prediction with applications to microarray gene expression data arxiv:1006.1030v1 [cs.ai] 5 Jun 2010 Andrej Kastrin, Borut Peterlin Institute of
More informationNeural Networks and Applications in Bioinformatics. Yuzhen Ye School of Informatics and Computing, Indiana University
Neural Networks and Applications in Bioinformatics Yuzhen Ye School of Informatics and Computing, Indiana University Contents Biological problem: promoter modeling Basics of neural networks Perceptrons
More informationGenetic Algorithm with Upgrading Operator
Genetic Algorithm with Upgrading Operator NIDAPAN SUREERATTANAN Computer Science and Information Management, School of Advanced Technologies, Asian Institute of Technology, P.O. Box 4, Klong Luang, Pathumthani
More informationChapter 3 Top Scoring Pair Decision Tree for Gene Expression Data Analysis
Chapter 3 Top Scoring Pair Decision Tree for Gene Expression Data Analysis Marcin Czajkowski and Marek Krȩtowski Abstract Classification problems of microarray data may be successfully performed with approaches
More informationNeural Networks and Applications in Bioinformatics
Contents Neural Networks and Applications in Bioinformatics Yuzhen Ye School of Informatics and Computing, Indiana University Biological problem: promoter modeling Basics of neural networks Perceptrons
More informationClassification in Parkinson s disease. ABDBM (c) Ron Shamir
Classification in Parkinson s disease 1 Parkinson s Disease The 2nd most common neurodegenerative disorder Impairs motor skills, speech, smell, cognition 1-3 sick per 1 >1% in individuals aged above 7
More informationIntroduction to Bioinformatics
Introduction to Bioinformatics If the 19 th century was the century of chemistry and 20 th century was the century of physic, the 21 st century promises to be the century of biology...professor Dr. Satoru
More informationAn Implementation of genetic algorithm based feature selection approach over medical datasets
An Implementation of genetic algorithm based feature selection approach over medical s Dr. A. Shaik Abdul Khadir #1, K. Mohamed Amanullah #2 #1 Research Department of Computer Science, KhadirMohideen College,
More informationInternational Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS)
International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) ISSN (Print): 2279-0047 ISSN (Online): 2279-0055 International
More informationMetamodelling and optimization of copper flash smelting process
Metamodelling and optimization of copper flash smelting process Marcin Gulik mgulik21@gmail.com Jan Kusiak kusiak@agh.edu.pl Paweł Morkisz morkiszp@agh.edu.pl Wojciech Pietrucha wojtekpietrucha@gmail.com
More information