Classification and Learning Using Genetic Algorithms

Similar documents
Classification and Learning Using Genetic Algorithms

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.

Introduction to Bioinformatics

Evolutionary Computation. Lecture 1 January, 2007 Ivan Garibay

Introduction to BIOINFORMATICS

Biology 644: Bioinformatics

Following text taken from Suresh Kumar. Bioinformatics Web - Comprehensive educational resource on Bioinformatics. 6th May.2005

BIOINFORMATICS AND SYSTEM BIOLOGY (INTERNATIONAL PROGRAM)

Chapter 8 Data Analysis, Modelling and Knowledge Discovery in Bioinformatics

Course Information. Introduction to Algorithms in Computational Biology Lecture 1. Relations to Some Other Courses

Data Mining and Applications in Genomics

Changing Mutation Operator of Genetic Algorithms for optimizing Multiple Sequence Alignment

Introduction to Algorithms in Computational Biology Lecture 1

VALLIAMMAI ENGINEERING COLLEGE

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Finding Regularity in Protein Secondary Structures using a Cluster-based Genetic Algorithm

Hidden Markov Models. Some applications in bioinformatics

Grundlagen der Bioinformatik Summer Lecturer: Prof. Daniel Huson

Part 1: Motivation, Basic Concepts, Algorithms

CS 262 Lecture 14 Notes Human Genome Diversity, Coalescence and Haplotypes

Chapter 6 Evolutionary Computation and Evolving Connectionist Systems

ALGORITHMS IN BIO INFORMATICS. Chapman & Hall/CRC Mathematical and Computational Biology Series A PRACTICAL INTRODUCTION. CRC Press WING-KIN SUNG

Intelligence and. Vivek Kaie

ELE4120 Bioinformatics. Tutorial 5

BIOINFORMATICS Introduction

Genetics and Bioinformatics

Examination Assignments

Introduction to Bioinformatics

PARALLEL LINE AND MACHINE JOB SCHEDULING USING GENETIC ALGORITHM

What is Evolutionary Computation? Genetic Algorithms. Components of Evolutionary Computing. The Argument. When changes occur...

Textbook Reading Guidelines

Identifying Splice Sites Of Messenger RNA Using Support Vector Machines

Computational Methods for Protein Structure Prediction and Fold Recognition... 1 I. Cymerman, M. Feder, M. PawŁowski, M.A. Kurowski, J.M.

Genetic Algorithm: An Optimization Technique Concept

Bioinformatics. Ingo Ruczinski. Some selected examples... and a bit of an overview

Study on the Application of Data Mining in Bioinformatics. Mingyang Yuan

Genetic Algorithm for Variable Selection. Genetic Algorithms Step by Step. Genetic Algorithm (Holland) Flowchart of GA

Introduction. CS482/682 Computational Techniques in Biological Sequence Analysis

GENETIC ALGORITHMS. Narra Priyanka. K.Naga Sowjanya. Vasavi College of Engineering. Ibrahimbahg,Hyderabad.

Feature Selection of Gene Expression Data for Cancer Classification: A Review

Contents. Preface...VII

This practical aims to walk you through the process of text searching DNA and protein databases for sequence entries.

Analysis of Molecular Data Using Statistical and Evolutionary Approaches. Kevin Atteson. and

Computational Intelligence Lecture 20:Intorcution to Genetic Algorithm

Neural Networks and Applications in Bioinformatics. Yuzhen Ye School of Informatics and Computing, Indiana University

Genome 373: Genomic Informatics. Elhanan Borenstein

Microarrays & Gene Expression Analysis

Data Mining for Biological Data Analysis

Neural Networks and Applications in Bioinformatics

DNA Gene Expression Classification with Ensemble Classifiers Optimized by Speciated Genetic Algorithm

Performance Analysis of Multi Clustered Parallel Genetic Algorithm with Gray Value

What is Bioinformatics? Bioinformatics is the application of computational techniques to the discovery of knowledge from biological databases.

BIOINF/BENG/BIMM/CHEM/CSE 184: Computational Molecular Biology. Lecture 2: Microarray analysis

Scoring Alignments. Genome 373 Genomic Informatics Elhanan Borenstein

Genetic Algorithms and Genetic Programming. Lecture 1: Introduction (25/9/09)

Engineering Genetic Circuits

Creation of a PAM matrix

Introduction to Bioinformatics Finish. Johannes Starlinger

10. Lecture Stochastic Optimization

BIOINFORMATICS THE MACHINE LEARNING APPROACH

College of information technology Department of software

Plan for today GENETIC ALGORITHMS. Randomised search. Terminology: The GA cycle. Decoding genotypes

Bioinformatics & Protein Structural Analysis. Bioinformatics & Protein Structural Analysis. Learning Objective. Proteomics

An Analytical Upper Bound on the Minimum Number of. Recombinations in the History of SNP Sequences in Populations

MARINE BIOINFORMATICS & NANOBIOTECHNOLOGY - PBBT305

BIOINFORMATICS TO ANALYZE AND COMPARE GENOMES

Computational methods in bioinformatics: Lecture 1

Inferring Gene Networks from Microarray Data using a Hybrid GA p.1

Genetic Algorithms. Moreno Marzolla Dip. di Informatica Scienza e Ingegneria (DISI) Università di Bologna.

ABSTRACT COMPUTER EVOLUTION OF GENE CIRCUITS FOR CELL- EMBEDDED COMPUTATION, BIOTECHNOLOGY AND AS A MODEL FOR EVOLUTIONARY COMPUTATION

Bioinformatics : Gene Expression Data Analysis

Introduction to Genetic Algorithm (GA) Presented By: Rabiya Khalid Department of Computer Science

An Evolutionary Optimization for Multiple Sequence Alignment

Computers in Biology and Bioinformatics

Protein Domain Boundary Prediction from Residue Sequence Alone using Bayesian Neural Networks

Lecture 7 Motif Databases and Gene Finding

Evolutionary Computation

Evolutionary Algorithms for Multiobjective Optimization

Biology Evolution: Mutation I Science and Mathematics Education Research Group

Practical Bioinformatics for Biologists (BIOS 441/641)

Data Warehousing. and Data Mining. Gauravkumarsingh Gaharwar

Selecting an Optimal Compound of a University Research Team by Using Genetic Algorithms

Computational Genomics ( )

The RNA tools registry

Teaching Bioinformatics in the High School Classroom. Models for Disease. Why teach bioinformatics in high school?

Textbook Reading Guidelines

Genetic Algorithms for Optimizations

Practical Bioinformatics for Biologists (BIOS493/700)

Motivation From Protein to Gene

MATH 5610, Computational Biology

NGS Approaches to Epigenomics

Cover Page. The handle holds various files of this Leiden University dissertation.

Genome Sequence Assembly

Introduction to Artificial Intelligence. Prof. Inkyu Moon Dept. of Robotics Engineering, DGIST

COPYRIGHTED MATERIAL. Contents. Part One Requirements, Realities, and Architecture 1. Acknowledgments Introduction

CSC 121 Computers and Scientific Thinking

Bioinformatics and Genomics: A New SP Frontier?

Introduction to Bioinformatics

Imaging informatics computer assisted mammogram reading Clinical aka medical informatics CDSS combining bioinformatics for diagnosis, personalized

Evolutionary Computation

Transcription:

Sanghamitra Bandyopadhyay Sankar K. Pal Classification and Learning Using Genetic Algorithms Applications in Bioinformatics and Web Intelligence With 87 Figures and 43 Tables 4y Spri rineer

1 Introduction 1 1.1 Introduction 1 1.2 Machine Recognition of Patterns: Preliminaries 3 1.2.1 Data Acquisition 4 1.2.2 Feature Selection 5 1.2.3 Classification 6 1.2.4 Clustering 8 1.3 Different Approaches 9 1.4 Connectionist Approach: Relevance and Features 11 1.5 Genetic Approach: Relevance and Features 13 1.6 Fuzzy Set-Theoretic Approach: Relevance and Features 14 1.7 Other Approaches 15 1.8 Applications of Pattern Recognition and Learning 16 1.9 Summary and Scope of the Book 17 2 Genetic Algorithms 19 2.1 Introduction 19 2.2 Traditional Versus Nontraditional Search 19 2.3 Overview of Genetic Algorithms 21 2.3.1 Basic Principles and Features 21 2.3.2 Encoding Strategy and Population 22 2.3.3 Evaluation 24 2.3.4 Genetic Operators 24 2.3.5 Parameters of Genetic Algorithms 27 2.3.6 Schema Theorem 27 2.4 Proof of Convergence of GAs 29 2.4.1 Markov Chain Modelling of GAs 29 2.4.2 Limiting Behavior of Elitist Model of GAs 31 2.5 Some Implementation Issues in GAs 35 2.6 Multiobjective Genetic Algorithms 40 2.7 Applications of Genetic Algorithms 46

XII Contents 2.8 Summary 51 3 Supervised Classification Using Genetic Algorithms 53 3.1 Introduction 53 3.2 Genetic Algorithms for Generating Fuzzy If-Then Rules 54 3.3 Genetic Algorithms and Decision Trees 57 3.4 GA-classifier: Genetic Algorithm for Generation of Class Boundaries 60 3.4.1 Principle of Hyperplane Fitting 61 3.4.2 Region Identification and Fitness Computation 62 3.4.3 Genetic Operations 65 3.5 Experimental Results 65 3.5.1 Results 69 3.5.2 Consideration of Higher-Order Surfaces 75 3.6 Summary 78 4 Theoretical Analysis of the GA-classifier 81 4.1 Introduction 81 4.2 Relationship with Bayes' Error Probability 82 4.3 Relationship Between H opt and HGA 88 4.3.1 Obtaining H GA from H 88 4.3.2 How H GA Is Related to H opt 89 4.3.3 Some Points Related to n and H 89 4.4 Experimental Results 90 4.4.1 Data Sets 91 4.4.2 Learning the Class Boundaries and Performance on Test Data 93 4.4.3 Variation of Recognition Scores with P\ 104 4.5 Summary 106 5 Variable String Lengths in GA-classifier 109 5.1 Introduction 109 5.2 Genetic Algorithm with Variable String Length and the Classification Criteria 110 5.3 Description of VGA-Classifier 111 5.3.1 Chromosome Representation and Population Initialization 111 5.3.2 Fitness Computation 113 5.3.3 Genetic Operators 114 5.4 Theoretical Study of VGA-classifier 117 5.4.1 Issues of Minimum miss and H 117 5.4.2 Error Rate 118 5.5 Experimental Results 119 5.5.1 Data Sets 119 5.5.2 Results 120

XIII 5.6 VGA-dassifier for the Design of a Multilayer Perceptron 124 5.6.1 Analogy Between Multilayer Perceptron and VGA-dassifier 124 5.6.2 Deriving the MLP Architecture and the Connection Weights 125 5.6.3 Postprocessing Step 129 5.6.4 Experimental Results 131 5.7 Summary 132 6 Chromosome Differentiation in VGA-dassifier 139 6.1 Introduction 139 6.2 GACD: Incorporating Chromosome Differentiation in GA... 140 6.2.1 Motivation 140 6.2.2 Description of GACD 140 6.3 Schema Theorem for GACD 143 6.3.1 Terminology 143 6.3.2 Analysis of GACD 143 6.4 VGACD-classifier. Incorporation of Chromosome Differentiation in VGA-dassifier 148 6.4.1 Population Initialization 149 6.4.2 Fitness Computation and Genetic Operators 150 6.5 Pixel Classification of Remotely Sensed Image 150 6.5.1 Relevance of GA 150 6.5.2 Experimental Results 150 6.6 Summary 154 7 Multiobjective VGA-classifier and Quantitative Indices... 159 7.1 Introduction 159 7.2 Multiobjective Optimization 160 7.3 Relevance of Multiobjective Optimization 161 7.4 Multiobjective GA-Based Classifier 162 7.4.1 Chromosome Representation 162 7.4.2 Fitness Computation 162 7.4.3 Selection 163 7.4.4 Crossover 164 7.4.5 Mutation 164 7.4.6 Incorporating Elitism 165 7.4.7 PAES-classifier. The Classifier Based on Pareto Archived Evolution Strategy 167 7.5 Validation and Testing 169 7.6 Indices for Comparing MO Solutions 170 7.6.1 Measures Based on Position of Nondominated Front... 170 7.6.2 Measures Based on Diversity of the Solutions 171 7.7 Experimental Results 172 7.7.1 Parameter Values 173

7.7.2 Comparison of Classification Performance 173 7.8 Summary 179 Genetic Algorithms in Clustering 181 8.1 Introduction 181 8.2 Basic Concepts and Preliminary Definitions 182 8.3 Clustering Algorithms 184 8.3.1 K-Means Clustering Algorithm 184 8.3.2 Single-Linkage Clustering Algorithm 185 8.3.3 Fuzzy c-means Clustering Algorithm 186 8.4 Clustering Using GAs: Fixed Number of Crisp Clusters 187 8.4.1 Encoding Strategy 188 8.4.2 Population Initialization 188 8.4.3 Fitness Computation 188 8.4.4 Genetic Operators 189 8.4.5 Experimental Results 189 8.5 Clustering Using GAs: Variable Number of Crisp Clusters... 192 8.5.1 Encoding Strategy and Population Initialization 192 8.5.2 Fitness Computation 193 8.5.3 Genetic Operators 193 8.5.4 Some Cluster Validity Indices 194 8.5.5 Experimental Results 196 8.6 Clustering Using GAs: Variable Number of Fuzzy Clusters... 205 8.6.1 Fitness Computation 205 8.6.2 Experimental Results 206 8.7 Summary 212 Genetic Learning in Bioinformatics 213 9.1 Introduction 213 9.2 Bioinformatics: Concepts and Features 214 9.2.1 Basic Concepts of Cell Biology 214 9.2.2 Different Bioinformatics Tasks 216 9.3 Relevance of Genetic Algorithms in Bioinformatics 216 9.4 Bioinformatics Tasks and Application of GAs 220 9.4.1 Alignment and Comparison of DNA, RNA and Protein Sequences 220 9.4.2 Gene Mapping on Chromosomes 223 9.4.3 Gene Finding and Promoter Identification from DNA Sequences 224 9.4.4 Interpretation of Gene Expression and Microarray Data 226 9.4.5 Gene Regulatory Network Identification 227 9.4.6 Construction of Phylogenetic Trees for Studying Evolutionary Relationship 228 9.4.7 DNA Structure Prediction 229 9.4.8 RNA Structure Prediction 231

XV 9.4.9 Protein Structure Prediction and Classification 233 9.4.10 Molecular Design and Docking 236 9.5 Experimental Results 238 9.6 Summary 239 10 Genetic Algorithms and Web Intelligence 243 10.1 Introduction 243 10.2 Web Mining 244 10.2.1 Web Mining Components and Methodologies 246 10.2.2 Web Mining Categories 246 10.2.3 Challenges and Limitations in Web Mining 248 10.3 Genetic Algorithms in Web Mining 250 10.3.1 Search and Retrieval 250 10.3.2 Query Optimization and Reformulation 252 10.3.3 Document Representation and Personalization 254 10.3.4 Distributed Mining 254 10.4 Summary 255 A e-optimal Stopping Time for GAs 257 A.l Introduction 257 A.2 Foundation 257 A.3 Fitness Function 259 A.4 Upper Bound for Optimal Stopping Time 261 A.5 Mutation Probability and e-optimal Stopping Time 264 B Data Sets Used for the Experiments 269 C Variation of Error Probability with P 1 275 References 277 Index 309