A comparison of Multiple Biomarker Selection Algorithms for Early Screening of Ovarian Cancer

Similar documents
MATLAB-Based Multi-Marker Data Analysis System for Early Detection of Ovarian Cancer

Calibration System and Calibration Method of Biochip Temperature Sensors

Chamber Temperature Measurement of Micro PCR Chip Using Thermocouple

Learning theory: SLT what is it? Parametric statistics small number of parameters appropriate to small amounts of data

Dallas J. Elgin, Ph.D. IMPAQ International Randi Walters, Ph.D. Casey Family Programs APPAM Fall Research Conference

Micro-PCR Chip NTC-Thermistor Sensor-Calibration System

The interactions that occur between two proteins are essential parts of biological systems.

A Literature Review of Predicting Cancer Disease Using Modified ID3 Algorithm

MISSING DATA CLASSIFICATION OF CHRONIC KIDNEY DISEASE

Feature Selection of Gene Expression Data for Cancer Classification: A Review

Smart India Hackathon

QUANTITATIVE DETERMINATION OF HUMAN EPIDIDYMIS PROTEIN 4

1. Klenk H.-P. and colleagues published the complete genome sequence of the organism Archaeoglobus fulgidus in Nature. [ Nature 390: (1997)].

Using Genomics to Guide Immunosuppression Therapy David A. Baran, MD, FACC, FSCAI System Director, Advanced HF, Transplant and MCS, Sentara Heart

Trend in diagnostic biochip. Eiichiro Ichiishi, M.D., Ph.D. Department of Internal Medicine International University of Health and Welfare Hospital

BIOINFORMATICS THE MACHINE LEARNING APPROACH

Technical Guidance on Development of In Vitro Companion Diagnostics and Corresponding Therapeutic Products

High Resolution GC-MS Application: Metabolomics Vladimir Tolstikov, PhD

ASIAN PATENT ATTORNEYS ASSOCIATION Recognized Group of Korea. Report to Emerging IP Rights Committee 2012, Chiang Mai

Survival Outcome Prediction for Cancer Patients based on Gene Interaction Network Analysis and Expression Profile Classification

UAMS ADVANCED DIAGNOSTICS FOR ADVANCING CURE

The use of sensitivity analysis to set acceptance criteria in the validation of biomarker assays. Graham Healey, Chief Statistician, Oncimmune, UK

Evaluation and Regulation of Biomarkers A Public Health Perspective

3DCNN for False Positive Reduction in Lung Nodule Detection

Modelli predittivi in radioterapia: modelli statistici vs Machine Learning

Data Mining for Biological Data Analysis

Classification of DNA Sequences Using Convolutional Neural Network Approach

Supervised Learning from Micro-Array Data: Datamining with Care

Syllabus for BIOS 101, SPRING 2013

AUTOMATED INTERPRETABLE COMPUTATIONAL BIOLOGY IN THE CLINIC: A FRAMEWORK TO PREDICT DISEASE SEVERITY AND STRATIFY PATIENTS FROM CLINICAL DATA

Pharmacogenetics of Drug-Induced Side Effects

EE 45X Biomedical Nanotechnology. Course Proposal

Outline. Impact on Human Diseases Basis for Molecular Assay Novel Biomarkers

Flow Cytometry In Clinical Diagnosis

2017 Qualifying Examination

Recent advances in tissue-engineered corneal regeneration

Proteomic Pattern Recognition

Preanalytical Variables in Blood Collection: Impact on Precision Medicine

Improving the Limit of Detection of Lateral Flow Assays using 3DNA Technology

Our view on cdna chip analysis from engineering informatics standpoint

Proteomics Background and clinical utility

About OMICS Group Conferences

From Liquid Biopsy and FFPE Samples to Results

Protein/Cytokine Array Services

Diagnostic Accuracy of Two Faecal Calprotectin Immunoassays

SPM 8.2. Salford Predictive Modeler

From Liquid Biopsy and FFPE Samples to Results

Classification of Cervical Cancer Dataset

A Protein Secondary Structure Prediction Method Based on BP Neural Network Ru-xi YIN, Li-zhen LIU*, Wei SONG, Xin-lei ZHAO and Chao DU

DIAGNOSE AND TREAT WITH ANTIBODIES THAT RECOGNIZE NATIVE HUMAN PROTEIN EPITOPES IN BLOOD AND TISSUE

Cancer Informatics. Comprehensive Evaluation of Composite Gene Features in Cancer Outcome Prediction

Drug Discovery in Chemoinformatics Using Adaptive Neuro-Fuzzy Inference System

Results: Conventional ELISA results demonstrated that sera from the experimentally infected mice contained higher antibody titers vs.

HLA antibody detection and kidney allocation within Eurotransplant

Introduction. CS482/682 Computational Techniques in Biological Sequence Analysis

Disentangling Prognostic and Predictive Biomarkers Through Mutual Information

Appendix B. The Curriculum in Cytotechnology for Entry-level Competencies

3DCNN for Lung Nodule Detection And False Positive Reduction

Random forest for gene selection and microarray data classification

BIOINF/BENG/BIMM/CHEM/CSE 184: Computational Molecular Biology. Lecture 2: Microarray analysis

Matrix Factorization-Based Data Fusion for Drug-Induced Liver Injury Prediction

Bio-Plex suspension array system

APPLICATION NOTE. MagMAX mirvana Total RNA Isolation Kit TaqMan Advanced mirna Assays

Discriminant models for high-throughput proteomics mass spectrometer data

COS 597c: Topics in Computational Molecular Biology. DNA arrays. Background

Classification in Parkinson s disease. ABDBM (c) Ron Shamir

Methods for Multi-Category Cancer Diagnosis from Gene Expression Data: A Comprehensive Evaluation to Inform Decision Support System Development

Advancing Manufacturing for Advanced Therapies

REPEATABILITY, REPRODUCIBILITY AND ANALYTIC STANDARDS FOR BIOMARKER DEVELOPMENT

Bio-Plex suspension array system

Process and guidelines for applications for approval of trials involving gene and other biotechnology therapies. November 2015

Practical Applications of Immunology (Chapter 18) Lecture Materials for Amy Warenda Czura, Ph.D. Suffolk County Community College Eastern Campus

Quantum Dots and Carbon Nanotubes in Cancer diagnose EE453 Project Report submitted by Makram Abd El Qader

Specimen by Immunoassay

An Empirical Study on the Relevance of Renewable Energy Forecasting Methods

REVIEW ON PREDICTION OF CHRONIC KIDNEY DISEASE USING DATA MINING TECHNIQUES

Gene expression connectivity mapping and its application to Cat-App

A Study of Financial Distress Prediction based on Discernibility Matrix and ANN Xin-Zhong BAO 1,a,*, Xiu-Zhuan MENG 1, Hong-Yu FU 1

UK Biobank Past, Present, and Future

Today. Last time. Lecture 5: Discrimination (cont) Jane Fridlyand. Oct 13, 2005

Effective Use of Combination of Lung Malignity Markers

Multi-SNP Models for Fine-Mapping Studies: Application to an. Kallikrein Region and Prostate Cancer

Stealing Fire: A Retrospective Survey of Biotechnology Patent Claims in the Wake of the U.S. Supreme Court s Mayo v. Prometheus Decision

Efforts to Improve Data Transparency at the JBC

Application of Decision Trees in Mining High-Value Credit Card Customers

Integrating natural language processing and machine learning algorithms to categorize oncologic response in radiology reports

DepoVax TM : A novel delivery formulation for cancer immunotherapy and infectious disease vaccines

DISCOVERY AND VALIDATION OF TARGETS AND BIOMARKERS BY MASS SPECTROMETRY-BASED PROTEOMICS. September, 2011

Cytomics in Action: Cytokine Network Cytometry

Development, Optimization and Validation of Luminex based cytokine assays

Peptide libraries: applications, design options and considerations. Laura Geuss, PhD May 5, 2015, 2:00-3:00 pm EST

Biomarkers in Biochips and Microarrays: Innovative Technologies, Growth Opportunities and Future Market Outlook

- OMICS IN PERSONALISED MEDICINE

CA125 (Human) ELISA Kit

DNA Studies (Genetic Studies) Disclosure Statement. Definition of Genomics. Outline. Immediate Present and Near Future

Data Mining to Aid Beam Angle Selection for IMRT

Better Marketing Analytics Using Genetic Algorithms. Doug Newell Genalytics Inc. June 2005

3 Ways to Improve Your Targeted Marketing with Analytics

DNA Gene Expression Classification with Ensemble Classifiers Optimized by Speciated Genetic Algorithm

Transcription:

A comparison of Multiple Biomarker Selection Algorithms for Early Screening of Ovarian Cancer Yu-Seop Kim 1,3, Jong-Dae Kim 1,3, Min-Ki Jang 2,3, Chan-Young Park 1,3, and Hye-Jung Song 1,3 1 Dept. of Ubiquitous Computing, Hallym University, 1 Hallymdaehak-gil, Chuncheon, Gangwon-do, 200-702 Korea { yskim01, kimjd cypark, hjsong}@hallym.ac.kr 2 Dept of Computer Engineering, Hallym University, 1 Hallymdaehak-gil, Chuncheon, Gangwon-do, 200-702 Korea percussive@gmail.com 3 Bio-IT Research Center, Hallym University, 1 Hallymdaehak-gil, Chuncheon, Gangwon-do, 200-702 Korea Abstract. This paper demonstrates and compares the classification performance from cancer to normal under Luminex exposed environment of the optimal biomarker combinations that can diagnose ovarian cancer. The optimal combinations having high resolutions were determined from T-Test, Genetic Algorithm, and Random Forest. Each selected combinations sensitivity, specificity, and accuracy were compared by Linear Discriminant Analysis (LDA), k-nearest Neighbor (k-nn), and Logistic Regression. The 21 biomarker data used in this experiment was obtained through Luminex-PRA from the serum of 65 patients (caner 27, normal 38) of two hospitals. In this study, the results showed the selecting 2 to 3 marker with Genetic Algorithm and categorizing themed with LDA, it shows the closet sensitivity, specificity, and accuracy similar to that of the results obtained through complete enumeration of the combination of 2-4 markers. Keywords: Biomarker, Ovarian Cancer, Marker, T-Test, Genetic Algorithm, Random Forest, LDA, Logistic Regression 1 Introduction Ovarian cancer is a malignant tumor frequently arising in the age between 50-70. Early diagnosis is associated with a 92% 5-year survival rate, yet only 19% of ovarian cancers are detected early [1]. Therefore, early detection of ovarian cancer has greate promise to improve clinical outcome. It is evident that the development of a biomarker for early detection of the ovarian cancer has become paramount [2]. Biomarker consists of molecular information based on the pattern of a single or multiple molecules originating from DNA, metabolite, or protein. Biomarkers are indicators that can detect the physical change of an organism due to the genetic after required genetic change. Along with the completion of the genome project, various biomarkers are being developed, providing critical clues for cancers and senile disorders. 162

Fig. 1. Changes in gynecologic cancer causes in Korea (1999-2007) The early stages of research focused on a single biomarker for cancer diagnosis. Recent researches focus on combining multiple biomarkers to diagnose cancer more efficiently. Researches tend to focus especially on improving the sensitivity and specificity in order to increase the accuracy of the diagnosis, and the commercialization of multi-biomarkers seems to be close at hand. However, a new technology to find the right biomarker combinations is required, since the sensitivity and quantity has not yet reached a satisfactory level [3]. In this research, the mass value of the biomarkers was obtained using Luminex [4]. Luminex follows the panel reactive antibody (PRA), a solid phase-based method of Luminex corp. Luminex-PRA reacts the human leucocyte antigen (HLA) marker attached Luminex-beads to the HLA antibodies in the serum. Then detects the florescence of the antibodies from the beads by utilizing a exclusive equipment and program [5]. This paper determines the optimal marker combinations for ovarian cancer diagnosis from the combinations selected with T-Test [6], Genetic Algorithm [7], and Random Forest[8] from all the possible combinations of the 21 markers, based on their the florescence data measured. Linear Discriminant Analysis[6], k-nearest Neighbor[7], and Logistic Regression[6] was used to evaluate the sensitivity, specificity, and classification accuracy of the optimal combinations. The research aims to determine the optimal marker combination and categorization algorithm by comparing the experimental results with all the possible combinations. Methods for the collection of data are illustrated in chapter 2, and the experimental details are demonstrated in chapter 3. The results of the marker combinations and its classification performance are discussed in chapter 4, and chapter 5 presents the conclusion and possible future researches. 163

2. Data collection For the experiment, the serum of 65 Koreans (27 Cancer, 38 Normal) provided from two hospitals was used. These samples were reacted with Luminex-beads attached with 21 biomarkers, and the florescence from the antibodies on the beads was measured. In order to equalize the range of the biomarker florescence, the florescence values of each biomarker were normalized to 0-1 based on their maximum and minimum values. The selected 21 biomarkers are the most commonly discussed markers in the modern cancer research [5], [6], [9]. These biomarkers are usually provided by UPCI Luminex Cor Facility [10]. 3. Methods This paper conducts two experiments: (1) determination of biomarkers with T-Test, Genetic Algorithm(GA), and Random Forest(RF), and (2) performance comparison of the selected markers using Linear Discriminant Analysis(LDA), k-nearest Neighbor(k-NN), and Logistic Regression(LR). The random tree creation for the RF was 50, the k for k-nn was 3, and the score threshold for the classification of LR was 0.5. The combination of the biomarkers consisted of 2-4 markers, and leave-one-out cross validation was conducted for the evaluation. The algorithms for the marker combination and combinations of evaluation algorithms used in the experiment are shown in fig. 2. As a control, the optimal marker combination gained through a complete enumeration survey of 2-4 markers was used. Fig. 2. The algorithms for the marker combination and combinations of evaluation algorithms 164

4. Results The experiment compares the difference in performance of the selected 2-4 multibiomarkers by T-Test, GA, and RF to that of the optimal combination amongst the total possible combinations of the markers. The sensitivity, specificity, and accuracy of each optimal combination for both cancer and benign group was measured and compared with LDA, k-nn, and LR. The markers that ought to be combined was limited to four, because of the high cost to combine of more than 4 markers will make it difficult to be realize and commercialize the use of multi-biomarkers. Also to avoid the infringement of patent, the names of the markers are concealed. 4.1 Optimum combination results according to classification algorithm Table 1 shows the optimal marker combination and their performance when applying LDA, k-nn, and LR to the combine the 2-4 markers. The best accuracy of 90.8% was seen in the 4-marker combinations when applying k-nn. Table 1. Performance test of the classification algorithm through all the possible marker combinations of 2-4 markers and the formation of the optimal combination Classification Algorithm Marker 1 Marker 2 Marker 3 Marker 4 Sensitivity Specificity Accuracy LDA k-nn LR M7 M19 0.852 0.868 0.862 M1 M10 M19 0.889 0.895 0.892 M1 M5 M10 M19 0.889 0.895 0.892 M1 M10 M14 M19 0.889 0.895 0.892 M7 M15 0.741 0.921 0.846 M11 M15 0.852 0.842 0.846 M15 M16 0.778 0.895 0.846 M7 M14 M19 0.852 0.921 0.892 M6 M7 M14 M19 0.889 0.921 0.908 M7 M15 0.741 0.921 0.846 M17 M19 0.778 0.895 0.846 M4 M15 M19 0.815 0.921 0.877 M7 M15 M19 0.852 0.895 0.877 M9 M10 M15 0.778 0.947 0.877 M6 M10 M15 M17 0.815 0.947 0.892 M6 M10 M15 M20 0.815 0.947 0.892 M8 M9 M10 M15 0.815 0.947 0.892 M9 M10 M15 M20 0.815 0.947 0.892 M10 M11 M15 M20 0.815 0.947 0.892 165

Table 2 compares accuracy of the selected markers through T-Test with all the combinations shown in Table 1. The 3-marker combination derived by Genetic Algorithm showed identical results with the optimal marker combination shown in Table 1. By comparing the rank with the optimum marker combination, all the algorithms except for LDA showed a similar level to that of the experiment in Table 2. This indicates that the most frequent marker combination is not sufficient enough to reproduce the maximum categorization accuracy. It also shows 13.9% point difference for the categorization accuracy. The four markers shown in Table 4 were selected with Random Forest, which includes the first, second, and fourth frequent markers shown in Table 2. Also the marker combinations differed greatly from the optimal marker combinations. The accuracy was up to 9.3% point low. Table 2. Classification performance comparison of the marker combinations obtained through T-Test Rank Accuracy Of Difference Classification Marker Marker Marker Marker Algorithm 1 2 3 4 Sens. Spec. Accu. Optimum with Marker the Optimum Combination marker Combination M9 M15 0.704 0.895 0.815 0.862 2 LDA M9 M15 M16 0.741 0.895 0.831 0.892 4 M9 M15 M16 M21 0.741 0.868 0.815 0.892 5 M9 M15 0.741 0.763 0.754 0.846 6 k-nn M9 M15 M16 0.741 0.789 0.769 0.892 8 M9 M15 M16 M21 0.704 0.895 0.815 0.908 6 M9 M15 0.704 0.895 0.815 0.846 2 LR M9 M15 M16 0.741 0.895 0.831 0.877 3 M9 M15 M16 M21 0.741 0.868 0.815 0.892 5 Table 3. Classification performance comparison of the marker combinations obtained through Genetic Algorithm Rank Accuracy Of Difference Classification Marker Marker Marker Marker Algorithm 1 2 3 4 Sens. Spec. Accu. Optimum with Marker the Optimum Combination marker Combination M10 M19 0.815 0.868 0.846 0.862 1 LDA M10 M1 M19 0.889 0.895 0.892 0.892 0 M15 M16 M13 M19 0.704 0.895 0.815 0.892 5 M10 M19 0.741 0.842 0.8 0.846 3 k-nn M10 M1 M19 0.778 0.842 0.815 0.892 5 M15 M16 M13 M19 0.741 0.868 0.815 0.908 6 166

LR M10 M19 0.556 0.868 0.738 0.846 7 M10 M1 M19 0.556 0.868 0.738 0.877 9 M15 M16 M13 M19 0.741 0.895 0.831 0.892 4 Table 4. Classification performance comparison of the marker combinations obtained through Random Forest Rank Accuracy of Difference Classification Marker Marker Marker Marker Algorithm 1 2 3 4 Sens. Spec. Accu. Optimum with the Marker Optimum Combination marker Combination M11 M7 0.519 0.947 0.769 0.862 6 LDA k-nn LR M7 M19 M9 0.704 0.895 0.815 0.892 5 M9 M19 M15 M7 0.778 0.895 0.846 0.892 4 M11 M7 0.704 0.816 0.769 0.846 5 M7 M19 M9 0.815 0.868 0.846 0.892 3 M9 M19 M15 M7 0.741 0.895 0.831 0.908 5 M11 M7 0.556 0.921 0.769 0.846 5 M7 M19 M9 0.667 0.868 0.785 0.877 6 M9 M19 M15 M7 0.778 0.895 0.846 0.892 3 4.2 Dot Plot and ROC curve for optimum marker combination As illustrated in Fig. 3, the cancer and benign are not distinguishable in one dimension. The results were easier to analyze when it was projected in two dimensions using a marker combination. The ROC curve of Fig. 4 demonstrates the aforementioned statement. Fig. 3. Dot Plot of individual markers in optimum marker combination 167

Fig. 4. ROC curve of individual markers. 5. Conclusion This paper searches for the biomarker combination that can easily distinguish the cancer to benign. Comparing the classification performance of ovarian cancer, selecting 2 or 3 markers through GA and classifying with LDA shows the most similar sensitivity, specificity, and accuracy to that of the marker combinations of 2-4 markers derived from the complete enumeration survey. However, except for the LDA 3-marker combination of GA, the marker combination of 2-4 markers obtained with the marker selection algorithm showed significantly low accuracy compared to the marker combinations obtained from the complete enumeration survey. In the future, we plan to develop an algorithm with better performance by optimizing the various parameters, fitness function, and generation function of the Genetic Algorithm. Also, by applying more various algorithms such as SVM(Support Vertor Machine) or ANN(Artificial Neural Network), we anticipate to discover an algorithm that will find a more similar marker combination of 2-4 markers to the combinations derived with complete enumeration survey. In addition, we will experiment on more various types of classification algorithms References 1. American Cancer Society, http://www.cancer.org/cancer/ovariancancer (2012) 168

2. Brian, N., Adele, M., Liudmila. V., Denise, P., Matthew, W., Elesier, G., Anna L.: A serum based analysis of ovarian epithelial tumorigenesis. ELSEVIER Gynecologic Oncology 112, 47 54 (2009) 3. ChiHeum, C.: Biomarkers related to Diagnosis and Prognosis of Ovarian Cancer. Koean Journal of Obstetrics and Gynecology 39, 90 95 (2008) 4. SunKyung, J., EunJi O., ChulWoo, Y., WoongSik, A., YongGu, K., YeonJun, P., KyungJa, H.: ELISA for the selection of HLA isoantibody and Comparison Evaluation of Luminex Panel Reactive Antibody Test. Journal of Korean Society for Laboratory Medicine 29, 473 480 (2009) 5. El-Awar, N., Lee, J., Terasaki. P.: HLA antibody identification with single antigen beads compared to conventional methods. Hum Immunol 66, 89 97 (2005) 6. David, F., Roger, P., Robert, P.: Statistics, 3rd Edition. W. W. Norton & Company (1998) 7. Tom, M. M.: Machine Learning. The McGraw-Hill (1997) 8. Leo, B.: Random Forest. Machine Learning 45, 5 32 (2001) 9. Suraj, D. A., Greg, P. B., Tzong-Hao C., Katharine, J. B., Jinghua, Z., Partha, S., Ping, Y., Brian, C. M.: Development and Preliminary Evaluation of a Multivariate Index Assay for Ovarian Cancer. PLoS ONE 4, e4599 (2009) 10. University of Pittsburgh Cancer Institute Luminex Core Facility, http://www.upci.upmc.edu/facilities/luminex/sources.html 169