New approaches for analyzing multi-channel image data and post-processing of phenotypic data
Christian Klukas
Leibniz Institute of Plant Genetics and Crop Plant Research (IPK); LemnaTec GmbH, 52076 Aachen, Germany
2015 Sino-German Workshop on Multiscale spatial computational systems biology, 9.10.2015
Leibniz Institute of Plant Genetics and Crop Plant Research (IPK)
- Large and internationally known plant research centre, working on problems in modern biology by focusing on cultivated plants
- IPK gene bank: > 3,000 botanic species
Image analysis group (2010-2015)
- Head of group: Dr. Christian Klukas
- Scientific assistants: Jean-Michel Pape, Michael Ulrich
- PhD student: Dijun Chen
- Technician: Ingo Mücke
LemnaTec
My related previous & current work
2003-2009: Network-integrated data analysis and visualisation (member of the Plant Bioinformatics group, Prof. Falk Schreiber)
- Pathways, ontologies
- Metabolite data
- Protein data
- Gene expression data
2010-2015: Image-based phenotyping
- Image-analysis procedures have been developed (Integrated Analysis Platform, IAP)
- New data domain: phenotypic data
Network editing and data visualization
[Figure: Expression Levels of Differentially Expressed Stage-Specific cDNAs Grouped by GO Classification. Sharbel T. F. et al. Plant Cell 2010:22:655-671]
Klukas, Junker, Schreiber (2006): The VANTED software system for transcriptomics, proteomics and metabolomics analysis. Journal of Pesticide Science, Vol. 31 (2006), No. 3, pp. 289-292
Junker, Klukas*, Schreiber (2006): VANTED: a system for advanced data analysis and visualization in the context of biological networks. BMC Bioinformatics, 7:109
Pathway editing, data visualization http://www.lipidmaps.org Dr. Christian Klukas, Image Analysis Group, IPK
Introduction: High-throughput phenotyping (HTP) in plants
- Important tool to measure plant growth and architecture
- Application in studying plant functions (e.g. water use) and performance (e.g. biomass accumulation)
- Can be used to accelerate research in
  - plant biology (e.g. plant pathogens)
  - applied plant sciences (e.g. breeding, biotechnology, agriculture, horticulture)
  - ecology (e.g. biodiversity, climate change research)
HTP in Plants: Goals
- Quantitative rather than qualitative measurements (e.g. for disease states)
- Comprehensive measurements
  - Extensive phenotyping: gathering more information at the same time
  - Intensive phenotyping: characterizing the phenotype in great detail (e.g. population wide, time lapse)
- Identifying the genetic basis of complex traits (e.g. yield, stress resistance)
Challenges: big data handling and analysis
- Automation: high throughput vs low costs
- Big data problems
  - Data storage and management
  - Image analysis (computing time and resources)
  - Result interpretation
- Data analytical challenges
  - many-to-many relations (e.g. G-P maps)
  - high dimensions, small samples (risk of overfitting)
  - integration of data from other domains into analysis
HTP Infrastructure: Hard- & Software
Soft infrastructure
- Plant organ: shoot / root / seed
- Treatments: abiotic / biotic stress
- Experiment type: mapping / mutant populations
- Species: e.g. Arabidopsis, barley, rice, wheat, tobacco
Hard infrastructure
- Field / greenhouse
- Robotics, conveyor systems, watering, fertilization regimes
- Cameras: visible-light, fluorescence, near-infrared, infrared, hyper-spectral
- Sensors and monitoring
Phenotyping Infrastructure at IPK Phytochamber (384 carriers, up to 4608 small plants)
Phenotyping Infrastructure at IPK Automated greenhouse (520 carriers)
Phenotyping Infrastructure at IPK Large automated greenhouse (~396 carriers, up to 1584 plants)
Camera Systems
LemnaTec imaging hardware (with rotating top & side views) installed at three locations
- Fluorescence (~ GFP, chlorophyll): blue light illumination (400-500 nm), capturing filtered light (520-750 nm)
- Near-infrared (~ water distribution): capturing near-infrared light (1450-1550 nm)
- Visible light (~ general phenotyping): capturing visible light (390-750 nm)
- Infrared (~ temperature): placed inside phytochamber
Phenotyping Software
IAP software for digital plant phenotyping
Christian Klukas, Dijun Chen, and Jean-Michel Pape: Integrated Analysis Platform: An Open-Source Information System for High-Throughput Plant Phenotyping. Plant Physiol. 2014 165:506-518.
URL: http://iap.ipk-gatersleben.de
Post-processing of data
Growth modelling, statistical evaluation
Example 1: Differences in leaf morphology
[Figure panels: Genotype 1 to Genotype 5, day 45]
[Plot: values in px³ (axis range 0 to 140,000,000) over days after sowing]
[Plot: border length² / area (axis range 0 to 3000) over days after sowing (day 10 to day 56). High values indicate low compactness: longer border and/or smaller covered area. Low values indicate high compactness: shorter border and/or larger covered area.]
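The ratio border length² / area plotted on this slide can be illustrated with a short sketch. This is not the IAP implementation: the function name `compactness` and the choice to approximate border length by counting exposed pixel edges (4-neighbourhood) are assumptions made only for this example.

```python
def compactness(mask):
    """Return border_length**2 / area for a binary mask (list of rows of 0/1).

    Assumption: border length is approximated by the number of pixel edges
    that separate a foreground pixel from background or the image boundary.
    """
    rows, cols = len(mask), len(mask[0])
    area = 0
    border = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            area += 1
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or not mask[nr][nc]:
                    border += 1
    if area == 0:
        raise ValueError("mask contains no foreground pixels")
    return border ** 2 / area


# Two toy shapes with the same area (16 px) but different outlines.
square = [[1] * 4 for _ in range(4)]  # compact 4x4 blob
strip = [[1] * 16]                    # elongated 1x16 strip

print(compactness(square), compactness(strip))
```

The elongated strip yields the larger ratio, which matches the slide's reading that a longer border for the same covered area means lower compactness.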
Example 2: Tobacco flower and leaf movement
Hyperspectral and multi-channel data
Multi Channel Classification and Clustering System (MCCCS) (image-based)
- hyperspectral, multi-channel data processing
- command-line based (bash)
- supports ML libraries (WEKA, R)
- image processing (Java)
Image-based Multi Channel Classification and Clustering
LSC 2014
- Segmentation based on 3-D histograms
- Leaf detection by calculating leaf center points using the Euclidean distance map (EDM)
- Leaf split point and leaf split line calculation for individual leaf segmentation
[1] Jean-Michel Pape and Christian Klukas: 3-D histogram-based segmentation and leaf detection for rosette plants. European Conference on Computer Vision (ECCV) 2014, Workshops. Springer International Publishing, pp. 61-74, 2015.
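The idea of finding leaf center points from a Euclidean distance map can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of [1]: the function names, the brute-force distance computation, the `min_distance` threshold and the strict local-maximum rule (which ignores plateaus) are all choices made for this example. The mask must contain at least one background pixel.

```python
import math


def distance_map(mask):
    """Brute-force EDM: distance from each foreground pixel to the nearest background pixel."""
    rows, cols = len(mask), len(mask[0])
    background = [(r, c) for r in range(rows) for c in range(cols) if not mask[r][c]]
    edm = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                edm[r][c] = min(math.hypot(r - br, c - bc) for br, bc in background)
    return edm


def leaf_centers(mask, min_distance=2.0):
    """Return pixels whose EDM value is a strict 8-neighbourhood maximum (hypothetical criterion)."""
    edm = distance_map(mask)
    rows, cols = len(mask), len(mask[0])
    centers = []
    for r in range(rows):
        for c in range(cols):
            d = edm[r][c]
            if d < min_distance:
                continue
            neighbours = [
                edm[nr][nc]
                for nr in range(max(r - 1, 0), min(r + 2, rows))
                for nc in range(max(c - 1, 0), min(c + 2, cols))
                if (nr, nc) != (r, c)
            ]
            if all(d > n for n in neighbours):
                centers.append((r, c))
    return centers


# Toy mask: two 5x5 "leaves" joined by a one-pixel bridge.
mask = [[0] * 13 for _ in range(7)]
for r in range(1, 6):
    for c in list(range(1, 6)) + list(range(7, 12)):
        mask[r][c] = 1
mask[3][6] = 1  # thin connection between the two blobs

centers = leaf_centers(mask)
print(centers)
```

Each blob contributes one maximum of the distance map, while the thin bridge stays below the threshold, so connected leaves can still be counted separately.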
Open Challenges
1. Leaf overlaps
2. Small leaves in center
3. Bad illumination, straight split lines
Overview LSC/LCC 2015
- Leaf counting prediction
- Leaf segmentation approach including leaf overlap prediction
- Utilizing machine learning
- Supervised approach
Leaf Counting
- Image feature extraction using IAP software
- Evaluation of different regression approaches (WEKA) (10-fold cross-validation on the training data, with 10 times repeated testing)
- Prediction for testing data
- Results are used for leaf segmentation
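The evaluation scheme named on this slide, 10-fold cross-validation repeated 10 times, can be sketched in plain Python. The talk used WEKA; the code below is only a stand-in. The single feature (a projected area), the least-squares line as the regression model, the synthetic data, and the reading of "10 times repeated" as reshuffling the folds are all assumptions for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validate(xs, ys, folds=10, repeats=10, seed=0):
    """Mean absolute error over `repeats` reshuffled rounds of k-fold cross-validation."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    errors = []
    for _ in range(repeats):
        rng.shuffle(indices)
        for k in range(folds):
            test = indices[k::folds]          # every folds-th sample is held out
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
            errors.extend(abs(slope * xs[i] + intercept - ys[i]) for i in test)
    return sum(errors) / len(errors)


# Synthetic data (hypothetical): leaf count grows linearly with area, plus +/-0.5 jitter.
areas = [100 * i for i in range(1, 41)]
counts = [2 + 0.01 * a + (0.5 if i % 2 else -0.5) for i, a in enumerate(areas)]

mae = cross_validate(areas, counts)
print(f"mean absolute error: {mae:.2f} leaves")
```

Comparing this error across several regression models, each evaluated the same way, is what allows one model to be chosen before predicting on the held-out testing data.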
Leaf Counting classifier evaluation (feature selection)
Leaf Segmentation
Leaf overlap detection
- learn leaf borders using MCCCS and WEKA
Individual leaf labelling
- use detected borders to split leaves
- stepwise merging until target leaf count is reached
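The last step, merging segments stepwise until the predicted leaf count is reached, can be illustrated with a small sketch. The slide does not state the merge criterion; the rule used here (merge the smallest segment into the neighbour it shares the most pixel edges with) and the name `merge_to_target` are assumptions for this example only.

```python
from collections import Counter


def merge_to_target(labels, target):
    """Merge labelled segments (0 = background) until at most `target` remain.

    Hypothetical rule: the smallest segment that has a neighbour is merged
    into the neighbour with the longest shared border.
    """
    labels = [row[:] for row in labels]  # work on a copy
    rows, cols = len(labels), len(labels[0])
    while True:
        sizes = Counter(v for row in labels for v in row if v)
        if len(sizes) <= target:
            break
        # Count shared pixel edges between every pair of adjacent segments.
        shared = Counter()
        for r in range(rows):
            for c in range(cols):
                a = labels[r][c]
                for nr, nc in ((r + 1, c), (r, c + 1)):
                    if nr < rows and nc < cols:
                        b = labels[nr][nc]
                        if a and b and a != b:
                            shared[(a, b)] += 1
                            shared[(b, a)] += 1
        merged = False
        for small, _ in sorted(sizes.items(), key=lambda kv: (kv[1], kv[0])):
            neighbours = {b: n for (a, b), n in shared.items() if a == small}
            if neighbours:
                best = max(neighbours, key=lambda b: (neighbours[b], -b))
                labels = [[best if v == small else v for v in row] for row in labels]
                merged = True
                break
        if not merged:
            break  # remaining segments are isolated; nothing left to merge
    return labels


# Toy over-segmentation: segment 3 is a one-pixel fragment inside leaf 1.
over_segmented = [
    [1, 1, 1, 2, 2, 2],
    [1, 1, 3, 2, 2, 2],
    [1, 1, 1, 2, 2, 2],
]
result = merge_to_target(over_segmented, target=2)
print(result)
```

Using the leaf count predicted by the regression step as `target` ties the two stages together: the segmentation stops merging exactly when it agrees with the counting result.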
Leaf border detection
- Mask extraction from training data
- Classifier training
- Prediction of border images
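The first step, extracting border masks from labelled training images, might look like the sketch below. It assumes the training data provide one integer label per leaf (0 for background) and marks only leaf-to-leaf borders; both points, as well as the name `border_mask`, are assumptions rather than details taken from the talk.

```python
def border_mask(labels):
    """Mark pixels that touch a different leaf label in the 4-neighbourhood.

    Assumption: 0 is background, and leaf/background edges are not counted
    as borders because the goal is to learn where leaves overlap or touch.
    """
    rows, cols = len(labels), len(labels[0])
    mask = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            if not a:
                continue
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < rows and 0 <= nc < cols:
                    b = labels[nr][nc]
                    if b and b != a:
                        mask[r][c] = 1
                        break
    return mask


# Toy ground truth: two touching leaves above a background row.
ground_truth = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [0, 0, 0, 0],
]
borders = border_mask(ground_truth)
print(borders)
```

Pixels set to 1 would serve as positive examples and the remaining leaf pixels as negative examples when training the border classifier.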
RGB
L*a*b* - L
L*a*b* - a
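The L and a channels shown on these slides result from converting RGB to CIE L*a*b*. The sketch below uses the standard sRGB to XYZ to L*a*b* formulas with a D65 white point; whether IAP and MCCCS use exactly this variant is an assumption, and `srgb_to_lab` is a name chosen for this example.

```python
def srgb_to_lab(r, g, b):
    """Convert 8-bit sRGB values to CIE L*a*b* (D65 white point assumed)."""

    def linear(u):
        # Undo the sRGB gamma curve.
        u /= 255.0
        return u / 12.92 if u <= 0.04045 else ((u + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linear(r), linear(g), linear(b)

    # Linear RGB to CIE XYZ (sRGB primaries, D65).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


lightness, a_channel, b_channel = srgb_to_lab(0, 255, 0)  # pure green
print(lightness, a_channel, b_channel)
```

Green pixels map to strongly negative a values, which is one reason the a channel separates plant tissue from background so well.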
Top 5 Features for Border Prediction
Ranked attributes:
1: Lab_a_mean
2: Luv_v_asm
3: Lab_a_glcm_mean_v
4: Luv_v_glcm_homogeneity_v
5: Lab_a_glcm_mean_h
[Figure labels: Border, Border region, Inner Leaf]
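Attribute names such as `Lab_a_glcm_mean_h` and `Luv_v_asm` suggest texture features computed from a grey-level co-occurrence matrix (GLCM) in horizontal (`_h`) and vertical (`_v`) direction. The sketch below shows one common way to compute such features. The exact definitions used by IAP and MCCCS are not given in the talk, so the formulas, the quantisation to a few grey levels and the function names are assumptions.

```python
def glcm(image, dr, dc, levels):
    """Normalised co-occurrence matrix for pixel pairs offset by (dr, dc).

    `image` holds integers in range(levels) and must contain at least one pair.
    """
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                counts[image[r][c]][image[nr][nc]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]


def glcm_features(p):
    """Common GLCM statistics (one possible set of definitions)."""
    levels = len(p)
    cells = [(i, j, p[i][j]) for i in range(levels) for j in range(levels)]
    return {
        "asm": sum(v * v for _, _, v in cells),                      # angular second moment
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "mean": sum(i * v for i, _, v in cells),
    }


# Toy channel with two grey levels and a vertical edge in the middle.
channel = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
horizontal = glcm_features(glcm(channel, 0, 1, levels=2))  # "_h" direction
vertical = glcm_features(glcm(channel, 1, 0, levels=2))    # "_v" direction
print(horizontal, vertical)
```

The vertical edge lowers homogeneity only in the horizontal direction, which illustrates why direction-specific features can help to tell border regions from the inner leaf.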
[Plot: Border predictions. Correctly classified (%) versus feature count (1, 5, all 59)]
Thank you for your attention!
Group Image Analysis
- Jean-Michel Pape (DPPN WP 1.3.4, 1.3.12)
- Dijun Chen (EPPN, OPTIMAL)
- Michael Ulrich
- Ingo Mücke
Data Inspection
- Swetlana Friedel (now at BASF)
Group Genome Diversity
- Benjamin Kilian (now at Bayer)
- Kerstin Neumann
Zhejiang University
- Ming Chen
Funding
Related Literature
- Klukas, C., Chen, D., Pape, J.M.: Integrated analysis platform: An open-source information system for high-throughput plant phenotyping. Plant Physiology 165(2), 506-518 (2014)
- Pape, J.M., Klukas, C.: 3-D histogram-based segmentation and leaf detection for rosette plants. In: Computer Vision - ECCV 2014. Workshops and Demonstrations - European Conference on Computer Vision. Springer (2015)
- D. Chen... C. Klukas: Dissecting the Phenotypic Components of Crop Plant Growth and Drought Responses Based on High-Throughput Image Analysis. Plant Cell (2014)