Transcriptome-based distance measures for grouping of germplasm and prediction of hybrid performance in maize


Theor Appl Genet (2010) 120:441–450
DOI 10.1007/s00122-009-1204-1

ORIGINAL PAPER

Matthias Frisch, Alexander Thiemann, Junjie Fu, Tobias A. Schrag, Stefan Scholten, Albrecht E. Melchinger

Received: 7 May 2009 / Accepted: 21 October 2009 / Published online: 13 November 2009
© Springer-Verlag 2009

Communicated by A. Charcosset.
Contribution to the special issue "Heterosis in Plants".

M. Frisch (✉)
Institute of Agronomy and Plant Breeding II, Justus-Liebig-University, 35392 Giessen, Germany
e-mail: matthias.frisch@agrar.uni-giessen.de

J. Fu, T. A. Schrag, A. E. Melchinger
Institute of Plant Breeding, Seed Science, and Population Genetics, University of Hohenheim, 70593 Stuttgart, Germany

A. Thiemann, S. Scholten
Biocenter Klein Flottbek, Developmental Biology and Biotechnology, University of Hamburg, Ohnhorststrasse 18, 22609 Hamburg, Germany

Abstract

Grouping of germplasm and prediction of hybrid performance and heterosis are important applications in hybrid breeding programs. Gene expression analysis is a promising tool to achieve both tasks efficiently. Our objectives were to (1) investigate distance measures based on transcription profiles, (2) compare these with genetic distances based on AFLP markers, and (3) assess the suitability of transcriptome-based distances for grouping of germplasm and prediction of hybrid performance and heterosis in maize. We analyzed transcription profiles from seedlings of the 21 parental maize lines of a 7 × 14 factorial with a 46-k oligonucleotide array. The hybrid performance and heterosis of the 98 hybrids were assessed in field trials. In cluster and principal coordinate analyses for germplasm grouping, the transcriptome-based distances were as powerful as the genetic distances for separating flint from dent inbreds. Cross validation showed that prediction of hybrid performance with transcriptome-based distances using selected markers was more precise than earlier prediction models using DNA markers or general combining ability estimates using field data.
Our results suggest that transcriptome-based prediction of hybrid performance and heterosis has a great potential to improve the efficiency of maize hybrid breeding programs.

Introduction

The prediction of hybrid performance using information from parental inbred lines is of great interest to breeders. If successful, it can substantially increase the efficiency of breeding programs. Prediction methods using the genetic distance between the parental lines failed to predict reliably the hybrid performance of inter-group hybrids in plant breeding programs (Melchinger 1999). In contrast, prediction methods using markers linked to quantitative trait loci affecting the trait under consideration were successfully developed (Vuylsteke et al. 2000; Schrag et al. 2006, 2007, 2009a, b).

In hybrid breeding, the germplasm is usually divided into genetically distant heterotic pools. Molecular marker-based genetic distances, graphically displayed by multivariate statistical methods such as principal coordinate and cluster analyses, can be helpful to accomplish this task (Reif et al. 2003, 2005).

With the advent of transcriptome analysis by gene expression profiling, a new lab technology has emerged. It can be employed in studying the molecular basis of heterosis (Birchler et al. 2003). Stupar et al. (2008) found a correlation of genetic diversity and transcriptional variation. Guo et al. (2006) suggested that differential allele regulation may play an important role for heterosis. Springer and Stupar (2007) suggested that modified levels of gene expression in hybrids may contribute to heterotic phenotypes. The applicability of expression profiles, SNP markers, and metabolites for prediction of hybrid performance and heterosis was recently investigated with various approaches (Maenhout et al. 2009; Repsilber et al. 2009; Steinfath et al. 2009).

A primary focus of interest is to use gene expression data in functional analyses for detection of genes underlying agronomic traits (Thiemann et al. 2009). An alternative view on gene expression data is possible by disregarding all functional information on the analyzed genes and considering the transcript abundance levels as quantitative variables characterizing a genotype. These quantitative variables could then be used to construct distance measures between genotypes on the basis of their transcription profiles. In combination with multivariate methods, the distances could be employed for grouping of germplasm and, in combination with linear models, for prediction of hybrid performance and heterosis. To our knowledge, no previous investigation on transcriptome-based distance measures and their applications is available.

The goal of our study was to investigate the potential application of transcriptome-based distance measures in maize hybrid breeding programs. In particular, our objectives were to (a) investigate distance measures between inbred lines based on gene expression profiles, (b) examine their correlation with molecular marker-based genetic distances, and (c) assess the suitability of transcriptome-based distances for germplasm grouping and prediction of heterosis and hybrid performance.

Materials and methods

Field data

Seven flint and 14 dent elite inbred lines developed by the maize breeding program of the University of Hohenheim were used as parental lines for a 7 × 14 factorial mating design. The inbreds comprised eight dent lines with Iowa Stiff Stalk Synthetic (S028, S036, S044, S046, S049, S050, S058, S067) and six with Iodent background (P033, P040, P046, P048, P063, P066).
Four flint lines (F037, F039, F043, F047) had a European Flint and three (L024, L035, L043) a Flint/Lancaster background. The factorial crosses were evaluated in 2002 at six agro-ecologically diverse locations in Germany (Bad Krozingen, Eckartsweier, Hohenheim, Landau, Sünching, Vechta). The 21 inbred parents were evaluated for their per se performance in 2003 at four locations (Eckartsweier, Hohenheim, Sünching, Pocking) and in 2004 at three locations (Eckartsweier, Hohenheim, Bad Krozingen). The trials were evaluated in two-row plots using adjacent α designs (generalized lattices) with two to three replications. Grain yield (Mg ha⁻¹), adjusted to 155 g kg⁻¹ grain moisture, and grain dry matter concentration (%) were recorded for the inbred parents and factorial crosses. The data were analyzed with a mixed linear model as described in detail by Schrag et al. (2009a, b). The factorial set of crosses investigated here is one of the nine factorials analyzed by Schrag et al. (2009a, b) and was also included in the studies of Schrag et al. (2006, 2007), where it was referred to as Experiment 1.

Molecular marker data

The inbred lines were assayed for AFLP markers with 20 primer combinations as described in detail by Schrag et al. (2006). The AFLP analyses resulted in 1,835 markers. The genetic distance $D_A$ between inbred lines $i$ and $j$ was calculated from the banding pattern of $n_m$ AFLP bands as

$$D_A(i,j) = \sqrt{\frac{1}{n_m}\sum_{m=1}^{n_m}\left[b_m(i) - b_m(j)\right]^2}, \tag{1}$$

where $b_m(i)$ and $b_m(j)$ are indicator variables taking the value one if band $m$ was observed in inbred line $i$ or $j$, respectively, and zero otherwise. $D_A(i,j)$ is related to the simple matching coefficient $SM(i,j)$ by

$$D_A(i,j) = \sqrt{1 - SM(i,j)}. \tag{2}$$

$D_A$ has the property of being Euclidean and, therefore, is well suited for principal coordinate and cluster analyses.
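
As a minimal sketch of Eqs. 1 and 2, the following Python snippet computes the genetic distance from two 0/1 band patterns and compares it with the value obtained from the simple matching coefficient. The function names and the two band patterns are hypothetical illustrations and are not taken from the study.

```python
import math


def genetic_distance(bands_i, bands_j):
    """Eq. 1: square root of the mean squared difference of 0/1 band indicators."""
    if len(bands_i) != len(bands_j):
        raise ValueError("band patterns must have the same length")
    n_m = len(bands_i)
    squared = sum((b_i - b_j) ** 2 for b_i, b_j in zip(bands_i, bands_j))
    return math.sqrt(squared / n_m)


def simple_matching(bands_i, bands_j):
    """Proportion of bands with the same state (both present or both absent)."""
    matches = sum(1 for b_i, b_j in zip(bands_i, bands_j) if b_i == b_j)
    return matches / len(bands_i)


# Hypothetical band patterns of two inbred lines (toy data).
line_i = [1, 0, 1, 1, 0, 0, 1, 0]
line_j = [1, 1, 0, 1, 0, 1, 1, 0]

d_a = genetic_distance(line_i, line_j)   # three of eight bands differ
sm = simple_matching(line_i, line_j)     # five of eight bands match

# Eq. 2: both routes give the same distance.
print(d_a, math.sqrt(1 - sm))
```

Because the indicators take only the values zero and one, each squared difference equals one for a mismatching band and zero otherwise, which is why Eq. 1 reduces to Eq. 2.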

Gene expression data

Five plants of each of the 21 inbred lines were grown for 7 days in a climate chamber under regulated growth conditions. The five biological replicates were pooled (Kendziorski et al. 2005) and RNA was isolated from a mixture of the five seedlings. The 46-k array from the maize oligonucleotide array project (http://www.maizearray.org/, University of Arizona, USA) with 43,381 gene-oriented 70-mer maize oligo-spots (in total 46,128 features) printed on a glass slide was used for hybridization analyses (Thiemann et al. 2009). For the microarray analysis we employed an interwoven loop design (Kerr and Churchill 2001) resulting in 62 direct comparisons of dent and flint lines by sampling each dent line five times and each flint line eight times. Differences in gene expression were tested with a modified F test using a false discovery rate of 0.01 for all genes showing a fold change of at least 1.3 and an expression level (log2) of at least 8.0. All genes that were differentially expressed in at least one pair of parental lines of the 98 factorial crosses were assigned to the subset of genes $S_p$.
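
The filtering and the construction of the gene subset can be illustrated with a short Python sketch. It is not the authors' analysis pipeline: the modified F test is not implemented, the FDR-adjusted p-values are assumed to come from an external test, and applying the expression-level threshold to the mean of the two log2 values is an assumption made here for illustration. All names and numbers are hypothetical.

```python
import math

MIN_FOLD_CHANGE = 1.3  # fold-change threshold stated in the text
MIN_LOG2_LEVEL = 8.0   # log2 expression threshold stated in the text
FDR_LEVEL = 0.01       # false discovery rate stated in the text


def is_differential(log2_i, log2_j, adjusted_p):
    """Apply the three thresholds to one gene in one pair of lines.

    Assumptions: adjusted_p is an FDR-adjusted p-value from an external
    test, and the level filter uses the mean of the two log2 values.
    """
    fold_ok = abs(log2_i - log2_j) >= math.log2(MIN_FOLD_CHANGE)
    level_ok = (log2_i + log2_j) / 2 >= MIN_LOG2_LEVEL
    return fold_ok and level_ok and adjusted_p <= FDR_LEVEL


def call_genes(measurements):
    """For each parental pair, return the set of genes passing all filters."""
    return {
        pair: {g for g, (l_i, l_j, p) in genes.items() if is_differential(l_i, l_j, p)}
        for pair, genes in measurements.items()
    }


def build_subset(calls):
    """Genes differentially expressed in at least one parental pair."""
    return set().union(*calls.values())


# Hypothetical toy measurements: gene -> (log2 level line i, log2 level line j, adjusted p).
measurements = {
    ("dent_1", "flint_1"): {"g1": (10.0, 9.0, 0.001), "g2": (7.0, 6.0, 0.001), "g3": (9.0, 8.9, 0.001)},
    ("dent_2", "flint_1"): {"g1": (9.0, 9.05, 0.5), "g2": (7.0, 7.1, 0.9), "g3": (9.5, 8.5, 0.005)},
}

calls = call_genes(measurements)
s_p = build_subset(calls)
print(sorted(s_p))
```

In this toy input, gene g2 never enters the subset because its expression level stays below the threshold in both pairs, whereas g1 and g3 each pass all filters in one pair.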

Transcriptome-based distance measures

A Euclidean distance $D_E$ between lines $i$ and $j$ can be determined from the gene expression data of $n_g$ genes as

$$D_E(i,j) = \sqrt{\sum_{g=1}^{n_g}\left[l_g(i) - l_g(j)\right]^2}, \tag{3}$$

where $l_g(i)$ and $l_g(j)$ are the base-two logarithms of the transcript abundance of gene $g$ in inbred lines $i$ and $j$. A binary distance $D_B$ between lines $i$ and $j$ can be determined from the gene expression data as

$$D_B(i,j) = \sqrt{\frac{1}{n_g}\sum_{g=1}^{n_g}\left[x_g(i) - x_g(j)\right]^2}, \tag{4}$$

where $x_g(i)$ and $x_g(j)$ are indicator variables taking the values zero and one depending on differential gene expression of gene $g$ in inbred lines $i$ and $j$. If gene $g$ is differentially expressed in lines $i$ and $j$, then $x_g(i) = 1$ and $x_g(j) = 0$ for $l_g(i) > l_g(j)$, and $x_g(i) = 0$ and $x_g(j) = 1$ for $l_g(i) < l_g(j)$. If gene $g$ is not differentially expressed, then $x_g(i) = x_g(j) = 0$. Equation 4 simplifies to

$$D_B(i,j) = \sqrt{n_s(i,j)/n_g}, \tag{5}$$

where $n_s(i,j)$ is the number of genes differentially expressed in lines $i$ and $j$.

The transcriptome-based distances $D_B$ and $D_E$ were determined for the subset of genes $S_p$. The correlation of $D_B$ and $D_E$ with $D_A$ and the correlation of $D_A$, $D_B$, and $D_E$ with hybrid performance and heterosis were determined. Furthermore, the distances $D_A$, $D_B$, and $D_E$ were subjected to a principal coordinate analysis and a hierarchical cluster analysis using the complete linkage clustering algorithm implemented in the hclust function of the statistical software R (Ihaka and Gentleman 1996).

Association of differential gene expression with hybrid performance and heterosis

To identify genes that were differentially expressed in the parents of hybrids with high performance, the hybrids are divided into two classes $T$ and $L$ of equal size. The class $T$ consists of hybrids with high and the class $L$ of hybrids with low hybrid performance $H$:

$$\forall\,(i,j) \in T,\ (k,l) \in L:\quad H(i,j) \ge H(k,l). \tag{6}$$

Consider gene $g$ and let $o_{gT}$ and $o_{gL}$ denote the numbers of hybrids in class $T$ and $L$, respectively, for which the parents show differential expression of gene $g$. Let

$$P_g = \sum_{k=o_{gT}}^{o_{gT}+o_{gL}} \mathrm{Bin}_{n,p}(k), \tag{7}$$

where $\mathrm{Bin}_{n,p}$ is the probability function of the binomial distribution with parameters $n = o_{gT} + o_{gL}$ and $p = 1/2$. $P_g$ denotes the probability that the count $o_{gT}$, or a larger one, is observed under the condition that differential gene expression occurs with the same probability in the hybrids showing a high and those showing a low hybrid performance. Hence, small values of $P_g$ indicate an association of differential expression with high hybrid performance. A subset of genes whose differential expression is associated with mid-parent heterosis can be determined in an analogous manner.

On the basis of all 98 hybrids of the factorial, we determined the subsets of genes associated with hybrid performance, $S_y$, and with heterosis, $S_h$. To accomplish this, we compared the probabilities $P_g$ with threshold values determined with the Bonferroni-Holm procedure for a Type I error of α = 0.01. The distances $D_B$ and $D_E$ were calculated for the subsets of genes $S_y$ and $S_h$, and the correlations of these distances with hybrid performance and heterosis, respectively, were determined.

Transcriptome-based prediction of hybrid performance and heterosis

For prediction of new hybrids, a reference set of related breeding material (estimation set) is required. The estimation set consists of the expression profiles of parental inbred lines $u$, $v$ and the field data of their factorial crosses $(u,v)$. From these data, the set of genes $S$ whose differential expression is associated with hybrid performance is determined. The (Euclidean or binary) distances $D_S(u,v)$ between the parental lines are determined on the basis of $S$. The distances between parents and the performance data of the hybrids in the estimation set are used to estimate the regression parameters $b_0$ and $b_1$ with the linear regression model

$$H(u,v) = b_0 + b_1 D_S(u,v). \tag{8}$$

To predict the performance $H(i,j)$ of a new hybrid, the gene expression profiles of the parental lines $i$ and $j$ are assessed. The set of genes $S$ is used to determine the (binary or Euclidean) distance $D_S(i,j)$ between the parental lines. From the distance $D_S(i,j)$ and the regression parameters $b_0$ and $b_1$, the performance $H(i,j)$ of the new hybrid is predicted with Eq. 8. The prediction procedure is summarized in Fig. 1. Mid-parent heterosis is predicted in an analogous manner.

Fig. 1 Prediction of hybrid performance. Flowchart steps for the estimation set (lines $u$, $v$; hybrids $(u,v)$): determine the set of genes $S$ associated with hybrid performance $H(u,v)$; estimate the transcriptome-based distances $D_S(u,v)$ between parental lines using $S$; estimate the regression parameters $b_0$ and $b_1$ from $H(u,v) = b_0 + b_1 D_S(u,v)$. Flowchart steps for the new hybrids to be predicted (lines $i$, $j$; hybrids $(i,j)$): estimate the transcriptome-based distances $D_S(i,j)$ using $S$; predict hybrid performance $H(i,j) = b_0 + b_1 D_S(i,j)$.

Assessment of prediction efficiency

The prediction efficiency was evaluated with a cross-validation procedure in which we divided our data into an estimation set, used for estimation of prediction parameters, and a validation set, for which prediction was carried out. The estimation set consisted of the transcriptome data of five randomly chosen dent and three randomly chosen flint lines and the field data of their hybrids. The validation set consisted of the remaining inbreds and hybrids of the factorial. The subsets of genes associated with hybrid performance and heterosis were determined in the estimation set by comparing the binomial probability $P_g$ with a threshold of α = 0.05 (employing no adjustment for multiple testing). With the distances calculated from these subsets, the regression parameters $b_0$ and $b_1$ were determined in the estimation set (Eq. 8). Employing $S$, $b_0$, and $b_1$, the hybrid performance and heterosis of the hybrids in the validation set were predicted. The correlation coefficient $r$ of observed with predicted values of hybrid performance and mid-parent heterosis was assessed in 100 cross validation rounds. It was employed as a measure of the prediction efficiency. In this cross validation scheme, less than half of the lines are used to estimate the prediction parameters and more than half of the lines are used for validation. Therefore, it models the reality of maize breeding programs much better than the older approach of leave-one-out cross validation (Vuylsteke et al. 2000; Schrag et al. 2006).
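
The gene selection and prediction steps described above (Eqs. 5, 7, and 8) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the regression parameters are estimated by ordinary least squares (the text only specifies a linear regression model), the binary distance is used, and the hybrids, gene labels, and performance values are hypothetical toy data. In a cross validation, these steps would be repeated for each randomly drawn estimation set and the predictions correlated with the observed values of the validation set.

```python
import math


def binomial_upper_tail(k, n, p=0.5):
    """Eq. 7: probability of observing k or more successes in n trials."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))


def select_genes(diff, perf, alpha=0.05):
    """Genes differentially expressed more often in class T than expected by chance."""
    ranked = sorted(perf, key=perf.get, reverse=True)
    half = len(ranked) // 2
    top, low = ranked[:half], ranked[half:]  # classes T and L
    selected = set()
    for gene in set().union(*diff.values()):
        o_t = sum(gene in diff[h] for h in top)
        o_l = sum(gene in diff[h] for h in low)
        if binomial_upper_tail(o_t, o_t + o_l) <= alpha:
            selected.add(gene)
    return selected


def binary_distance(diff_genes, selected):
    """Eq. 5 restricted to the selected genes: sqrt(n_s / n_g)."""
    return math.sqrt(len(diff_genes & selected) / len(selected))


def fit_line(xs, ys):
    """Ordinary least squares estimates of b0 and b1 in Eq. 8 (an assumption)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


# Hypothetical estimation set: 12 hybrids with toy performance values and
# toy sets of genes differentially expressed between their parents.
hybrids = [(d, f) for d in ("D1", "D2", "D3", "D4") for f in ("F1", "F2", "F3")]
perf = dict(zip(hybrids, [12.4, 12.1, 11.9, 11.8, 11.6, 11.5,
                          10.9, 10.8, 10.6, 10.5, 10.3, 10.1]))
diff = dict(zip(hybrids, [
    {"g1", "g2", "g3"}, {"g1", "g2"}, {"g1", "g2", "g3"},
    {"g1", "g2"}, {"g1", "g2", "g3"}, {"g1"},
    {"g3", "g4"}, {"g4"}, {"g3", "g4"}, {"g4"}, {"g3", "g4"}, {"g4"},
]))

selected = select_genes(diff, perf)
xs = [binary_distance(diff[h], selected) for h in hybrids]
ys = [perf[h] for h in hybrids]
b0, b1 = fit_line(xs, ys)


def predict(diff_genes):
    """Eq. 8 applied to a new hybrid, given its parents' differential genes."""
    return b0 + b1 * binary_distance(diff_genes, selected)


pred_far = predict({"g1", "g2", "g5"})  # parents differ at both selected genes
pred_near = predict({"g3"})             # parents differ at no selected gene
print(sorted(selected), pred_far > pred_near)
```

With these toy data, only the genes that are differentially expressed almost exclusively in the parents of the high-performing hybrids pass the unadjusted threshold of 0.05. The sketch does not handle the case in which no gene is selected, where the binary distance would be undefined.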

The efficiency of the transcriptome-based prediction was compared with that of prediction based on the general combining ability (GCA) estimated from field trials and that of prediction with the single-marker total effects of associated markers (SM-TEAM) approach of Schrag et al. (2007).

Results

The mean grain yield of the 98 hybrids was 11.72 Mg ha⁻¹ with a broad-sense heritability of 80.3%. The GCA and SCA variance components, as well as their interactions with the locations, were significantly different from zero (α = 0.05). The ratio of SCA:GCA variance components was 1.12. Therefore, the GCA is expected to explain only partially the variation in hybrid performance in this factorial. The results were presented in detail by Schrag et al. (2006).

The subset $S_p$ of genes that were differentially expressed in at least one pair of parental lines of the factorial crosses consisted of 10,810 genes. The ranges of the distances between lines from the same heterotic pool (intra-pool distances) were 0.22 ≤ $D_A$ ≤ 0.57 for the genetic distance, 0.16 ≤ $D_B$ ≤ 0.53 for the binary distance, and 25.7 ≤ $D_E$ ≤ 76.5 for the Euclidean distance. The ranges of the distances between lines from opposite heterotic pools (inter-pool distances) were 0.56 ≤ $D_A$ ≤ 0.61, 0.39 ≤ $D_B$ ≤ 0.69, and 38.6 ≤ $D_E$ ≤ 89.0. The genetic distance $D_A$ was strongly correlated with the transcriptome-based distances $D_B$ and $D_E$ for intra-pool crosses, but only loosely for inter-pool crosses (Fig. 2).

The first principal coordinate clearly separated the flint from the dent lines for all three distance measures (Fig. 3). The only exception was flint line L024, whose first principal coordinate was closer to the dent than to the flint pool for the Euclidean distance $D_E$. The second principal coordinate separated the dent lines with Stiff Stalk from those with Iodent background (Fig. 3).

Cluster analyses based on the genetic distance $D_A$ and the binary distance $D_B$ resulted in separate clusters for the flint and dent lines (Fig. 3). The binary distance $D_B$ even separated sub-clusters of flint lines having European Flint and Flint/Lancaster background. The Euclidean distances $D_E$ did not separate the flint and dent lines.
A clear separation between the different dent backgrounds (Stiff Stalk vs. Iodent) was not observed for any of the distances.

The genetic distance $D_A$ between parental lines was significantly correlated neither with hybrid performance nor with heterosis for grain yield (Fig. 4). In contrast, the binary distance $D_B$ and the Euclidean distance $D_E$, determined from the subset $S_p$ of differentially expressed genes, were correlated with hybrid performance and heterosis.

The subset of genes $S_y$ whose differential expression was associated with hybrid performance consisted of 1,424 genes, and the subset $S_h$ of genes associated with heterosis consisted of 1,763 genes. The distances $D_B$ and $D_E$, determined from the subsets of genes $S_y$ and $S_h$, were strongly correlated with hybrid performance and heterosis for grain yield (Fig. 5).

In cross validation, the correlation of observed with predicted values for hybrid performance and heterosis was greater for prediction with the distances $D_B$ and $D_E$ than for prediction with the earlier prediction methods GCA and SM-TEAM (Fig. 6). Prediction with the binary distance $D_B$ resulted in stronger correlations than prediction with the Euclidean distance $D_E$ for both hybrid performance and heterosis. For prediction of grain yield with the binary distance $D_B$, the correlations of observed with predicted values had the smallest ranges and inter-quartile distances if 1,000–1,500 genes were selected for prediction (Fig. 6). For fewer than 1,000 genes, the ranges and inter-quartile distances increased, and for more than 1,500 genes the median decreased.

Fig. 2 Correlation of the genetic distance $D_A$ with the transcriptome-based binary distance $D_B$ and Euclidean distance $D_E$. The distances $D_B$ and $D_E$ were determined from the subset of genes $S_p$, comprising 10,810 differentially expressed genes. *P ≤ 0.05, ***P ≤ 0.001

Discussion

Transcriptome analysis in the seedling stage

The RNA for expression profiling was extracted from entire seedlings 7 days after sowing. From a biological point of view, this raises the question whether there are justifications for the assumption that the transcript levels in seedlings are related to agronomic performance. There may be specific development stages and specific tissues in which gene expression is functionally more closely related to grain yield, grain dry matter concentration, or other important traits. From a practical point of view, expression profiling in the seedling stage has the big advantage that the data can be generated quickly and with limited resources, compared with growing plants for a longer time period and analyzing specific tissues. It remains open to further investigations whether a possible gain in information content of the transcriptome data at later development stages may outweigh this time advantage in breeding programs, in particular because establishing a new variety on the market one season earlier can be the key to its economic success.
Theoetical popeties of the tansciptome-based distances The distance has the desiable popety of being Euclidean and, hence, fom a mathematical point of view, is suitable fo a boad ange of multivaiate methods. Fom a genetical point of view, it has the shotcoming that genes with a big diffeence in the tanscipt abundance influence moe the final value of the distance than genes with a smalle diffeence. Tansfomations, such as the base-two logaithmic tansfomation employed in this study, educe the numeical effect. oweve, genetical o physiological models justifying such tansfomations ae lacking. The binay distance discads the quantitative infomation on the tansciption levels and assigns equal effects to all diffeentially expessed genes. This coesponds to a quantitative genetic model with many genes having small effects of simila size. Employing fo pediction of heteosis with the linea egession model of Eq. 8 coesponds well to the hypothesis, that small dominance effects at a lage numbe of loci ae esulting in the heteosis obseved in a hybid. Coelation of the genetic distances with the tansciptome-based distances The tansciptome-based distances showed a significant coelation of about 0.7 with the genetic distance fo intapool cosses, wheeas fo inte-pool cosses only a loose coelation of about 0.2 was obseved (Fig. 2). The high coelation of the tansciptome-based distances with the genetic distance fo inta-pool cosses suppots the hypothesis that within one heteotic pool both measues contain to a lage pat simila infomation. Fo inte-pool cosses, the low coelation suggests that the infomation content of the tansciptome-based and genetic distances diffes. The ange of inte-pool genetic distance was consideably smalle than the ange of inte-pool tansciptome-based distances. This

446 Theo Appl Genet (2010) 120:441 450 Fig. 3 Pinciple coodinate analyses and hieachical cluste analyses based on the genetic distance D A and the tansciptome-based distances and. The distances and wee detemined fom the subset of genes S p compising 10,810 diffeentially expessed genes PCo 2 (15.1%) 0.3 0.2 0.1 0.0 0.1 0.2 0.3 D A 0.2 0.3 0.4 0.5 0.6 L024 L043 L035 F037 F047 F039 F043 S028 S046 S049 S044 S036 S050 S058 P063 S067 P066 P033 P048 P040 P046 0.3 0.2 0.1 0.0 0.1 0.2 0.3 PCo 1 (21.4%) PCo 2 (11.3%) 0.3 0.2 0.1 0.0 0.1 0.2 0.3 F/Lanceste F/Euopean D/Iodent D/Stiffstalk 0.1 0.2 0.3 0.4 0.5 0.6 0.7 F047 F037 F039 F043 L035 L024 L043 S067 S050 S058 P063 P048 P033 P066 S028 S036 S046 S049 P040 P046 S044 0.3 0.2 0.1 0.0 0.1 0.2 0.3 PCo 1 (32.0%) PCo 2 (14.5%) 40 20 0 20 40 20 40 60 80 L043 F047 F037 F039 F043 L035 S036 S028 S046 S049 P040 P046 S044 S050 S058 S067 P063 P066 P048 L024 P033 40 20 0 20 40 PCo 1 (23.0%) suppots the hypothesis that the inte-pool tansciptomebased distances cay moe infomation than the inte-pool genetic distances. The tansciptome-based distances ae diectly quantifying the expession of genes, which may be esponsible fo the phenotype and do not ely on the linkage between makes and genes. In consequence, tansciptome data should be pefeable to make data when diffeent heteotic pools ae consideed. Gemplasm gouping The multivaiate analyses employing the binay distance gouped the flint and dent pools as clealy as did the genetic distance D A. In addition, cluste analysis with sepaated two subgoups within the flint pool (Fig. 3). In pincipal coodinate analysis on basis of, the flint line L024 was close to the dent lines, and in the cluste analyses

Theo Appl Genet (2010) 120:441 450 447 Fig. 4 Coelation of hybid pefomance (left hand side) and mid-paent heteosis (ight-hand side) fo gain yield with the genetic distance D A (top), binay distance (cente), and Euclidean distance (bottom). The distances and wee detemined fom the subset of genes S P compising 10,810 diffeentially expessed genes. ns: P [ 0.05, ***P B 0.001 100 105 110 115 = 0.13 ns 0.56 0.58 0.60 0.62 40 45 50 55 60 65 = 0.06 ns 0.56 0.58 0.60 0.62 D A D A 100 105 110 115 = 0.36*** 40 45 50 55 60 65 = 0.46*** 0.40 0.50 0.60 0.70 0.40 0.50 0.60 0.70 100 105 110 115 = 0.42*** 40 45 50 55 60 65 = 0.51*** 40 50 60 70 80 90 40 50 60 70 80 90 no sepaate clustes fo flint and dent lines wee fomed. In conclusion, multivaiate analyses based on wee moe effective in gouping the gemplasm than analyses based on the genetic distance D A, wheeas gouping on basis of was slightly less effective than gouping on basis of D A. Coelation of the tansciptome-based distances with hybid pefomance and heteosis In many studies in maize, it has been obseved that the genetic distance between the paents of inte-pool hybids was not coelated with the hybid pefomance o heteosis (cf. Melchinge 1999). Ou data on the AFLP-based genetic distance D A confims this esult (Fig. 4). The significant coelations of the tansciptome-based distances with hybid pefomance and heteosis may be explained by the high density of investigated loci, the analysis of the genes athe than makes, and the inclusion of additive additive inteactions in the analysis. Tansciption pofiling esulted in 10,810 diffeentially expessed genes in the factoial. Such a high numbe of loci investigated means a good coveage of the genes undelying gain yield and, theefoe, esulted not only in significant but vey stong coelations. Founde effects, selection, and andom genetic dift can esult in diffeences in the linkage disequilibium between make alleles and functional alleles in diffeent heteotic pools (Boppenmeie et al. 
1992; Charcosset and Essioux

Fig. 5 Correlation of hybrid performance (left-hand side) and mid-parent heterosis (right-hand side) for grain yield with the binary distance (top) and Euclidean distance (bottom). The distances were determined from the subset of genes $S_y$ comprising 1,424 genes of which differential expression is associated with hybrid performance (left-hand side), and the subset $S_h$ comprising 1,763 genes of which differential expression is associated with heterosis (right-hand side). ***P ≤ 0.001. Correlations shown in the panels: r = 0.86*** and r = 0.78*** (hybrid performance); r = 0.88*** and r = 0.81*** (heterosis). [Plots not reproduced.]

Fig. 6 Correlations r of the observed with predicted hybrid performance and mid-parent heterosis for grain yield obtained by cross validation with 100 rounds. The bold lines in the boxes denote the median, the borders of the boxes the quartiles, and the ends of the whiskers the extreme values of r. Left-hand side: The genes for prediction were selected in the estimation set with an exact binomial test (α = 0.05), resulting in approx. 800-1,000 genes for each trait in the individual cross validation rounds. Predictions with the binary and with the Euclidean transcriptome-based distances are shown. Results for GCA and SM-TEAM were taken from Schrag et al. (2007). Right-hand side: Fixed numbers of 50 to 5,000 genes were selected in each cross validation round on the basis of small probabilities $P_g$ (x-axis: No. of selected genes). [Plots not reproduced.]

1994). While in one heterotic pool a certain marker allele can be in coupling-phase linkage with a certain allele at a functional locus, in the opposite pool the marker allele may be in repulsion-phase linkage with this functional allele. Therefore, inter-pool genetic distances at marker loci may provide only a poor estimate of the differences at functional genes between two lines belonging to different heterotic pools. Expression profiling investigates the genes directly and does not rely on linkage disequilibrium between marker alleles and functional alleles. Therefore, it is not affected by different linkage phases in different heterotic pools and directly quantifies the differences at functional genes between two lines. This seems to be the main reason why the transcriptome-based distances are strongly correlated with hybrid performance and heterosis, whereas the inter-pool genetic distances are not correlated.

Furthermore, additive × additive interactions responsible for increased RNA transcription are accounted for in the transcription profiling and, hence, contribute to the transcriptome-based distances. These may increase the proportion of phenotypic variance explained by the distances and, thus, can also contribute to the high correlation of transcriptome-based distances with hybrid performance and heterosis.

The correlation of transcriptome-based distances with hybrid performance and heterosis for selected genes (subsets $S_y$ and $S_h$, Fig. 5) was considerably larger than for unselected genes (subset $S_p$, Fig. 4). Hence, using a set of selected genes for prediction models has the potential to increase prediction efficiency.

Transcriptome-based prediction of hybrid performance and heterosis

Selection of the set of genes S employed for prediction was based on the binomial probability (Eq. 7). It was determined separately for each cross validation run, employing a threshold of α = 0.05 and no adjustment for multiple testing.
This was an arbitrary choice, and the number of genes selected with such a procedure depends strongly on the size of the estimation set and the chosen thresholds. For the present data set, however, it resulted in approximately 800-1,000 genes for each run, which was near the optimum value of 1,000-1,500 genes (Fig. 6). An alternative strategy would be to determine once a core set of genes responsible for a given trait and use these genes subsequently for prediction. Whether using a fixed set of genes improves the prediction efficiency is an interesting area for further research.

Predicted values of heterosis are of use in practical breeding programs only in combination with the mid-parent value of line per se performance, whereas the predicted hybrid performance can be applied directly for making selection decisions. Therefore, the prediction of heterosis is of use only if it can be accomplished with higher precision than that of hybrid performance. In our study, the predictions of hybrid performance and heterosis were equally precise (Fig. 6). Consequently, we recommend prediction of the hybrid performance rather than that of heterosis.

Predictions with the binary distance showed greater correlations to the observed values than predictions with the Euclidean distance. Further, prediction with the binary distance outperformed by far the GCA-based prediction and also the prediction with a linear model using selected AFLP markers (Fig. 6). Consequently, prediction models employing the binary transcriptome-based distance should provide the most precise predictions of hybrid performance available to date.

Applications in breeding programs

Expression profiling of seedlings can be conducted directly after producing new inbred lines. From the transcriptome data, the performance of possible hybrids can be predicted, and the promising hybrids can be produced and tested in field trials. This indirect pre-selection step based on expression profiles can enhance the response to selection.
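The pre-selection step described above can be sketched in a few lines, assuming that a predicted hybrid performance is already available for every candidate flint × dent cross. The line names, the yield values, the 25% selected fraction, and the function `preselect_hybrids` are hypothetical illustrations and are not taken from the study.

```python
import math


def preselect_hybrids(predicted, fraction=0.25):
    """Rank candidate hybrids by predicted performance and keep the top fraction.

    predicted: dict mapping (flint_line, dent_line) -> predicted grain yield.
    fraction:  share of candidates to advance to field trials (assumed value).
    Returns a list of ((flint_line, dent_line), predicted_yield), best first.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Sort crosses from highest to lowest predicted performance.
    ranked = sorted(predicted.items(), key=lambda item: item[1], reverse=True)
    # Always advance at least one hybrid.
    n_keep = max(1, math.ceil(fraction * len(ranked)))
    return ranked[:n_keep]


# Hypothetical predictions for a 4 x 2 factorial of new inbred lines.
predicted = {
    ("flint_1", "dent_1"): 104.5,
    ("flint_1", "dent_2"): 109.0,
    ("flint_2", "dent_1"): 112.3,
    ("flint_2", "dent_2"): 101.8,
    ("flint_3", "dent_1"): 107.1,
    ("flint_3", "dent_2"): 110.6,
    ("flint_4", "dent_1"): 103.2,
    ("flint_4", "dent_2"): 106.4,
}

selected = preselect_hybrids(predicted, fraction=0.25)
for (flint, dent), value in selected:
    print(f"{flint} x {dent}: predicted yield {value}")
```

Only the crosses returned by the function would be produced and tested in the field; the selected fraction controls the trade-off between field-trial cost and the risk of discarding a good hybrid.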
At present, transcriptome analysis is expensive, but a decrease in lab costs is expected, and with such a cost decrease the suggested pre-selection can increase the cost efficiency of breeding programs. Assuming a heritability near one for the transcriptome data ($h_I^2 = 1$), the response $G_I$ to indirect selection based on transcriptome-based distances is $G_I = i h_I r \sigma_g$, whereas that for direct selection based on field trials is $G_D = i h_D \sigma_g$ (assuming constant selection intensity $i$ and genetic variance $\sigma_g^2$). Setting $G_I = G_D$ with $h_I = 1$ gives $h_D = r$. Consequently, with correlations of $r \approx 0.8$ (Fig. 6), indirect selection has the same efficiency as direct selection with a heritability of $h_D^2 = (0.8)^2 = 0.64$. This demonstrates the potential utility of the transcriptome-based prediction of hybrid performance in hybrid breeding programs.

Acknowledgments

This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) within the priority program SPP 1149 Heterosis in Plants (grant no. FR 1615/4-1). We thank Prof. Dr. B.S. Dhillon for helpful suggestions on our manuscript. This article is dedicated to Professor Dr. H. Friedrich Utz on the occasion of his 70th anniversary.

References

Birchler JA, Auger DL, Riddle NC (2003) In search of the molecular basis of heterosis. Plant Cell 15:2236-2239
Boppenmeier J, Melchinger AE, Brunklaus-Jung E, Geiger HH, Herrmann RG (1992) Genetic diversity for RFLPs in European maize inbreds. I: Relation to performance of Flint × Dent crosses for forage traits. Crop Sci 32:895-902

Charcosset A, Essioux L (1994) The effect of population-structure on the relationship between heterosis and heterozygosity at marker loci. Theor Appl Genet 89:336-343
Guo M, Rupe M, Yang X, Crasta O, Zinselmeier C, Smith O, Bowen B (2006) Genome-wide transcript analysis of maize hybrids: allelic additive gene expression and yield heterosis. Theor Appl Genet 113:831-845
Ihaka R, Gentleman R (1996) R: A language for data analysis and graphics. J Comput Graph Stat 5:299-314
Kendziorski C, Irizarry RA, Chen K-S, Haag JD, Gould MN (2005) On the utility of pooling biological samples in microarray experiments. Proc Nat Acad Sci 102:4252-4257
Kerr MK, Churchill GA (2001) Statistical design and the analysis of gene expression microarray data. Genet Res 77: 128
Maenhout S, De Baets B, Haesaert G (2009) Prediction of maize single-cross hybrid performance: support vector machine regression versus best linear prediction. Theor Appl Genet (in press)
Melchinger AE (1999) Genetic diversity and heterosis. In: Coors JG, Pandey S (eds) The genetics and exploitation of heterosis in crops. ASA-CSSA, Madison, pp 99-118
Reif JC, Melchinger AE, Frisch M (2005) Genetical and mathematical properties of similarity and dissimilarity coefficients applied in plant breeding and seed bank management. Crop Sci 45:1-7
Reif JC, Melchinger AE, Xia XC, Warburton ML, Hoisington DA, Vasal SK, Srinivasan G, Bohn M, Frisch M (2003) Genetic distance based on simple sequence repeats and heterosis in tropical maize populations. Crop Sci 43:1275-1282
Repsilber D, Andorf S, Selbig J, Altmann T, Witucka-Wall H (2009) Enriched partial correlations in genome-wide gene expression profiles of hybrids (A. thaliana): a systems biological approach towards the molecular basis of heterosis. Theor Appl Genet (in press)
Schrag TA, Melchinger AE, Sorensen AP, Frisch M (2006) Prediction of single-cross hybrid performance for grain yield and grain dry matter content in maize using AFLP markers associated with QTL.
Theor Appl Genet 113:1037-1047
Schrag TA, Maurer HP, Melchinger AE, Piepho HP, Peleman J, Frisch M (2007) Prediction of single-cross hybrid performance in maize using haplotype blocks associated with QTL for grain yield. Theor Appl Genet 114:1345-1355
Schrag TA, Möhring J, Maurer HP, Dhillon BS, Melchinger AE, Piepho HP, Sorensen AP, Frisch M (2009a) Molecular marker-based prediction of hybrid performance in maize using unbalanced data from multiple experiments with factorial crosses. Theor Appl Genet 118:741-751
Schrag TA, Möhring J, Kusterer B, Dhillon BS, Melchinger AE, Piepho HP, Frisch M (2009b) Hybrid performance prediction in maize using molecular markers and joint analyses of hybrids and parental inbreds. Theor Appl Genet (in press)
Springer NM, Stupar RM (2007) Allelic variation and heterosis in maize: How do two halves make more than a whole? Genome Res 17:264-275
Steinfath M, Gärtner T, Lisec J, Meyer RC, Altmann T, Willmitzer L, Selbig J (2009) Prediction of hybrid biomass in Arabidopsis thaliana by selected parental SNP and metabolic markers. Theor Appl Genet (in press)
Stupar RM, Gardiner JM, Oldre AG, Haun WJ, Chandler VL, Springer NM (2008) Gene expression analyses in maize inbreds and hybrids with varying levels of heterosis. BMC Plant Biol 8:33
Thiemann A, Fu J, Schrag TA, Melchinger AE, Frisch M, Scholten S (2009) Correlation between parental transcriptome and field data for the characterization of heterosis in Zea mays L. Theor Appl Genet (in press)
Vuylsteke M, Kuiper M, Stam P (2000) Chromosomal regions involved in hybrid performance and heterosis: their AFLP-based identification and practical use in prediction models. Heredity 85:208-218