Legume Res., 33 (2): 95-101, 2010
AGRICULTURAL RESEARCH COMMUNICATION CENTRE
www.arccjournals.com / indianjournals.com

CLUSTER ANALYSIS: A COMPARISON OF FOUR METHODS IN RICE BEAN [VIGNA UMBELLATA (THUNB.) OHWI & OHASHI]

R.C. Misra and P. Swain
Orissa University of Agriculture and Technology, Bhubaneswar 751 003, India

ABSTRACT
Twenty-five rice bean elite lines from Bhubaneswar, Ludhiana and Ranchi were evaluated in a randomized block design (RBD) for nine yield component traits and seed yield. The genotypes showed wide and highly significant variation in all ten traits.

Cluster analysis of the 25 genotypes was based on these ten traits and was carried out by four methods, viz. genetic divergence (D²), canonical analysis, similarity coefficients and principal component analysis. Each method classified the 25 genotypes into nine clusters. Cluster composition varied with the method. However, the D² clusters resembled those from canonical analysis, and the similarity coefficient clusters resembled those from principal component analysis.

The effectiveness of the four methods in bringing out diversity among clusters was compared by partitioning the total sum of squares among genotypes for each character. The partition was into between-cluster and within-cluster sums of squares. Similarity coefficient and principal component analyses were more effective in maximizing between-cluster differences than D² and canonical analyses. Moreover, similarity coefficient analysis gives a hierarchical presentation, which allows clusters to be further partitioned into sub-clusters.

Based on inter-genotype diversity in all four methods and on character complementation, crosses of BRB-20, BRB-6, BRB-14 and BRB-20 with BRB-9 are expected to produce more transgressive segregants in later generations.

Keywords: Rice bean, Cluster analysis, D², Canonical analysis, Similarity coefficient, Principal component analysis.
INTRODUCTION
Rice bean is adapted to high temperature and humidity as well as to heavy soils. It grows and matures quickly and is relatively free from major insect and disease problems. It is a valuable crop that deserves increased testing throughout the tropics (Tropical Legumes, 1979).

Plant breeders have long appreciated the importance of diversity between parents in cross-breeding programmes. The success of a cross in giving productive segregants depends largely on character complementation between the parents. In recent times, breeders have given greater importance to genetic diversity than to geographic diversity when selecting parents for cross-breeding.

D² analysis is the method most frequently used by plant breeders to estimate genetic divergence among genotypes. Some breeders have also used metroglyph analysis, canonical analysis, principal component analysis, factor analysis, etc. to assess diversity among genotypes. Taxonomists use several classificatory analyses, such as metroglyph analysis, Euclidean distance and similarity coefficient analysis, to cluster collections in phylogenetic studies.

The present study aimed at comparing the efficiency of four cluster analysis methods:
(1) genetic divergence D² (Rao, 1952);
(2) canonical analysis (Rao, 1952);
(3) similarity coefficients (Sneath and Sokal, 1973);
(4) principal component analysis (Mardia et al., 1979).

Such a comparison would help plant breeders and taxonomists classify varieties or collections. The test crop was rice bean, and 25 elite lines were evaluated for nine component traits and seed yield. The study would help breeders group genotypes on the basis of genetic affinity. It would also help them identify genetically diverse genotypes for use in cross-breeding programmes.

Email: ramachandra.mishra23@gmail.com

MATERIAL AND METHODS
The material comprised 25 rice bean genotypes developed in different states of India:
- seven elite lines (BRB entries) from Bhubaneswar;
- one line (BHR-1) from Ranchi;
- 13 elite lines (LRB and RL entries) from Ludhiana;
- four released varieties (RBL-1, RBL-6, RBL-35 and RBL-50) from Ludhiana.

The field trial was laid out in RBD with three replications and was sown on 15 September 2008. Each entry was represented by 10 rows of 3 m length at a spacing of 30 cm × 10 cm. A fertilizer dose of 25:50:25 kg NPK/ha was applied, and need-based plant protection measures were followed.

Days to flowering (DF) and days to maturity (DM) were recorded on a plot basis. The following characters were recorded on five random plants per plot in each replication: plant height (PH), branches/plant (B/P), clusters/plant (C/P), pods/plant (P/P), pod length (PL), seeds/pod (S/P), 100-seed weight (HSW) and yield/plant (Y/P). Analysis of variance for each character was carried out for RBD.

Cluster analysis of the 25 genotypes was based on the ten yield and component traits and was carried out by four methods. The Mahalanobis D² statistic (Rao, 1952) was used to estimate genetic divergence. The genotypes were then grouped into genetic clusters on the basis of D² values using Tocher's method (Rao, 1952).

Canonical analysis involved calculation of the canonical roots and the corresponding canonical vectors (Rao, 1952). The values of each genotype on the first two canonical vectors (Z1 and Z2) were plotted in a two-dimensional scatter diagram. Genotypes lying close to each other in this diagram were taken to form a group or cluster.
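The D² and Tocher steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the software used in the study. The assumptions are as follows:
- The plot data sit in an array of shape (genotypes, replications, characters).
- The pooled error variance-covariance matrix W is taken from the RBD residuals with (g-1)(r-1) degrees of freedom.
- `tocher` follows one common reading of Tocher's rule. It keeps adding the genotype with the smallest mean D² to the current members, for as long as the average intra-cluster D² does not exceed the largest of the per-genotype minimum D² values.

Published implementations of Tocher's method differ in such details. All function names here are hypothetical.

```python
import numpy as np


def error_covariance(data):
    """Pooled error variance-covariance matrix W from an RBD trial.

    data: array of shape (g, r, p) = genotypes x replications x characters
    (assumed layout). Residuals remove genotype and replication effects;
    the error SSCP matrix is divided by (g - 1)(r - 1) degrees of freedom.
    """
    g, r, _ = data.shape
    geno_mean = data.mean(axis=1, keepdims=True)     # (g, 1, p)
    rep_mean = data.mean(axis=0, keepdims=True)      # (1, r, p)
    grand = data.mean(axis=(0, 1), keepdims=True)    # (1, 1, p)
    resid = (data - geno_mean - rep_mean + grand).reshape(g * r, -1)
    return resid.T @ resid / ((g - 1) * (r - 1))


def mahalanobis_d2(means, w):
    """D2_ij = (x_i - x_j)' W^-1 (x_i - x_j) for all pairs of genotype means."""
    w_inv = np.linalg.inv(w)
    diff = means[:, None, :] - means[None, :, :]     # (g, g, p)
    return np.einsum("ijk,kl,ijl->ij", diff, w_inv, diff)


def tocher(d2):
    """Group genotypes by one common variant of Tocher's method (hypothetical helper)."""
    n = d2.shape[0]
    off = d2 + np.diag(np.full(n, np.inf))           # exclude self-distances
    theta = off.min(axis=1).max()                    # largest of the minimum D2 values
    remaining, clusters = list(range(n)), []
    while remaining:
        if len(remaining) == 1:
            clusters.append(remaining)
            break
        sub = off[np.ix_(remaining, remaining)]
        a, b = np.unravel_index(np.argmin(sub), sub.shape)
        if sub[a, b] > theta:                        # no close pair left: singletons
            clusters.extend([k] for k in remaining)
            break
        cluster = [remaining[a], remaining[b]]       # start from the closest pair
        rest = [k for k in remaining if k not in cluster]
        while rest:
            # candidate with the smallest mean D2 to the current members
            cand = min(rest, key=lambda k: d2[k, cluster].mean())
            members = cluster + [cand]
            m = len(members)
            # mean of all pairwise D2 in the enlarged cluster (zero diagonal)
            if d2[np.ix_(members, members)].sum() / (m * (m - 1)) > theta:
                break
            cluster.append(cand)
            rest.remove(cand)
        clusters.append(sorted(cluster))
        remaining = rest
    return clusters
```

For the 25 genotypes, `means` would be the (25, 10) matrix of genotype means. The call `tocher(mahalanobis_d2(means, error_covariance(data)))` would then return one list of genotype indices per cluster.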
For numerical taxonomic classification, Gower's general similarity coefficient (S_G) was used as the measure of resemblance between genotypes. S_G values were calculated for all pairs of genotypes. A phenogram (dendrogram) was then constructed by the SAHN (sequential, agglomerative, hierarchic, non-overlapping) clustering method using UPGMA (unweighted pair-group method using arithmetic averages) (Sneath and Sokal, 1973). The genotypes were grouped into clusters at the 0.75 phenon level.

Principal component analysis (PCA) is a multivariate statistical method that seeks to summarize the variation in a multivariate sample with fewer variables than the original set, with minimal loss of information (Mardia et al., 1979). It gives a few linear combinations of the characters (principal components) that have maximal variance among the genotypes. The scores of each genotype on the first two principal components (PC1 and PC2) were plotted in a two-dimensional scatter diagram. Genotypes lying close to each other in this diagram were taken to form a group or cluster.

The four methods were then compared for their effectiveness in bringing out diversity among clusters. For each character, the total sum of squares among genotypes was partitioned into between-cluster and within-cluster sums of squares, each expressed as a percentage of the total.

RESULTS AND DISCUSSION
The 25 genotypes showed wide variation in all ten traits, including seed yield, and the differences were highly significant (P < 0.01) for all traits. Seed yield ranged from 5.16 to 12.84 g/plant. The highest-yielding genotype was BRB-20, followed by BRB-6, BRB-11, BRB-14 and BRB-9, all yielding more than 10 g/plant. The high yield potential of BRB-20, BRB-6 and BRB-14 was due to more pods. That of BRB-11 and BRB-9 was due to higher seeds/pod and 100-seed weight, respectively.
The use of D² analysis for estimating genetic divergence and grouping varieties or germplasm lines into genetic clusters has been reported extensively in various crops. Here, genetic divergence among the 25 rice bean genotypes was estimated using the Mahalanobis D² statistic. D² values among the genotypes ranged from 6.71 to 321.12, indicating considerable genetic diversity in multivariate traits. The genotypes BRB-9, LRB-324, RBL-50 and BRB-15 had high D² values among themselves and also from the remaining 21 genotypes, indicating that they are highly divergent in multivariate traits.

On the basis of D² values, Tocher's method grouped the 25 genotypes into nine genetic clusters (Table 1). Clusters I, II and III included 6, 4 and 6 genotypes, respectively, while Clusters IV and V included 3 and 2 genotypes. The remaining four clusters (VI, VII, VIII and IX) were represented by one genotype each. Similar D² analysis and grouping of varieties into genetic clusters has been reported in rice bean by Thaware et al. (1997), Mandal and Dana (1998), Singh et al. (1998), Chaudhari et al. (1999), Singh et al. (1999) and Sahu et al. (2007).

Table 1: Cluster composition in the four clustering methods (genotype numbers as in the key below)

D² analysis                   Canonical analysis                Similarity coefficient                    Principal component
Cl.   Genotypes               Cl.   Genotypes                   Cl.   Genotypes                           Cl.   Genotypes
I     3, 4, 6, 10, 14, 23     I     3, 6, 13, 14                VI    3, 6                                VI    2, 3, 6
II    8, 13, 22, 24           II    8, 22, 23, 24               IV    8, 22, 23, 24                       IV    8, 22, 23, 24
III   11, 12, 15, 16, 18, 21  III   10, 11, 12, 15, 16, 18, 21  I     11, 12, 13, 14, 15, 16, 18, 20, 21  I     11, 12, 13, 14, 15, 16, 18, 21
IV    9, 19, 20               IV    9, 19, 20                   II    9, 17, 19                           II    9, 17, 19, 20
V     1, 7                    V     1, 7                        VII   1, 7                                VII   1, 7
VI    5                       VI    4, 5                        IX    5                                   IX    5
VII   25                      VII   25                          V     25                                  V     25
VIII  17                      VIII  17                          III   10                                  III   10
IX    2                       IX    2                           VIII  2, 4                                VIII  4

* Genotype key: 1. BRB-6; 2. BRB-9; 3. BRB-11; 4. BRB-14; 5. BRB-15; 6. BRB-18; 7. BRB-20; 8. LRB-141; 9. LRB-160; 10. LRB-162; 11. LRB-172; 12. LRB-184; 13. LRB-189; 14. LRB-193; 15. LRB-218; 16. LRB-241; 17. LRB-324; 18. LRB-460; 19. LRB-470; 20. RL-3; 21. BHR-1; 22. RBL-1; 23. RBL-6; 24. RBL-35; 25. RBL-50.

Character means for the genetic clusters (Table 2) showed that clustering was effective in bringing out character differences among clusters. The high seed yield of Cluster V was mainly due to more branches/plant, clusters/plant and pods/plant. That of Cluster IX was due to longer pods and bolder seeds.

Canonical analysis of the 25 genotypes was based on the ten characters. The first two canonical roots (Z1 and Z2) explained 37.6% and 25.7% of the diversity, respectively.
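One way to compute the canonical roots and the Z1, Z2 values is sketched below with NumPy. It rests on our reading, which the text does not spell out, that the canonical roots are the eigenvalues of W⁻¹B. Here B is the variance-covariance matrix of the genotype means and W is the pooled error variance-covariance matrix. The share of diversity explained by a root is taken as that root divided by the sum of all roots. The function name is hypothetical.

```python
import numpy as np


def canonical_analysis(means, w):
    """Canonical roots, their shares, canonical vectors and Z scores (sketch).

    means: (g, p) genotype means; w: (p, p) pooled error variance-covariance.
    Solves B v = root * W v, with B the covariance matrix of the means,
    by whitening with the Cholesky factor of W (W = L L').
    """
    b = np.cov(means, rowvar=False)                  # between-genotype matrix B
    l_inv = np.linalg.inv(np.linalg.cholesky(w))
    roots, u = np.linalg.eigh(l_inv @ b @ l_inv.T)   # symmetric, ascending order
    order = np.argsort(roots)[::-1]
    roots, u = roots[order], u[:, order]
    vectors = l_inv.T @ u                            # back-transform: B v = root * W v
    scores = (means - means.mean(axis=0)) @ vectors  # columns 0 and 1 are Z1, Z2
    return roots, roots / roots.sum(), vectors, scores
```

If W were the identity matrix, the same computation would reduce to a principal component analysis of the covariance matrix of the means. The weighting by the error matrix is what separates canonical analysis from PCA in this sketch.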
The genotypes are presented as a scatter of points in a two-dimensional diagram using the Z1 and Z2 values of each genotype (Fig. 1). Based on the closeness of points in the scatter diagram, the 25 genotypes grouped into nine clusters (Table 1):
- Cluster I, with four genotypes, occupied the central position in the scatter diagram.
- Clusters II and III, with four and seven genotypes, were close to Cluster I.
- Clusters IV, V and VI, with 3, 2 and 2 genotypes, were away from each other and from the first three clusters.
- Clusters VII, VIII and IX included one genotype each: RBL-50(25), LRB-324(17) and BRB-9(2), respectively. These three clusters lay at the extremes of the scatter diagram, far from each other and from the remaining six clusters.

Character means (Table 2) showed that the high seed yield of Cluster V was due to more branches/plant, clusters/plant and pods/plant. That of Cluster IX was due to longer pods and bolder seeds.

Clustering patterns from D² (Tocher's method) and canonical analysis were similar, with a few deviations. The composition of Clusters IV, V, VII, VIII and IX was the same in both methods, while the other four clusters showed a few exchanges. In both methods, the seven genotypes from Bhubaneswar (BRB entries) grouped into four clusters. The 17 genotypes from Ludhiana (LRB, RL and RBL entries) grouped into six clusters. This indicates that the geographic origin of the genotypes does not parallel genetic clustering, in agreement with reports of Thaware et al. (1997) and Sahu et al. (2007).

Table 2: Character means of clusters in the four clustering methods (abbreviations as in Material and Methods; Yield = seed yield, g/plant)

D² analysis
Cluster  DF     DM     PH     B/P    C/P    P/P    PL     S/P    HSW    Yield
I        47.7   91.2   81.8   2.38   11.9   17.5   8.08   7.80   5.88   8.67
II       51.3   93.0   94.8   1.93   9.8    16.1   7.62   7.02   6.05   7.73
III      46.6   89.7   88.0   1.86   8.9    13.8   7.86   6.96   5.34   5.54
IV       38.8   82.4   82.6   2.13   10.4   15.4   7.94   6.77   5.91   6.48
V        50.3   94.0   90.9   2.70   15.3   22.9   7.80   7.02   5.98   11.95
VI       40.3   81.0   85.1   2.60   12.2   20.5   7.44   7.63   5.98   9.93
VII      56.3   96.7   94.0   2.20   9.1    17.2   7.70   8.20   5.13   7.86
VIII     35.0   79.3   78.8   2.40   7.5    13.4   7.67   7.10   4.96   5.60
IX       47.3   92.7   74.5   2.13   13.7   19.4   8.11   7.07   7.13   10.09

Canonical analysis
Cluster  DF     DM     PH     B/P    C/P    P/P    PL     S/P    HSW    Yield
I        47.8   91.3   85.0   2.40   10.9   16.3   7.94   7.72   5.95   8.07
II       52.3   94.8   94.9   2.02   10.4   17.0   7.53   7.15   6.00   8.37
III      46.7   89.7   86.4   1.83   8.9    13.8   7.98   7.16   5.38   5.76
IV       38.8   82.4   82.6   2.13   10.4   15.4   7.94   6.77   5.91   6.48
V        50.3   94.0   90.9   2.70   15.3   22.9   7.80   7.02   5.98   11.95
VI       41.8   83.5   79.4   2.63   14.2   21.1   7.88   7.33   6.04   10.30
VII      56.3   96.7   94.0   2.20   9.1    17.2   7.70   8.20   5.13   7.86
VIII     35.0   79.3   78.8   2.40   7.5    13.4   7.67   7.10   4.96   5.60
IX       47.3   92.7   74.5   2.13   13.7   19.4   8.11   7.07   7.13   10.09

Similarity coefficient analysis
Cluster  DF     DM     PH     B/P    C/P    P/P    PL     S/P    HSW    Yield
I        46.0   89.0   87.2   1.90   8.7    13.9   8.76   6.97   5.59   5.72
II       37.0   81.1   83.0   2.33   10.1   15.5   7.83   6.84   5.53   6.30
III      47.3   89.3   76.9   1.67   9.0    14.6   8.75   8.37   5.61   7.07
IV       52.3   94.8   94.9   2.02   10.4   17.0   7.53   7.15   6.00   8.37
V        56.3   96.7   94.0   2.20   9.1    17.2   7.70   8.20   5.13   7.86
VI       48.5   92.8   80.6   2.70   13.6   18.0   8.06   8.37   5.84   10.06
VII      50.3   94.0   90.9   2.70   15.3   22.9   7.80   7.02   5.98   11.95
VIII     45.3   89.3   74.1   2.40   15.0   20.5   8.22   7.05   6.62   10.02
IX       40.3   81.0   85.1   2.60   12.2   20.5   7.44   7.63   5.98   9.93

Principal component analysis
Cluster  DF     DM     PH     B/P    C/P    P/P    PL     S/P    HSW    Yield
I        46.8   89.7   88.3   1.92   8.7    14.0   7.85   6.98   5.52   5.67
II       37.8   81.7   81.7   2.20   9.7    14.9   7.87   6.85   5.67   6.26
III      47.3   89.3   76.9   1.67   9.0    14.6   8.75   8.37   5.61   7.07
IV       52.3   94.8   94.9   2.02   10.4   17.0   7.53   7.15   6.00   8.37
V        56.3   96.7   94.0   2.20   9.1    17.2   7.70   8.20   5.13   7.86
VI       48.1   92.8   78.6   2.51   13.6   18.5   8.08   7.93   6.27   10.07
VII      50.3   94.0   90.9   2.70   15.3   22.9   7.80   7.02   5.98   11.95
VIII     43.3   86.3   73.7   2.67   16.3   21.7   8.33   7.03   6.11   10.14
IX       40.3   81.0   85.1   2.60   12.2   20.5   7.44   7.63   5.98   9.93

Use of similarity coefficient analysis for clustering genotypes has been reported in pointed gourd (Dora et al., 2003) and Jatropha (Das et al., 2008). These authors brought out the advantages of dendrogram representation for classifying genotypes into clusters and sub-clusters.

Here, resemblance between genotypes was estimated as Gower's similarity coefficient (S_G) based on the ten morphological characters. S_G values among the genotypes ranged from 0.389 to 0.915. In general, BRB-15(5), BRB-9(2) and BRB-20(7) had low S_G values with most other genotypes, indicating that they are highly dissimilar from most of the others. Using the S_G matrix, the genotypes were presented in a dendrogram (Fig. 2), and clusters were identified at the 0.75 phenon level.
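For quantitative characters, Gower's coefficient between genotypes i and j is the average, over all characters k, of 1 - |x_ik - x_jk| / R_k, where R_k is the range of character k. The NumPy sketch below computes S_G and then applies UPGMA. At each step it merges the pair of clusters with the highest average between-cluster similarity, and it stops once no pair reaches the 0.75 phenon level. The function names are hypothetical, and the sketch assumes every character varies among genotypes (R_k > 0).

```python
import numpy as np


def gower_similarity(x):
    """Gower's S_G for quantitative characters (hypothetical helper).

    S_ij = mean over characters k of 1 - |x_ik - x_jk| / R_k, R_k = range of k.
    """
    ranges = x.max(axis=0) - x.min(axis=0)           # assumes every R_k > 0
    diff = np.abs(x[:, None, :] - x[None, :, :]) / ranges
    return 1.0 - diff.mean(axis=2)


def upgma_clusters(s, phenon=0.75):
    """UPGMA on a similarity matrix, stopping below the phenon level."""
    clusters = [[i] for i in range(s.shape[0])]
    while len(clusters) > 1:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # unweighted average of all similarities between the two clusters
                avg = s[np.ix_(clusters[a], clusters[b])].mean()
                if avg > best:
                    best, pair = avg, (a, b)
        if best < phenon:                            # no pair joins at this level
            break
        a, b = pair
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters
```

SciPy offers the same tree through `scipy.cluster.hierarchy.linkage(..., method='average')` applied to the condensed distances 1 - S_G. Calling `fcluster(..., t=0.25, criterion='distance')` would then cut that tree at the 0.75 phenon level. The explicit loop is kept here only to show the averaging rule.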
Table 3: Partitioning of the total sum of squares among genotypes into between-cluster and within-cluster sums of squares (% of total) for the four clustering methods

           D² analysis       Canonical         Similarity        Principal
                             analysis          coefficient       component
Character  Between  Within   Between  Within   Between  Within   Between  Within
DF         88.30    11.70    95.49    4.51     89.22    10.78    94.67    5.33
DM         80.97    19.03    90.06    9.94     83.61    16.39    92.58    7.42
PH         55.52    44.48    53.51    46.49    64.62    35.38    68.67    31.33
B/P        63.98    36.02    73.80    26.20    84.28    15.72    75.00    25.00
C/P        58.24    41.76    66.13    33.87    82.93    17.07    83.52    16.48
P/P        70.57    29.43    80.16    19.84    83.70    16.30    82.87    17.13
PL         30.96    69.04    26.19    73.81    64.85    35.15    65.67    34.33
S/P        56.36    43.64    34.71    65.29    78.46    21.54    64.39    35.61
HSW        87.35    12.65    87.62    12.38    51.32    48.68    41.09    58.91
Yield      77.75    22.25    78.59    21.41    93.21    6.79     93.35    6.65
Average    67.00    33.00    68.63    31.37    77.62    22.38    76.18    23.82

Fig. 1: Z1-Z2 scatter diagram by canonical analysis showing genotype clusters.

The similarity coefficient dendrogram grouped the 25 genotypes into nine clusters (Table 1). Clusters I, II, IV, VI, VII and VIII included 9, 3, 4, 2, 2 and 2 genotypes, respectively, while Clusters III, V and IX were represented by one genotype each. The hierarchical presentation of the S_G dendrogram has the added advantage of dividing clusters into sub-clusters.

At a higher phenon level, Cluster I, the largest cluster, split into four sub-clusters:
- I(a): LRB-184(12), LRB-189(13) and LRB-241(16);
- I(b): LRB-218(15), LRB-460(18) and LRB-172(11);
- I(c): LRB-193(14) and BHR-1(21);
- I(d): RL-3(20).

Similarly, Cluster II could be divided into two sub-clusters:
- II(a): LRB-160(9) and LRB-470(19);
- II(b): LRB-324(17).

At a lower phenon level (0.65), Clusters I, II and III were close to each other.
Similarly, Clusters IV and V formed a close group, as did Clusters VI, VII, VIII and IX. The three groups of clusters, (I, II, III), (IV, V) and (VI, VII, VIII, IX), were quite dissimilar from each other.

A close examination of the dendrogram showed that the 17 genotypes from Ludhiana fell into five clusters (I to V), which formed two major groups. The genotypes from Bhubaneswar fell into four clusters (VI to IX), which formed one major group. Thus, genotypes from Ludhiana and Bhubaneswar formed different major groups of clusters. However, within each group, genotypes developed at the same location were quite dissimilar in multivariate traits and formed different clusters.

Character means (Table 2) showed that the high seed yield of Cluster VII was due to more branches/plant, clusters/plant and pods/plant. That of Clusters VI, VIII and IX was due to higher values for seeds/pod, 100-seed weight and pod length, respectively.

Fig. 2: Similarity coefficient (S_G) dendrogram showing genotype clusters (axes: similarity coefficient; genotypes).

As the ten traits were measured on different scales and in different units, the correlation matrix was used for principal component analysis (PCA). The first two principal components, PC1 and PC2, explained 39.1% and 23.8% of the total variation among the genotypes, respectively. The genotypes were plotted in a scatter diagram using their PC1 and PC2 scores (Fig. 3). Genotypes falling close to each other were grouped into a cluster, and those lying apart were placed in different clusters.

On this basis the 25 genotypes grouped into nine clusters (Table 1). Clusters I, II and IV included 8, 4 and 4 genotypes, respectively, all of them (except BHR-1) from Ludhiana. The genotypes in Clusters VI and VII were from the Bhubaneswar centre. Clusters III, V, VIII and IX were represented by one genotype each. Clustering by principal component analysis was broadly similar to that by similarity coefficient analysis, with a few exchanges.

Character means (Table 2) showed that the high seed yield of Cluster VII was due to more branches/plant and pods/plant. That of Clusters VI, VIII and IX was due to higher values for 100-seed weight, clusters/plant and pod length, respectively.

Fig. 3: PC1-PC2 scatter diagram by principal component analysis showing genotype clusters.

In all four methods the 25 genotypes grouped into nine clusters, indicating that each method was effective in bringing out the diversity. Cluster composition showed broad similarity across the four methods. LRB-172(11), LRB-184(12), LRB-218(15), LRB-241(16), LRB-460(18) and BHR-1(21) clustered together in all four methods.
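The principal component step behind the PC1-PC2 clustering can be sketched with NumPy as below. It is run on the correlation matrix because the traits are in different units. Standardizing each character and taking the eigenvectors of the correlation matrix is the usual way to do this. The function name is hypothetical, and the input is assumed to be the matrix of genotype means.

```python
import numpy as np


def pca_correlation(x):
    """PCA of genotype means on the correlation matrix (hypothetical helper).

    x: (g, p) genotype means. Each character is standardized first, so
    traits measured in different units carry equal weight.
    """
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    r = np.corrcoef(x, rowvar=False)                 # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(r)             # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = z @ eigvecs                             # column k = scores on PC(k+1)
    explained = eigvals / eigvals.sum()              # share of total variation
    return explained, eigvecs, scores
```

The first two columns of `scores` give PC1 and PC2 for the scatter diagram. The first two entries of `explained` are the shares of total variation those components account for. Rescaling any character, for example from grams to milligrams, leaves these shares unchanged.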
Four further groups also clustered together in all four methods:
- BRB-11(3) and BRB-18(6);
- LRB-141(8), RBL-1(22) and RBL-35(24);
- LRB-160(9) and LRB-470(19);
- BRB-6(1) and BRB-20(7).

RBL-50(25) formed a single-genotype cluster in every method.

Cluster analysis aims at grouping objects or genotypes into clusters such that between-cluster differences are maximized and within-cluster variation is minimized. The present study compared the effectiveness of the four clustering methods in bringing out divergence among clusters of genotypes. For this purpose, the total sum of squares among genotypes for each character was partitioned into between-cluster and within-cluster sums of squares, expressed as a percentage of the total (Table 3).

For D² and canonical analyses, the contribution of individual characters to between-cluster divergence varied widely:
- D² analysis: from 30.96 to 88.30%, with a mean of 67.00%;
- canonical analysis: from 26.19 to 95.49%, with a mean of 68.63%.

In both cases, days to flowering, days to maturity and 100-seed weight made high contributions to between-cluster variation, while pod length and seeds/pod made low contributions.

For the other two methods, the contribution of characters to between-cluster divergence varied within narrower limits:
- similarity coefficient analysis: from 51.32 to 93.21%, with a mean of 77.62%;
- principal component analysis: from 41.09 to 94.67%, with a mean of 76.18%.

In both these cases, yield/plant, days to flowering, days to maturity, pods/plant and clusters/plant made high contributions to between-cluster variation, while 100-seed weight made the least.

A comparison of the four methods thus showed two pairings. The cluster composition and the extent of character contributions to between-cluster diversity from D² analysis were similar to those from canonical analysis. Those from similarity coefficient analysis were similar to those from principal component analysis.
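The comparison in Table 3 rests on the identity: total SS = between-cluster SS + within-cluster SS, for each character. Below is a NumPy sketch of this partition. It assumes, though the text does not say so, that the sums of squares are computed on genotype means. The function name and inputs are hypothetical.

```python
import numpy as np


def partition_ss(x, labels):
    """Between- and within-cluster sums of squares as % of the total, per character.

    x: (g, p) genotype means (assumed input); labels: (g,) cluster of each genotype.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    grand = x.mean(axis=0)
    total = ((x - grand) ** 2).sum(axis=0)           # total SS among genotypes
    between = np.zeros(x.shape[1])
    within = np.zeros(x.shape[1])
    for c in np.unique(labels):
        group = x[labels == c]
        centre = group.mean(axis=0)
        between += len(group) * (centre - grand) ** 2  # n_c * (cluster mean - grand mean)^2
        within += ((group - centre) ** 2).sum(axis=0)  # deviations inside cluster c
    return 100 * between / total, 100 * within / total
```

Averaging the between-cluster percentages over the ten characters corresponds to the "Average" row of Table 3. A method that produces tighter, better-separated clusters therefore scores higher on that row.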
The resemblance between D² and canonical analysis, and between similarity coefficient and principal component analysis, may arise from how the data are handled. In similarity coefficient and principal component analyses, diversity estimation and clustering are done directly on the mean data. In D² and canonical analyses, the mean data are first transformed using the error variance-covariance matrix, and the transformed data are then used to estimate diversity among genotypes.

The comparison also revealed that clustering by similarity coefficient and principal component analyses was more effective in maximizing between-cluster differences (77.62 and 76.18%) than clustering by D² and canonical analyses (67.00 and 68.63%).

Based on inter-genotype diversity in all four methods, the yield potential of the genotypes and character complementation, crosses of BRB-20, BRB-6, BRB-14 and BRB-20 with BRB-9 are expected to produce more transgressive segregants in later generations.

REFERENCES
Chaudhari, G.B. et al. (1999). Indian J. Pulses Res., 12: 251-253.
Das, S. et al. (2008). Agric. Sci. Digest, 28: 298-300.
Dora, D.K. et al. (2003). Orissa J. Hort., 31: 76-79.
Mandal, N. and Dana, I. (1998). Ann. Agric. Res., 19: 18-21.
Mardia, K.V. et al. (1979). Multivariate Analysis. Academic Press, London.
Rao, C.R. (1952). Advanced Statistical Methods in Biometric Research. John Wiley and Sons, New York.
Sahu, P.K. et al. (2007). Env. and Eco., 25S(3A): 808-812.
Singh, G. et al. (1998). Indian J. Genet., 58: 101-105.
Singh, M.R.K. et al. (1999). Indian J. Genet., 59: 221-225.
Sneath, P.H. and Sokal, R.R. (1973). Numerical Taxonomy: The Principles and Practice of Numerical Classification. W.H. Freeman and Company, San Francisco, USA.
Thaware, B.L. et al. (1997). Legume Res., 20: 91-96.
Tropical Legumes (1979). Resources for the Future. Report of an Advisory Committee on Technology Innovation, National Research Council. National Academy of Sciences, Washington, D.C., pp. 80-85.