Band Selection Using Clustering Technique for Dimensionality Reduction in Hyperspectral Image

Karthick. V (1), Veera Senthil Kumar. G (2), Dr. Vasuki. S (3)
(1) Lecturer, (2) Assistant Professor, (3) Professor and Head
(1,2,3) Dept. of ECE, Velammal College of Engineering and Technology, Madurai, INDIA
vkt@vcet.ac.in, gvs@vcet.ac.in, sv@vcet.ac.in

Abstract - This paper presents a novel approach to band selection using the K-means clustering technique for dimensionality reduction in hyperspectral images (HSI). Although many algorithms exist for dimensionality reduction, selecting informative bands from a large volume of data remains a challenging task. The number of bands is estimated with the concept of Virtual Dimensionality (VD), because it provides a reliable estimate. Bands are clustered using K-means based on statistical measures: Variance (VAR), Standard Deviation (STD) and Mean Absolute Deviation (MAD). The bands that preserve the most information are selected according to the maximum value of these statistical measures. Finally, endmembers are extracted from the selected bands using N-FINDR and their spectral signatures are determined. The whole experimentation is carried out in MATLAB.

Keywords - Dimensionality Reduction, K-means clustering, VD, N-FINDR.

I. INTRODUCTION

A hyperspectral image is a collection of spectral bands in which each image pixel is represented by a column vector, and each component of that vector is the pixel as imaged by a particular spectral channel. Hyperspectral remote sensors collect image data simultaneously in hundreds of narrow, adjacent spectral bands over wavelengths that can range from the near ultraviolet through the thermal infrared, at a fine resolution of 5 nm. Each pixel contains a hyperspectral signature that represents different materials. As a result of the high spectral resolution, hyperspectral systems produce a massive amount of data. These measurements make it possible to derive a continuous spectrum for the image data. Hyperspectral data helps the analyst detect more materials, objects and regions with enhanced accuracy.
Hyperspectral images provide a vast amount of information about a scene, but most of that information is redundant because the bands are highly correlated. For computational and data compression reasons, it is desirable to reduce the dimensionality of the data set while maintaining good performance in image analysis tasks. Several challenges arise during the analysis of hyperspectral images. The first is data storage and transmission, owing to the huge data volume. The second is redundancy of information, which can cause convergence instability. The third is high processing time. As a result, the requirements imposed on storage space, computational load and communication bandwidth work against real-time applications, and it is difficult to visualize or classify such a huge amount of data. Dimensionality reduction is a good choice for overcoming these challenges, and it is necessary for high accuracy in pixel unmixing, classification and detection.

There are several methods of dimensionality reduction, which can be categorized into two groups: feature extraction and feature (or band) selection. Feature selection is preferable for dimensionality reduction because feature extraction needs most of the original data representation in order to extract features. In addition, the transformation applied in feature extraction may distort critical information. Compared to feature extraction, feature selection preserves the relevant original information.

The overview of this paper is as follows: the details of the AVIRIS [1] hyperspectral image JASPER RIDGE are discussed in section II. Estimation of the number of bands by the concept of VD [1] is described in section III. Band selection using the K-means clustering algorithm [3] is explained in section IV. N-FINDR [2], an endmember extraction algorithm, is discussed in section V. The proposed methodology is explained in section VI. Experimental results are discussed in section VII. The conclusion is given in section VIII.

II. HSI DATA

Jasper Ridge is the AVIRIS [1] hyperspectral data set used for our experimentation. It contains 512 x 614 pixels. Each pixel is recorded at 224 channels ranging from 380 nm to 2500 nm. The spectral resolution is up to 9.46 nm. Since this hyperspectral image is too complex to obtain the ground truth, a subimage of 100 x 100 pixels is considered. Its first pixel corresponds to pixel (105,269) in the original image.

Fig. 1 Reflectance of Jasper Ridge image band no. 120 & 160

After removing channels 1-3 and the other channels affected by dense water vapor and atmospheric effects, 198 channels remain for further analysis. Out of these 198 channels, the reflectances of Jasper Ridge image bands 120 and 160 are shown in Fig. 1. For the Jasper Ridge image, 195 dimensions are selected and 100% of the eigenvalues are retained. Road, Tree, Soil and Water are the four endmembers found in the ground truth image. Their spectral signatures are shown in Fig. 2.

Fig. 2 Ground truth of endmembers spectral signatures

III. VIRTUAL DIMENSIONALITY

Virtual Dimensionality (VD) [1] estimates the number of endmembers present in the HSI image. Two familiar methods widely used for estimating VD are the Harsanyi-Farrand-Chang (HFC) method and its noise-whitened version (NWHFC) [5]. Both are based on Neyman-Pearson detection theory and determine how many times the test fails over all spectral bands for a given false-alarm probability, P_F. Since the HFC method does not have a noise-whitening process, an alternative is to modify it by including noise whitening as a preprocessing step that removes the second-order statistical correlation, so that the noise variance in the corresponding correlation eigenvalue and covariance eigenvalue will be the same. As a result, the VD estimate can be more accurate, because the noise variances have been decorrelated and do not affect the eigenvalue comparison.
The resulting method is referred to as the noise-whitened HFC (NWHFC) method.

IV. BAND SELECTION

Band selection is performed using a clustering technique. K-means clustering is adopted here because it is a familiar and simple unsupervised clustering technique. Band images are clustered by keeping the intra-cluster variance minimum and the inter-cluster variance maximum. Methods in which dimensionality is reduced by selecting a subset of the original dimensions are known as band or feature selection. The hyperspectral data is spread in some direction. This spread can be measured using different statistical measures, which include MAD [1], moment, variance, mean, geometric mean and standard deviation. MAD, STD and VAR [1] are used here for measuring the data.

Suppose that we have band images \(\{B_l\}_{l=1}^{L}\) in our hyperspectral image data cube, where L is the total number of bands, each band image is of size M x N with pixel values \(b_i\), and \(\bar{B}_l\) is the mean of the l-th band image. The statistical characteristics used for the data are as follows.

MAD for the l-th band is
\[ d_l = \frac{1}{MN} \sum_{i=1}^{MN} \left| b_i - \bar{B}_l \right| \]

Standard deviation for the band image is
\[ d_l = \sqrt{ \frac{1}{MN} \sum_{i=1}^{MN} \left( b_i - \bar{B}_l \right)^2 } \]

Variance for the band image is
\[ d_l = \frac{1}{MN} \sum_{i=1}^{MN} \left( b_i - \bar{B}_l \right)^2 \]
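As a rough illustration of the three measures, the sketch below computes MAD, STD and VAR for each band of a toy data cube. It is not the authors' MATLAB code: the function names `band_statistics` and `cube_statistics`, the toy values, and the choice to normalize by the number of pixels MN (rather than MN - 1) are all assumptions made for this example.

```python
import math


def band_statistics(band):
    """Return (MAD, STD, VAR) for one band image given as a flat list of pixel values.

    Assumption: all three measures are normalized by the pixel count MN.
    """
    n = len(band)
    mean = sum(band) / n
    mad = sum(abs(b - mean) for b in band) / n
    var = sum((b - mean) ** 2 for b in band) / n
    return mad, math.sqrt(var), var


def cube_statistics(cube):
    """Apply band_statistics to every band; cube is a list of L flattened band images."""
    return [band_statistics(band) for band in cube]


# Hypothetical 3-band cube with 4 pixels per band (values invented for illustration).
toy_cube = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 2.0, 2.0, 2.0],
    [0.0, 10.0, 0.0, 10.0],
]
stats = cube_statistics(toy_cube)
for index, (mad, std, var) in enumerate(stats):
    print(f"band {index}: MAD={mad:.3f} STD={std:.3f} VAR={var:.3f}")
```

Each band is reduced to a single scalar per measure, which is what makes the later clustering step a one-dimensional problem.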

The result of the above statistical measures for the L band images is given by the set \(\{d_l\}_{l=1}^{L}\).

K-means clustering is one of the simplest unsupervised algorithms and is well known for solving the clustering problem. The flowchart of K-means clustering is shown in Fig. 3. K-means follows a simple way to classify a given data set into clusters; the number of clusters is fixed and given a priori. K centroids are defined, one for each cluster, and placed as far away from each other as possible. Each point of the data set is associated with the nearest centroid, which results in K groups. K new centroids are then recalculated as the centers of the resulting clusters, and a new binding is made between the same data points and the nearest new centroid. This loop moves the K centroids step by step until no further change occurs and the centroids are fixed. The centroids of the clusters are calculated by minimizing the sum of squared errors. The K-means algorithm repeats three steps until convergence:

1) Determine the centroid coordinates
2) Determine the distance of each object to the centroids
3) Group the objects based on minimum distance

[Flowchart: Start; number of clusters k; centroid; distance of objects to centroids; grouping based on minimum distance; repeat until no object moves group; End]
Fig. 3. K-means clustering flowchart

For the observations \(X = (x_1, x_2, x_3, \ldots, x_n)\), the K-means clustering method divides the n observations into k sets (k < n), \(S = \{S_1, S_2, S_3, \ldots, S_k\}\), minimizing the within-cluster sum of squares, i.e.
\[ \arg\min_{S} \sum_{i=1}^{k} \sum_{x_j \in S_i} \left\| x_j - \mu_i \right\|^2 \]
where \(\mu_i\) is the mean of the points in cluster \(S_i\).

K-means computes cluster centroids using the squared Euclidean distance metric. For an m-by-n data matrix X with rows \(x_1, x_2, \ldots, x_m\), the distance between the vectors \(x_r\) and \(x_s\) is given by
\[ d_{rs}^2 = (x_r - x_s)\, D^{-1}\, (x_r - x_s)^{T} \]
where D is the diagonal matrix.

Bands are clustered based on their statistical characteristics, i.e. Variance, MAD (Mean Absolute Deviation) and Standard Deviation, by the K-means clustering technique.
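The clustering of per-band statistics can be sketched in a few lines. The example below is a minimal illustration and not the authors' MATLAB implementation: it runs plain Lloyd-style K-means with squared Euclidean distance on scalar band statistics, then keeps the band with the largest statistic in each cluster, which is the paper's selection rule. The function names, the evenly spaced centroid initialization and the toy variance values are assumptions introduced here.

```python
def kmeans_1d(values, k, max_iter=100):
    """Cluster scalar values into k groups; returns one cluster label per value."""
    ordered = sorted(values)
    # Assumption: spread the initial centroids evenly across the sorted values.
    if k == 1:
        centroids = [ordered[len(ordered) // 2]]
    else:
        centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assign each value to the centroid with the smallest squared distance.
        new_labels = [
            min(range(k), key=lambda c: (v - centroids[c]) ** 2) for v in values
        ]
        if new_labels == labels:  # no object changed group: converged
            break
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def select_bands(values, k):
    """Return the 0-based index of the band with the largest statistic in each cluster."""
    labels = kmeans_1d(values, k)
    selected = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            selected.append(max(members, key=lambda i: values[i]))
    return sorted(selected)


# Hypothetical per-band variances for an 8-band cube (values invented for illustration).
variances = [0.9, 1.1, 1.0, 5.2, 4.8, 5.0, 9.7, 10.1]
print(select_bands(variances, 3))
```

A deterministic initialization is used so the sketch gives the same result on every run; a production K-means would normally use several random restarts or a smarter seeding scheme.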
The band with the maximum variance within each cluster is selected. The proposed technique using Variance with squared Euclidean distance as the metric is abbreviated as VAR-SE. Similarly, Standard Deviation with squared Euclidean distance is abbreviated as STD-SE, and MAD with squared Euclidean distance as MAD-SE.

V. ENDMEMBER EXTRACTION

An endmember can be defined as an idealized, pure signature for a class. In hyperspectral data analysis, a pure pixel refers to an L-dimensional pixel vector. It should also be noted that an endmember is not a pixel; it is a spectral signature that is completely specified by the spectrum of a single material substance. There are two ways in which endmembers can be identified: an Endmember Extraction Algorithm (EEA), which extracts pure pixels directly from the data, and an Endmember Generation Algorithm (EGA), which aims at generating pure signatures from the available pure pixels. The focus here is on EEA. One of the most widely used EEAs is the N-FINDR algorithm developed by Winter. It is an iterative simplex volume expansion approach which assumes that, in L spectral dimensions, the L-dimensional volume formed by a simplex with vertices specified by the purest pixels is always larger than that formed by any other combination of pixels. The generic implementation of N-FINDR finds those vertices by randomly selecting a set of p pixels from the scene as initial endmembers and calculating the volume of the simplex formed by these initial endmembers. This process is iterated through the following steps to test every pixel in the image as an endmember. First, each of the initial endmembers is replaced, one at a time, with the pixel being tested. Second, the volumes of the simplexes formed by each replacement are calculated. Finally, the algorithm evaluates whether replacing any of the initial endmembers with the pixel being tested results in a larger simplex volume. If so, the pixel being tested replaces that endmember, and the process is repeated until every pixel has been evaluated as a potential endmember. The pixels that remain as endmembers at the end of the process are considered to be the final endmembers.

VI. PROPOSED METHODOLOGY

A hyperspectral image cube with high dimensionality is preprocessed to remove all bands that are affected by dense water vapor and atmospheric effects. This is a common preprocessing step in hyperspectral data analysis. After preprocessing, bands with high SNR are retained as good bands for further processing. The next step is dimensionality reduction in order to reduce the computational complexity. A band selection strategy is adopted in this paper for dimensionality reduction. VD is estimated using the NWHFC method with a false alarm rate of 10^-4 to calculate the number of bands required for endmember extraction. Band selection is done using K-means clustering based on statistical measures of the input hyperspectral bands, namely VAR, STD and MAD. Endmembers are extracted using the N-FINDR algorithm and their spectral signatures are determined.

[Flow diagram: Hyperspectral image; removal of low SNR and water absorption bands; VD estimation; VAR, STD and MAD calculation of all bands; band selection; endmember extraction]
Fig. 4. Flow Diagram of Proposed Methodology

The steps of the proposed algorithm are summarized as follows:
1) Low SNR and water absorption bands are removed from the hyperspectral image cube.
2) VD is estimated using the NWHFC method to determine the number of bands required.
3) The VAR, MAD and STD of each band image are calculated.
4) Bands are clustered using K-means clustering based on the measured values to examine the proximity of band images to each other.
5) According to VD, clusters are created which contain all the measured values.
6) The band having the maximum value in each cluster is selected.
7) The indices of the endmembers are determined using the N-FINDR algorithm and the corresponding endmember signatures are plotted.

The complete process involved in the proposed methodology is depicted in Fig. 4.

VII. EXPERIMENTAL RESULTS

The AVIRIS hyperspectral data JASPER RIDGE, of size 100x100 pixels with 224 bands, is used for our experimentation. After removing low SNR and water absorption bands, 198 bands are preserved. The ground truth of the endmember spectral signatures shown in Fig. 2 reveals that four endmembers, i.e. Road, Tree, Soil and Water, exist in the image scene. Preserving the maximum information, the number of bands required, given by the estimated VD, is 10 with a false alarm rate of 10^-4 using the NWHFC method. The estimated VD for different false alarm rates (P_F) is shown in Table 1.

TABLE 1. ESTIMATE OF VD USING NWHFC METHOD FOR DIFFERENT FALSE ALARM RATE
P_F    VD

Because of minor variations found in the spectral signatures, the number of bands is restricted to 4. Variance, Standard Deviation and Mean Absolute Deviation are calculated for the 198 bands. Using these measures, the bands are clustered with the K-means algorithm. The number of clusters is based on the number of endmembers to be identified. The band having the maximum value is selected from each cluster, and thus band selection is achieved. The selected bands based on the different measures are shown in Table 2.

TABLE 2. SELECTED BANDS FOR DIFFERENT MEASURES

Criteria    Selected Bands
VAR-SE      182, 118, 53, 104
STD-SE      34, 178, 41, 104
MAD-SE      34, 115, 51, 100

The endmembers are extracted using the N-FINDR algorithm and their spectral signatures are plotted in Fig. 5, Fig. 6 and Fig. 7.

Fig. 5. Spectral Signatures of 4 endmembers using VAR
Fig. 6. Spectral Signatures of 4 endmembers using STD
Fig. 7. Spectral Signatures of 4 endmembers using MAD

VIII. CONCLUSION

The proposed method of dimensionality reduction using K-means clustering provides effective band selection. The bands are selected using K-means clustering based on measures such as VAR, STD and MAD. In the proposed technique, bands are clustered so that the intra-cluster variance is kept minimum and the inter-cluster variance maximum, and the band with the maximum value of the measure is selected from each cluster. The proposed technique is simple to implement and fast: the computation takes seconds for band clustering and selection. The experimental results show that the endmembers are detected well and that the spectral signatures of the endmembers found using the N-FINDR algorithm closely match the spectral signatures of the endmembers shown in the ground truth. It is therefore concluded from the experiments that the proposed clustering techniques are promising and reliable techniques for band clustering and band selection.

IX. REFERENCES

[1] Muhammad Sohaib, Ihsan-Ul-Haq, and Qaiser Mushtaq, "Dimensional Reduction of Hyperspectral Image Data Using Band Clustering and Selection Based on Statistical Characteristics of Band Images," International Journal of Computer and Communication Engineering, vol. 2, no. 2, March 2013.
[2] Antonio Plaza and Chein-I Chang, "An Improved N-FINDR Algorithm in Implementation," Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XI, Proceedings of SPIE.
[3] Chein-I Chang and Su Wang, "Constrained Band Selection for Hyperspectral Imagery," IEEE Transactions on Geoscience and Remote Sensing, vol. 44, no. 6, June

[4] C.-I. Chang and S. Wang, "Constrained band selection for hyperspectral imagery," IEEE Transactions on Geoscience and Remote Sensing, vol. 44, no. 6, 2006.
[5] C.-I. Chang, Hyperspectral Imaging: Techniques for Spectral Detection and Classification. New York: Plenum, 2003.
[6] I. U. Haq and X. Xu, "A new approach to band clustering and selection for hyperspectral imagery," IEEE ICSP Proceedings.
[7] M. E. Winter, "N-FINDR: an algorithm for fast autonomous spectral endmember determination in hyperspectral data," Imaging Spectrometry V, Proc. SPIE 3753.
[8] M. E. Winter, "A proof of the N-FINDR algorithm for the automated detection of endmembers in a hyperspectral image," Algorithms and Technologies for Multispectral, Hyperspectral and Ultraspectral Imagery X, Proc. SPIE 5425.
[9] D. Heinz and C.-I Chang, "Fully constrained least squares linear mixture analysis for material quantification in hyperspectral imagery," IEEE Trans. on Geoscience and Remote Sensing, vol. 39, no. 3, March.
[10] R. Huang and M. He, "Band selection based feature weighting for classification of hyperspectral data," IEEE Geosci. Remote Sens. Lett., vol. 2, no. 2, Apr.