Interactive Exploration of Fuzzy Clusters using Neighborgrams
(joint work with Michael R. Berthold and David E. Patterson)
Bernd Wiswedel
Bernd.Wiswedel@uni-konstanz.de
2.9.24 Bernd Wiswedel, University of Konstanz

Overview
- Motivation: Neighborgrams
- Neighborgram Construction
- Clustering Algorithm
- Results on Benchmark Data Sets
- Visualization and Interaction
- Conclusion
Motivation: Neighborgrams
- One-dimensional representation of the neighborhoods
- Can be used to generate potential cluster candidates
- Greedy algorithm picks the best clusters one by one (supervised clustering)
- Easy to visualize and therefore suitable for injecting expert knowledge into the clustering process

Neighborgram Construction
[Figure: Neighborgram for a centroid; patterns arranged by increasing distance to the centroid]
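The construction step can be sketched as follows. This is a minimal illustration, assuming Euclidean distance, a fixed Neighborgram length, and patterns stored as (feature vector, class label) pairs; `build_neighborgram` and `euclidean` are hypothetical names, not part of the original work.

```python
import math
from typing import List, Sequence, Tuple

# A pattern is a (feature vector, class label) pair.
Pattern = Tuple[Sequence[float], str]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (an assumed metric; the slides do not fix one)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_neighborgram(centroid: int, patterns: List[Pattern], length: int) -> List[Tuple[float, int]]:
    """Return the `length` nearest patterns to patterns[centroid] as
    (distance, pattern index) pairs, ordered by increasing distance.
    The centroid itself is included, at distance 0."""
    center = patterns[centroid][0]
    neighbors = [(euclidean(center, x), i) for i, (x, _) in enumerate(patterns)]
    neighbors.sort()  # by distance; ties broken by pattern index
    return neighbors[:length]


patterns = [((0.0, 0.0), "A"), ((1.0, 0.0), "A"), ((0.0, 2.0), "B"), ((3.0, 0.0), "A")]
ng = build_neighborgram(0, patterns, length=3)
print(ng)                                # [(0.0, 0), (1.0, 1), (2.0, 2)]
print([patterns[i][1] for _, i in ng])   # ['A', 'A', 'B']
```

Reading the class labels off the sorted list gives exactly the one-dimensional picture in the figure: a row of patterns ordered by distance, each tagged with its class.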
Neighborgrams are one-dimensional mappings of the neighborhoods of particular patterns (the centroids). They are built for all patterns of interest:
- small or medium-sized data sets: all patterns
- large data sets: patterns of a minority class (example: high-throughput screening in drug discovery)

Neighborgram Cluster
Purity for a cluster of radius r, i.e. the fraction of patterns within distance r of the centroid that share the centroid's class:
$$p(r) = \frac{\#\mathrm{same\_class}(r)}{\#\mathrm{all\_classes}(r)}$$
For us: given a purity, determine three radii r1, r2, r3 that define the cluster shape.
[Figure: Neighborgram with radii r1, r2 (p=0.7), r3 (p=0.7)]
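The purity formula can be evaluated along a sorted Neighborgram as sketched below. The slides do not spell out how the three radii follow from a purity value, so this sketch only computes the purity curve p(r) and, as one illustrative assumption, picks the largest radius whose purity still reaches a threshold; `purity_curve` and `largest_radius_with_purity` are hypothetical helpers.

```python
from typing import List, Optional, Tuple


def purity_curve(neighborgram: List[Tuple[float, str]], centroid_label: str) -> List[Tuple[float, float]]:
    """Return (r, p(r)) for each distinct distance r in the Neighborgram.

    p(r) = (#patterns within r sharing the centroid's class) / (#all patterns within r).
    The Neighborgram must be sorted by increasing distance.
    """
    curve = []
    same = 0
    for count, (dist, label) in enumerate(neighborgram, start=1):
        if label == centroid_label:
            same += 1
        # Emit a point only after all patterns at this distance have been counted.
        if count == len(neighborgram) or neighborgram[count][0] != dist:
            curve.append((dist, same / count))
    return curve


def largest_radius_with_purity(curve: List[Tuple[float, float]], p_min: float) -> Optional[float]:
    """Largest radius r with p(r) >= p_min, or None (hypothetical selection rule)."""
    best = None
    for r, p in curve:
        if p >= p_min:
            best = r
    return best


ng = [(0.0, "A"), (1.0, "A"), (2.0, "B"), (3.0, "A"), (4.0, "B"), (5.0, "B")]
curve = purity_curve(ng, "A")
print(largest_radius_with_purity(curve, 0.7))  # 3.0
print(largest_radius_with_purity(curve, 1.0))  # 1.0
```

Purity need not decrease monotonically with r (here it drops to 2/3 at r = 2 and recovers to 0.75 at r = 3), so scanning the whole curve, as above, and stopping at the first violation can yield different radii; which choice the original method makes is not stated on the slides.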
Membership Functions
[Figure: rectangular, trapezoidal, triangular, and Gaussian membership functions µ (maximum 1) plotted over the distance to the centroid, with radii r1, r2, r3 marked; the Gaussian panel also marks θ]
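The four shapes can be written as functions of the distance d to the centroid. This sketch uses generic parameters (`r`, `core`, `support`, `sigma`); which of r1, r2, r3 feeds which parameter cannot be recovered from the figure, so any mapping, such as core = r1 and support = r3 for the trapezoid, is an assumption. Likewise, the hypothetical helper `sigma_for` assumes θ is the membership value the Gaussian should take at a chosen radius.

```python
import math


def rectangular(d: float, r: float) -> float:
    """1 inside radius r, 0 outside."""
    return 1.0 if d <= r else 0.0


def trapezoidal(d: float, core: float, support: float) -> float:
    """1 up to `core`, then linear decrease to 0 at `support` (core < support)."""
    if d <= core:
        return 1.0
    if d >= support:
        return 0.0
    return (support - d) / (support - core)


def triangular(d: float, support: float) -> float:
    """1 at the centroid (d = 0), linear decrease to 0 at `support`."""
    return max(0.0, 1.0 - d / support)


def gaussian(d: float, sigma: float) -> float:
    """exp(-d^2 / (2 sigma^2)); positive for every d, never exactly 0."""
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


def sigma_for(r: float, theta: float) -> float:
    """Width such that gaussian(r, sigma) == theta, for 0 < theta < 1."""
    return r / math.sqrt(-2.0 * math.log(theta))


print(trapezoidal(1.5, core=1.0, support=2.0))            # 0.5
print(round(gaussian(2.0, sigma_for(2.0, theta=0.5)), 6))  # 0.5
```

Unlike the other three shapes, the Gaussian has unbounded support, so every pattern receives some nonzero membership in every cluster.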
Neighborgram Cluster (purity p = 0.5)
[Figure: the same Neighborgram with radii r1, r2 (p=0.5), r3 (p=0.5)]

(Crisp) Clustering Algorithm
- Each Neighborgram is considered a potential cluster candidate
- Greedy choice of the best cluster: the more patterns covered, the better
- Once a cluster is chosen, its covered patterns are discarded, i.e. removed from consideration
- Loop while there are too many uncovered patterns
[Figure: clusters chosen in greedy order: best, 2nd best, 3rd best, and 4th best cluster]
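The crisp loop amounts to a greedy set-cover procedure. Below is a minimal sketch, assuming each candidate has already been reduced to the set of pattern indices it covers and that "too many uncovered patterns" means more than a user-chosen `max_uncovered`; the function name and that parameter are hypothetical.

```python
from typing import Dict, List, Set


def greedy_crisp_clustering(candidates: Dict[int, Set[int]], n_patterns: int, max_uncovered: int) -> List[int]:
    """Greedily pick cluster candidates.

    candidates maps a centroid index to the set of pattern indices its cluster covers.
    Each round picks the candidate covering the most still-uncovered patterns and
    discards those patterns; the loop stops once at most `max_uncovered` remain
    or no candidate covers anything new.
    """
    uncovered = set(range(n_patterns))
    remaining = dict(candidates)
    chosen: List[int] = []
    while len(uncovered) > max_uncovered and remaining:
        best = max(remaining, key=lambda c: len(remaining[c] & uncovered))
        gain = remaining.pop(best) & uncovered
        if not gain:
            break  # no candidate adds any coverage
        chosen.append(best)
        uncovered -= gain  # covered patterns are removed from consideration
    return chosen


candidates = {0: {0, 1, 2, 3}, 1: {2, 3, 4}, 2: {4, 5}, 3: {5}}
print(greedy_crisp_clustering(candidates, n_patterns=6, max_uncovered=0))  # [0, 2]
```

Ties go to the candidate listed first, since Python's `max` returns the first maximal item. Note how candidate 1 loses in the second round: after cluster 0 is chosen, only pattern 4 of its three patterns is still uncovered.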
(Fuzzy) Clustering Algorithm
- A cluster covers patterns partially (according to its membership function)
- A pattern can only be covered to a maximal degree of 1
- Cluster ranking: cumulative degree of coverage
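The fuzzy variant replaces set counting with accumulated membership degrees. In this sketch a candidate's score is the additional coverage it would contribute, with every pattern's coverage capped at 1, and the loop stops once the total uncovered degree falls to `max_uncovered`; both the exact scoring and the stopping rule are assumptions consistent with the slide, and `greedy_fuzzy_clustering` is a hypothetical name.

```python
from typing import Dict, List


def greedy_fuzzy_clustering(memberships: Dict[int, List[float]], max_uncovered: float) -> List[int]:
    """Greedily pick fuzzy cluster candidates.

    memberships[c][i] is the degree (0..1) to which candidate c covers pattern i.
    A pattern's coverage accumulates over chosen clusters but is capped at 1, so a
    candidate's score is the extra coverage it adds: sum_i min(mu_c(i), 1 - cov_i).
    The loop stops once the total uncovered degree sum_i (1 - cov_i) <= max_uncovered.
    """
    if not memberships:
        return []
    n_patterns = len(next(iter(memberships.values())))
    coverage = [0.0] * n_patterns
    remaining = dict(memberships)
    chosen: List[int] = []

    def gain(mu: List[float]) -> float:
        # Extra coverage this candidate would add given the current coverage.
        return sum(min(m, 1.0 - c) for m, c in zip(mu, coverage))

    while remaining and sum(1.0 - c for c in coverage) > max_uncovered:
        best = max(remaining, key=lambda c: gain(remaining[c]))
        if gain(remaining[best]) <= 0.0:
            break  # no candidate adds any coverage
        chosen.append(best)
        mu = remaining.pop(best)
        coverage = [min(1.0, c + m) for c, m in zip(coverage, mu)]
    return chosen


memberships = {0: [1.0, 0.75, 0.25, 0.0], 1: [0.0, 0.5, 1.0, 1.0], 2: [0.5, 0.5, 0.5, 0.0]}
print(greedy_fuzzy_clustering(memberships, max_uncovered=0.5))  # [1, 0]
```

In the second round candidate 0 scores only 1.5 instead of its initial 2.0, because patterns 1 and 2 are already partly or fully covered by cluster 1; this is the effect of the cap at degree 1.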
Results on Benchmark Data: SatImage Data Set
- 36 attributes, 6 classes
- 4,435 training patterns, 2,000 test patterns
- Purity p = 1.0

Membership function | #clusters | Error [%] | No class predicted [%] | Error [%] (majority class)
Rectangular         | 585       | 16.65     | 8.95                   | 15.9
Trapezoidal         | 585       | 15.5      | 6.95                   | 14.4
Triangular          | 1852      | 10.5      | 2.65                   | 9.9
Gaussian            | 2695      | 8.1       | 0.0                    | 8.1

For comparison, error [%]: PNN 9.8, k-NN 9.4, MLP 13.9, C4.5 15.0

Results on Benchmark Data: Other Data Sets
- Three data sets from the StatLog project
- The Neighborgram clustering algorithm usually gives its best results with the Gaussian membership function

Data set | #dimensions | #classes | #patterns | NG error [%] | PNN error [%] | k-NN error [%] | MLP error [%] | C4.5 error [%]
SatImage | 36          | 6        | 6,435     | 8.1          | 9.8           | 9.4            | 13.9          | 15.0
Diabetes | 8           | 2        | 768       | 26.4         | 24.9          | 32.4           | 24.8          | 27.0
Segment  | 11          | 7        | 2,310     | 3.6          | 3.5           | 7.7            | 5.4           | 4.0

(NG = Neighborgram clustering algorithm)
Visualization and Interaction (NCI Data)
[Figures: visualization and interaction on the NCI data]
Conclusion
- Neighborgrams as a representation of high-dimensional data
- Applicable to modeling small or medium-sized data sets, or interesting subsets of large data sets
- Results of automatic clustering comparable to state-of-the-art techniques
- Allow domain knowledge to be injected through interactive, visual clustering