
Pattern Recognition

Volume 41, Issue 3, March 2008, Pages 1012-1029

Spectral clustering with eigenvector selection

https://doi.org/10.1016/j.patcog.2007.07.023

Abstract

The task of discovering natural groupings of input patterns, or clustering, is an important aspect of machine learning and pattern analysis. In this paper, we study the widely used spectral clustering algorithm, which clusters data using eigenvectors of a similarity/affinity matrix derived from a data set. In particular, we aim to solve two critical issues in spectral clustering: (1) how to automatically determine the number of clusters, and (2) how to perform effective clustering given noisy and sparse data. An analysis of the characteristics of eigenspace is carried out which shows that (a) not every eigenvector of a data affinity matrix is informative and relevant for clustering; (b) eigenvector selection is critical because using uninformative/irrelevant eigenvectors could lead to poor clustering results; and (c) the corresponding eigenvalues cannot be used for relevant eigenvector selection given a realistic data set. Motivated by the analysis, a novel spectral clustering algorithm is proposed which differs from previous approaches in that only informative/relevant eigenvectors are employed for determining the number of clusters and performing clustering. The key element of the proposed algorithm is a simple but effective relevance learning method which measures the relevance of an eigenvector according to how well it can separate the data set into different clusters. Our algorithm was evaluated on synthetic data sets as well as real-world data sets generated from two challenging visual learning problems. The results demonstrate that our algorithm is able to estimate the cluster number correctly and reveal the natural grouping of the input data/patterns even given sparse and noisy data.

Introduction

The task of discovering natural groupings of input patterns, or clustering, is an important aspect of machine learning and pattern analysis. Clustering techniques are increasingly adopted across research communities due to the growing need to model large amounts of data. As an unsupervised data analysis tool, clustering is desirable for modelling large data sets because the tedious and often inconsistent manual data labelling process can be avoided. The most popular clustering techniques are perhaps mixture models and K-means, which are based on estimating explicit models of the data distribution. Typically, the distribution of a data set generated by a real-world system is complex and of unknown shape, especially given the inevitable presence of noise. In this case, mixture models and K-means can be expected to yield poor results, since an explicit estimation of the data distribution is difficult, if possible at all. Spectral clustering offers an attractive alternative: it clusters data using eigenvectors of a similarity/affinity matrix derived from the original data set. In certain cases spectral clustering is even the only option. For instance, when different data points are represented using feature vectors of variable lengths, mixture models and K-means cannot be applied, whereas spectral clustering can still be employed as long as a pair-wise similarity measure can be defined for the data.
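As a concrete illustration of the generic pipeline just described, the sketch below builds a Gaussian affinity matrix over two well-separated synthetic groups, normalises it, and separates the points in the space spanned by the top eigenvectors. All data, parameter values (e.g. `sigma`), and variable names are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Two well-separated synthetic 2-D clusters (illustrative data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
               rng.normal(5.0, 0.1, (20, 2))])

# Pairwise Gaussian affinity A_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)),
# with a zero diagonal, as in standard spectral clustering formulations.
sigma = 1.0
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
A = np.exp(-sq / (2 * sigma ** 2))
np.fill_diagonal(A, 0.0)

# Symmetrically normalised affinity L = D^{-1/2} A D^{-1/2}.
d = A.sum(1)
L = A / np.sqrt(np.outer(d, d))

# Embed each point as a row of the top-2 eigenvector matrix,
# normalised to unit length; clustering happens in this embedding.
w, V = np.linalg.eigh(L)          # eigenvalues in ascending order
Y = V[:, -2:]
Y /= np.linalg.norm(Y, axis=1, keepdims=True)

# In this near-ideal case, points of the same cluster map to (almost)
# the same unit vector, so cosine similarity to a reference point splits them.
same_as_first = Y @ Y[0] > 0.5
```

In a realistic setting the final step would be a K-means or mixture-model clustering of the rows of `Y` rather than a simple threshold.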

In spite of extensive past studies on spectral clustering [1], [2], [3], [4], [5], [6], [7], [8], [9], two critical issues remain largely unresolved: (1) how to automatically determine the number of clusters, and (2) how to perform effective clustering given noisy and sparse data. Most previous work assumed that the number of clusters is known or has been manually set [1], [2], [5]. Recently, researchers have started to tackle the first issue, i.e. determining the cluster number automatically. Smyth [4] proposed a Monte Carlo cross-validation approach to determine the number of clusters for sequences modelled using hidden Markov models (HMMs). This approach is computationally expensive and thus not suitable for the large data sets common to applications such as image segmentation. Porikli and Haga [6] employed a validity score computed using the largest eigenvectors of a data affinity matrix to determine the number of clusters for video-based activity classification. Zelnik-Manor and Perona [7] proposed to determine the optimal cluster number by minimising the cost of aligning the top eigenvectors with a canonical coordinate system. The approaches in Refs. [6] and [7] are similar in that both are based on analysing the structure of the largest eigenvectors of a normalised data affinity matrix. In particular, assuming a number Km that is considered to be safely larger than the true number of clusters Ktrue, the top Km eigenvectors are exploited in both approaches to infer Ktrue. However, these approaches do not take into account the inevitable presence of noise in a realistic data set, i.e. they fail to address the second issue explicitly. They are thus error prone, especially when the sample size is small.

We argue that the key to solving the two above-mentioned issues is to select the relevant eigenvectors which provide useful information about the natural grouping of data. To justify the need for eigenvector selection, we shall answer two fundamental questions in spectral clustering. First, does every eigenvector provide useful information (and is therefore needed) for clustering? It has been shown analytically that in an ‘ideal’ case in which all points in different clusters are infinitely far apart, the elements of the top Ktrue eigenvectors form clusters with distinctive gaps between them which can readily be used to separate data into different groups [5]. In other words, all top Ktrue eigenvectors are equally informative. However, it is not theoretically guaranteed that other top eigenvectors are equally informative even in the ‘ideal’ case. Figs. 1(f) and (g) suggest that, in a ‘close-to-ideal’ case, not all top eigenvectors are equally informative and useful for clustering. Now let us look at a realistic case where there exists noise and a fair degree of similarity between clusters. In this case, the distribution of the elements of an eigenvector is far more complex. A general observation is that the gaps between clusters in the elements of the top eigenvectors are blurred and some eigenvectors, including those among the top Ktrue, are uninformative [5], [8], [9]. This is shown clearly in Fig. 1. Therefore, the answer to the first question is ‘no’, especially given a realistic data set. Second, is eigenvector selection necessary? It seems intuitive to include the less informative eigenvectors in the clustering process because, in principle, a clustering algorithm is expected to perform better given more information about the data grouping. However, in practice, the inclusion of uninformative eigenvectors can degrade the clustering process, as demonstrated extensively later in the paper. This is hardly surprising because, in the general context of pattern analysis, the importance of removing noisy/uninformative features has long been recognised [10], [11]. The answer to the second question is thus ‘yes’. Given the answers to the above two questions, it becomes natural to consider performing eigenvector selection for spectral clustering. In this paper, we propose a novel relevant eigenvector selection algorithm and demonstrate that it indeed leads to more efficient and accurate estimation of the number of clusters and better clustering results compared to existing approaches. To our knowledge, this paper is the first to use eigenvector selection to improve spectral clustering results.
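The idea of judging an eigenvector's relevance by how well its elements separate into groups can be illustrated with a simple proxy score. The two-means separation ratio below is an illustrative stand-in of our own devising, not the paper's actual relevance learning method; the synthetic "eigenvector" element vectors are likewise assumptions:

```python
import numpy as np

def separation_score(e, iters=20):
    """Score in [0, 1]: 1 - (within-group SS / total SS) after a 1-D
    two-means split of the eigenvector elements. Near 1 for cleanly
    bimodal elements, lower for structureless ones."""
    c = np.array([e.min(), e.max()], dtype=float)   # two initial centres
    lab = np.zeros(len(e), dtype=int)
    for _ in range(iters):
        lab = np.abs(e[:, None] - c[None, :]).argmin(1)
        for k in (0, 1):
            if (lab == k).any():
                c[k] = e[lab == k].mean()
    within = sum(((e[lab == k] - c[k]) ** 2).sum() for k in (0, 1))
    total = ((e - e.mean()) ** 2).sum()
    return 1.0 - within / total

rng = np.random.default_rng(1)
# Elements of an 'informative' eigenvector: two tight, well-separated modes.
informative = np.concatenate([rng.normal(-1, 0.05, 30),
                              rng.normal(1, 0.05, 30)])
# Elements of an 'uninformative' eigenvector: a single broad mode.
uninformative = rng.normal(0, 1, 60)
```

An eigenvector whose elements split cleanly (high score) carries grouping information; a structureless one (low score) can be dropped before clustering.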

The rest of the paper is organised as follows. In Section 2, we first define the spectral clustering problem. An efficient and robust eigenvector selection algorithm is then introduced which measures the relevance of each eigenvector according to how well it can separate a data set into different clusters. Based on the eigenvector selection result, only the relevant eigenvectors are used for simultaneous cluster number estimation and data clustering based on a Gaussian mixture model (GMM) and the Bayesian information criterion (BIC). The effectiveness and robustness of our approach are demonstrated first in Section 2 using synthetic data sets, then in Sections 3 and 4 on two real-world visual pattern analysis problems. Specifically, in Section 3 the problem of image segmentation using spectral clustering is investigated. In Section 4, human behaviour captured on CCTV footage in a secured entrance surveillance scene is analysed for automated discovery of different types of behaviour patterns based on spectral clustering. Both the synthetic and real data experiments presented in this paper show that our approach outperforms the approaches proposed in Refs. [6], [7]. The paper concludes in Section 5.
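The GMM/BIC model-order selection mentioned above rests on fitting mixtures of increasing order and keeping the order that minimises BIC. The following is a minimal, self-contained sketch of that idea only (spherical Gaussians, deterministic farthest-point initialisation, illustrative data), not the paper's implementation:

```python
import numpy as np

def gmm_bic(X, K, iters=100):
    """Fit a K-component spherical Gaussian mixture by EM and return its
    BIC = -2*log-likelihood + (#free parameters)*log(n). Illustrative only."""
    n, d = X.shape
    mu = [X[0]]                                  # farthest-point initialisation
    for _ in range(K - 1):
        d2 = np.min([((X - m) ** 2).sum(1) for m in mu], axis=0)
        mu.append(X[d2.argmax()])
    mu = np.array(mu, dtype=float)
    var = np.full(K, X.var())
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibilities under spherical Gaussians (log-domain).
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        logp = np.log(pi) - 0.5 * d * np.log(2 * np.pi * var) - d2 / (2 * var)
        m = logp.max(1, keepdims=True)
        r = np.exp(logp - m)
        r /= r.sum(1, keepdims=True)
        # M-step: update weights, means, and per-component variances.
        nk = r.sum(0) + 1e-9
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        var = (r * d2).sum(0) / (d * nk) + 1e-4   # floor avoids collapse
    logp = np.log(pi) - 0.5 * d * np.log(2 * np.pi * var) - d2 / (2 * var)
    m = logp.max(1, keepdims=True)
    ll = (m[:, 0] + np.log(np.exp(logp - m).sum(1))).sum()
    n_params = K * (d + 1) + (K - 1)              # means, variances, weights
    return -2.0 * ll + n_params * np.log(n)

# Three well-separated synthetic clusters; scan K and keep the BIC minimiser.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(m, 0.1, (20, 2)) for m in ((0, 0), (4, 0), (0, 4))])
K_o = min(range(1, 7), key=lambda K: gmm_bic(X, K))
```

In the paper's setting the input `X` would be the embedding formed by the selected relevant eigenvectors rather than raw points.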

Section snippets

Spectral clustering with eigenvector relevance learning

Let us first formally define the spectral clustering problem. Given a set of N data points/input patterns represented using feature vectors D = {f_1, ..., f_n, ..., f_N}, we aim to discover the natural grouping of the input data. The optimal number of groups/clusters Ko is automatically determined to best describe the underlying distribution of the data set. We have Ko = Ktrue if it is estimated correctly. Note that different feature vectors can be of different dimensionalities. An N×N affinity matrix A = {A_ij}
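Because different feature vectors can have different dimensionalities, the affinity matrix only requires some pair-wise similarity between patterns. The sketch below builds an N×N Gaussian affinity from variable-length 1-D sequences; the particular sequence distance and all parameter values are illustrative assumptions, not the paper's measure:

```python
import numpy as np

def seq_dist(a, b):
    """Symmetric mean nearest-element distance between two 1-D sequences
    of possibly different lengths (an illustrative choice only)."""
    d = np.abs(a[:, None] - b[None, :])
    return 0.5 * (d.min(1).mean() + d.min(0).mean())

# Variable-length feature 'vectors': mixture models and K-means cannot
# digest these directly, but a pair-wise similarity still can.
D = [np.array([0.0, 0.1, 0.2]),
     np.array([0.05, 0.15]),
     np.array([5.0, 5.1, 5.2, 5.3])]

# Gaussian affinity A_ij = exp(-dist(f_i, f_j)^2 / (2 sigma^2)).
sigma = 1.0
N = len(D)
A = np.array([[np.exp(-seq_dist(D[i], D[j]) ** 2 / (2 * sigma ** 2))
               for j in range(N)] for i in range(N)])
```

The resulting symmetric matrix A can then feed the eigen-decomposition step exactly as in the fixed-dimension case.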

Image segmentation

Our eigenvector selection based spectral clustering algorithm has been applied to image segmentation. A pixel-pixel pair-wise affinity matrix A is constructed for an image based on the intervening contours method introduced in Ref. [16]. First, for the i-th pixel in the image, the magnitude of the orientation energy along the dominant orientation, OE(i), is computed using oriented filter pairs. The local support area for the computation of OE(i) has a radius of 30. The value of OE(i) ranges
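Orientation energy from oriented filter pairs can be sketched with a generic even/odd (quadrature) pair: the energy at a pixel is the sum of the squared even and odd filter responses. The filters, frequency, and support size below are illustrative assumptions, not necessarily the exact construction of Ref. [16]:

```python
import numpy as np

def gabor_pair(theta, size=9, freq=0.25, sig=2.5):
    """Even/odd oriented filter pair at orientation theta (generic Gabor-like
    quadrature pair; illustrative parameters)."""
    r = np.arange(size) - size // 2
    yy, xx = np.meshgrid(r, r, indexing="ij")
    u = xx * np.cos(theta) + yy * np.sin(theta)   # coordinate along the wave
    g = np.exp(-(xx ** 2 + yy ** 2) / (2 * sig ** 2))
    even = g * np.cos(2 * np.pi * freq * u)
    odd = g * np.sin(2 * np.pi * freq * u)
    even -= even.mean()                           # remove DC response
    return even, odd

def conv2(img, k):
    """Valid-mode 2-D correlation (small images, for illustration)."""
    H, W = img.shape
    h, w = k.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + h, j:j + w] * k).sum()
    return out

def orientation_energy(img, theta):
    """OE(theta) = even_response^2 + odd_response^2 at every pixel."""
    even, odd = gabor_pair(theta)
    return conv2(img, even) ** 2 + conv2(img, odd) ** 2

# Synthetic image with a vertical step edge at column 16.
img = np.zeros((32, 32))
img[:, 16:] = 1.0

# Energy along the edge normal (theta = 0) vs along the edge (theta = pi/2).
e0 = orientation_energy(img, 0.0)
e90 = orientation_energy(img, np.pi / 2)
centre = (e0.shape[0] // 2, e0.shape[1] // 2)
```

The dominant orientation at a pixel would be the theta maximising the energy; on the step edge above, the filter oriented across the edge responds far more strongly than the one oriented along it.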

Video behaviour pattern clustering

Our spectral clustering algorithm has also been applied to solve the video-based behaviour profiling problem in automated CCTV surveillance. Given 24/7 continuously recorded video or online CCTV input, the goal of automatic behaviour profiling is to learn a model that is capable of detecting unseen abnormal behaviour patterns whilst recognising novel instances of expected normal behaviour patterns. To achieve this goal, the natural grouping of behaviour patterns captured in a training data set

Discussion and conclusion

In this paper, we analysed and demonstrated that: (1) not every eigenvector of a data affinity matrix is informative and relevant for clustering; (2) eigenvector selection is critical because using uninformative/irrelevant eigenvectors could lead to poor clustering results; and (3) the corresponding eigenvalues cannot be used for relevant eigenvector selection given a realistic data set. Motivated by the analysis, a novel spectral clustering algorithm was proposed which differs from previous


References (27)

  • A. Blum et al., Selection of relevant features and examples in machine learning, Artif. Intell. (1997)
  • Y. Weiss, Segmentation using eigenvectors: a unifying view
  • J. Shi et al., Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. (2000)
  • S. Yu et al., Multiclass spectral clustering
  • P. Smyth, Clustering sequences with hidden Markov models, Adv. Neural Inf. Process. Syst. (1997)
  • A. Ng et al., On spectral clustering: analysis and an algorithm, Adv. Neural Inf. Process. Syst. (2001)
  • F. Porikli et al., Event detection by eigenvector decomposition using object and frame features
  • L. Zelnik-Manor, P. Perona, Self-tuning spectral clustering, Adv. Neural Inf. Process. Syst. ...
  • M. Fiedler, Algebraic connectivity of graphs, Czech. Math. J. (1973)
  • F. Chung, Number 92 in CBMS Regional Conference Series in Mathematics, American Mathematical Society, Providence, RI, ...
  • J. Dy et al., Unsupervised feature selection applied to content-based retrieval of lung images, IEEE Trans. Pattern Anal. Mach. Intell. (2003)
  • A. Dempster et al., Maximum-likelihood from incomplete data via the EM algorithm, J. Roy. Statist. Soc. B (1977)
  • L.R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proc. IEEE (1989)

    About the Author—TAO XIANG is a Lecturer at the Department of Computer Science, Queen Mary, University of London. Dr Xiang was awarded his Ph.D. in Electrical and Computer Engineering from the National University of Singapore in 2002, which involved research into 3-D Computer Vision and Visual Perception. He also received his B.Sc. degree in Electrical Engineering from Xi’an Jiaotong University in 1995, and his M.Sc. degree in Electronic Engineering from the Communication University of China (CUC) in 1998. His research interests include computer vision, image processing, statistical learning, pattern recognition, machine learning, and data mining. He has been working recently on topics such as spectral clustering, video based behaviour analysis and recognition, and model order selection for Dynamic Bayesian Networks.

    About the Author—SHAOGANG GONG is Professor of Visual Computation at the Department of Computer Science, Queen Mary, University of London and a Member of the UK Computing Research Committee. He heads the Queen Mary Computer Vision Group and has worked in computer vision and pattern recognition for over 20 years, published over 170 papers and a monograph. He twice won the Best Science Prize (1999, 2001) of British Machine Vision Conferences, the Best Paper Award (2001) of IEEE International Workshops on Recognition and Tracking of Faces and Gestures, and the Best Paper Award (2005) of IEE International Symposium on Imaging for Crime Detection and Prevention. He was a recipient of a Queen's Research Scientist Award (1987), a Royal Society Research Fellow (1987, 1988), a GEC-Oxford Fellow (1989), a visiting scientist at Microsoft Research (2001) and Samsung (2003).
