
Pattern Recognition

Volume 41, Issue 3, March 2008, Pages 1012-1029

Spectral clustering with eigenvector selection

https://doi.org/10.1016/j.patcog.2007.07.023

Abstract

The task of discovering natural groupings of input patterns, or clustering, is an important aspect of machine learning and pattern analysis. In this paper, we study the widely used spectral clustering algorithm, which clusters data using eigenvectors of a similarity/affinity matrix derived from a data set. In particular, we aim to solve two critical issues in spectral clustering: (1) how to automatically determine the number of clusters, and (2) how to perform effective clustering given noisy and sparse data. An analysis of the characteristics of eigenspace is carried out which shows that (a) not every eigenvector of a data affinity matrix is informative and relevant for clustering; (b) eigenvector selection is critical because using uninformative/irrelevant eigenvectors could lead to poor clustering results; and (c) the corresponding eigenvalues cannot be used for relevant eigenvector selection given a realistic data set. Motivated by the analysis, a novel spectral clustering algorithm is proposed which differs from previous approaches in that only informative/relevant eigenvectors are employed for determining the number of clusters and performing clustering. The key element of the proposed algorithm is a simple but effective relevance learning method which measures the relevance of an eigenvector according to how well it can separate the data set into different clusters. Our algorithm was evaluated on synthetic data sets as well as real-world data sets generated from two challenging visual learning problems. The results demonstrate that our algorithm is able to estimate the cluster number correctly and reveal the natural grouping of the input data/patterns even given sparse and noisy data.

Introduction

The task of discovering natural groupings of input patterns, or clustering, is an important aspect of machine learning and pattern analysis. Clustering techniques are increasingly adopted across research communities due to the growing need to model large amounts of data. As an unsupervised data analysis tool, clustering is desirable for modelling large data sets because the tedious and often inconsistent manual data labelling process can be avoided. The most popular clustering techniques are perhaps mixture models and K-means, which are based on estimating explicit models of the data distribution. Typically, the distribution of a data set generated by a real-world system is complex and of unknown shape, especially given the inevitable presence of noise. In this case, mixture models and K-means can be expected to yield poor results, since an explicit estimation of the data distribution is difficult, if possible at all. Spectral clustering offers an attractive alternative: it clusters data using eigenvectors of a similarity/affinity matrix derived from the original data set. In certain cases spectral clustering is even the only option. For instance, when different data points are represented using feature vectors of variable lengths, mixture models and K-means cannot be applied, whereas spectral clustering can still be employed as long as a pair-wise similarity measure can be defined for the data.
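As a concrete illustration of the generic pipeline just described, the sketch below builds a Gaussian affinity matrix over two well-separated synthetic groups, normalises it, and separates the points in the space spanned by the top eigenvectors. All data, parameter values (e.g. `sigma`), and variable names are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Two well-separated synthetic 2-D clusters (illustrative data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
               rng.normal(5.0, 0.1, (20, 2))])

# Pairwise Gaussian affinity A_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)),
# with a zero diagonal, as in standard spectral clustering formulations.
sigma = 1.0
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
A = np.exp(-sq / (2 * sigma ** 2))
np.fill_diagonal(A, 0.0)

# Symmetrically normalised affinity L = D^{-1/2} A D^{-1/2}.
d = A.sum(1)
L = A / np.sqrt(np.outer(d, d))

# Embed each point as a row of the top-2 eigenvector matrix,
# normalised to unit length; clustering happens in this embedding.
w, V = np.linalg.eigh(L)          # eigenvalues in ascending order
Y = V[:, -2:]
Y /= np.linalg.norm(Y, axis=1, keepdims=True)

# In this near-ideal case, points of the same cluster map to (almost)
# the same unit vector, so cosine similarity to a reference point splits them.
same_as_first = Y @ Y[0] > 0.5
```

In a realistic setting the final step would be a K-means or mixture-model clustering of the rows of `Y` rather than a simple threshold.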

In spite of extensive past studies on spectral clustering [1], [2], [3], [4], [5], [6], [7], [8], [9], two critical issues remain largely unresolved: (1) how to automatically determine the number of clusters, and (2) how to perform effective clustering given noisy and sparse data. Most previous work assumed that the number of clusters is known or has been manually set [1], [2], [5]. Recently, researchers have started to tackle the first issue, i.e. determining the cluster number automatically. Smyth [4] proposed a Monte Carlo cross-validation approach to determine the number of clusters for sequences modelled using hidden Markov models (HMMs). This approach is computationally expensive and thus not suitable for the large data sets common to applications such as image segmentation. Porikli and Haga [6] employed a validity score computed using the largest eigenvectors of a data affinity matrix to determine the number of clusters for video-based activity classification. Zelnik-Manor and Perona [7] proposed to determine the optimal cluster number by minimising the cost of aligning the top eigenvectors with a canonical coordinate system. The approaches in Refs. [6] and [7] are similar in that both are based on analysing the structure of the largest eigenvectors of a normalised data affinity matrix. In particular, assuming a number Km that is considered to be safely larger than the true number of clusters Ktrue, the top Km eigenvectors are exploited in both approaches to infer Ktrue. However, these approaches do not take into account the inevitable presence of noise in a realistic data set, i.e. they fail to address the second issue explicitly. They are thus error prone, especially when the sample size is small.

We argue that the key to solving the two above-mentioned issues is to select the relevant eigenvectors which provide useful information about the natural grouping of data. To justify the need for eigenvector selection, we shall answer two fundamental questions in spectral clustering. First, does every eigenvector provide useful information (and is therefore needed) for clustering? It has been shown analytically that in an ‘ideal’ case in which all points in different clusters are infinitely far apart, the elements of the top Ktrue eigenvectors form clusters with distinctive gaps between them which can readily be used to separate data into different groups [5]. In other words, all top Ktrue eigenvectors are equally informative. However, it is not theoretically guaranteed that other top eigenvectors are equally informative even in the ‘ideal’ case. Figs. 1(f) and (g) suggest that, in a ‘close-to-ideal’ case, not all top eigenvectors are equally informative and useful for clustering. Now let us look at a realistic case where there exists noise and a fair degree of similarity between clusters. In this case, the distribution of the elements of an eigenvector is far more complex. A general observation is that the gaps between clusters in the elements of the top eigenvectors are blurred and some eigenvectors, including those among the top Ktrue, are uninformative [5], [8], [9]. This is shown clearly in Fig. 1. Therefore, the answer to the first question is ‘no’, especially given a realistic data set. Second, is eigenvector selection necessary? It seems intuitive to include the less informative eigenvectors in the clustering process because, in principle, a clustering algorithm is expected to perform better given more information about the data grouping. However, in practice, the inclusion of uninformative eigenvectors can degrade the clustering process, as demonstrated extensively later in the paper. This is hardly surprising because, in the general context of pattern analysis, the importance of removing noisy/uninformative features has long been recognised [10], [11]. The answer to the second question is thus ‘yes’. Given the answers to the above two questions, it becomes natural to consider performing eigenvector selection for spectral clustering. In this paper, we propose a novel relevant eigenvector selection algorithm and demonstrate that it indeed leads to more efficient and accurate estimation of the number of clusters and better clustering results compared to existing approaches. To our knowledge, this paper is the first to use eigenvector selection to improve spectral clustering results.
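The idea of judging an eigenvector's relevance by how well its elements separate into groups can be illustrated with a simple proxy score. The two-means separation ratio below is an illustrative stand-in of our own devising, not the paper's actual relevance learning method; the synthetic "eigenvector" element vectors are likewise assumptions:

```python
import numpy as np

def separation_score(e, iters=20):
    """Score in [0, 1]: 1 - (within-group SS / total SS) after a 1-D
    two-means split of the eigenvector elements. Near 1 for cleanly
    bimodal elements, lower for structureless ones."""
    c = np.array([e.min(), e.max()], dtype=float)   # two initial centres
    lab = np.zeros(len(e), dtype=int)
    for _ in range(iters):
        lab = np.abs(e[:, None] - c[None, :]).argmin(1)
        for k in (0, 1):
            if (lab == k).any():
                c[k] = e[lab == k].mean()
    within = sum(((e[lab == k] - c[k]) ** 2).sum() for k in (0, 1))
    total = ((e - e.mean()) ** 2).sum()
    return 1.0 - within / total

rng = np.random.default_rng(1)
# Elements of an 'informative' eigenvector: two tight, well-separated modes.
informative = np.concatenate([rng.normal(-1, 0.05, 30),
                              rng.normal(1, 0.05, 30)])
# Elements of an 'uninformative' eigenvector: a single broad mode.
uninformative = rng.normal(0, 1, 60)
```

An eigenvector whose elements split cleanly (high score) carries grouping information; a structureless one (low score) can be dropped before clustering.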

The rest of the paper is organised as follows. In Section 2, we first define the spectral clustering problem. An efficient and robust eigenvector selection algorithm is then introduced which measures the relevance of each eigenvector according to how well it can separate a data set into different clusters. Based on the eigenvector selection result, only the relevant eigenvectors are used for simultaneous cluster number estimation and data clustering based on a Gaussian mixture model (GMM) and the Bayesian information criterion (BIC). The effectiveness and robustness of our approach are demonstrated first in Section 2 using synthetic data sets, then in Sections 3 and 4 on two real-world visual pattern analysis problems. Specifically, in Section 3 the problem of image segmentation using spectral clustering is investigated. In Section 4, human behaviour captured on CCTV footage in a secured entrance surveillance scene is analysed for automated discovery of different types of behaviour patterns based on spectral clustering. Both the synthetic and real data experiments presented in this paper show that our approach outperforms the approaches proposed in Refs. [6], [7]. The paper concludes in Section 5.
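The GMM/BIC model-order selection mentioned above rests on fitting mixtures of increasing order and keeping the order that minimises BIC. The following is a minimal, self-contained sketch of that idea only (spherical Gaussians, deterministic farthest-point initialisation, illustrative data), not the paper's implementation:

```python
import numpy as np

def gmm_bic(X, K, iters=100):
    """Fit a K-component spherical Gaussian mixture by EM and return its
    BIC = -2*log-likelihood + (#free parameters)*log(n). Illustrative only."""
    n, d = X.shape
    mu = [X[0]]                                  # farthest-point initialisation
    for _ in range(K - 1):
        d2 = np.min([((X - m) ** 2).sum(1) for m in mu], axis=0)
        mu.append(X[d2.argmax()])
    mu = np.array(mu, dtype=float)
    var = np.full(K, X.var())
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibilities under spherical Gaussians (log-domain).
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        logp = np.log(pi) - 0.5 * d * np.log(2 * np.pi * var) - d2 / (2 * var)
        m = logp.max(1, keepdims=True)
        r = np.exp(logp - m)
        r /= r.sum(1, keepdims=True)
        # M-step: update weights, means, and per-component variances.
        nk = r.sum(0) + 1e-9
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        var = (r * d2).sum(0) / (d * nk) + 1e-4   # floor avoids collapse
    logp = np.log(pi) - 0.5 * d * np.log(2 * np.pi * var) - d2 / (2 * var)
    m = logp.max(1, keepdims=True)
    ll = (m[:, 0] + np.log(np.exp(logp - m).sum(1))).sum()
    n_params = K * (d + 1) + (K - 1)              # means, variances, weights
    return -2.0 * ll + n_params * np.log(n)

# Three well-separated synthetic clusters; scan K and keep the BIC minimiser.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(m, 0.1, (20, 2)) for m in ((0, 0), (4, 0), (0, 4))])
K_o = min(range(1, 7), key=lambda K: gmm_bic(X, K))
```

In the paper's setting the input `X` would be the embedding formed by the selected relevant eigenvectors rather than raw points.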

Section snippets

Spectral clustering with eigenvector relevance learning

Let us first formally define the spectral clustering problem. Given a set of N data points/input patterns represented using feature vectors D = {f_1, ..., f_n, ..., f_N}, we aim to discover the natural grouping of the input data. The optimal number of groups/clusters Ko is automatically determined to best describe the underlying distribution of the data set. We have Ko = Ktrue if it is estimated correctly. Note that different feature vectors can be of different dimensionalities. An N×N affinity matrix A = {A_ij}
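Because different feature vectors can have different dimensionalities, the affinity matrix only requires some pair-wise similarity between patterns. The sketch below builds an N×N Gaussian affinity from variable-length 1-D sequences; the particular sequence distance and all parameter values are illustrative assumptions, not the paper's measure:

```python
import numpy as np

def seq_dist(a, b):
    """Symmetric mean nearest-element distance between two 1-D sequences
    of possibly different lengths (an illustrative choice only)."""
    d = np.abs(a[:, None] - b[None, :])
    return 0.5 * (d.min(1).mean() + d.min(0).mean())

# Variable-length feature 'vectors': mixture models and K-means cannot
# digest these directly, but a pair-wise similarity still can.
D = [np.array([0.0, 0.1, 0.2]),
     np.array([0.05, 0.15]),
     np.array([5.0, 5.1, 5.2, 5.3])]

# Gaussian affinity A_ij = exp(-dist(f_i, f_j)^2 / (2 sigma^2)).
sigma = 1.0
N = len(D)
A = np.array([[np.exp(-seq_dist(D[i], D[j]) ** 2 / (2 * sigma ** 2))
               for j in range(N)] for i in range(N)])
```

The resulting symmetric matrix A can then feed the eigen-decomposition step exactly as in the fixed-dimension case.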

Image segmentation

Our eigenvector selection based spectral clustering algorithm has been applied to image segmentation. A pixel-pixel pair-wise affinity matrix A is constructed for an image based on the intervening contours method introduced in Ref. [16]. First, for the i-th pixel in the image, the magnitude of the orientation energy along the dominant orientation, OE(i), is computed using oriented filter pairs. The local support area for the computation of OE(i) has a radius of 30. The value of OE(i) ranges
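Orientation energy from oriented filter pairs can be sketched with a generic even/odd (quadrature) pair: the energy at a pixel is the sum of the squared even and odd filter responses. The filters, frequency, and support size below are illustrative assumptions, not necessarily the exact construction of Ref. [16]:

```python
import numpy as np

def gabor_pair(theta, size=9, freq=0.25, sig=2.5):
    """Even/odd oriented filter pair at orientation theta (generic Gabor-like
    quadrature pair; illustrative parameters)."""
    r = np.arange(size) - size // 2
    yy, xx = np.meshgrid(r, r, indexing="ij")
    u = xx * np.cos(theta) + yy * np.sin(theta)   # coordinate along the wave
    g = np.exp(-(xx ** 2 + yy ** 2) / (2 * sig ** 2))
    even = g * np.cos(2 * np.pi * freq * u)
    odd = g * np.sin(2 * np.pi * freq * u)
    even -= even.mean()                           # remove DC response
    return even, odd

def conv2(img, k):
    """Valid-mode 2-D correlation (small images, for illustration)."""
    H, W = img.shape
    h, w = k.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + h, j:j + w] * k).sum()
    return out

def orientation_energy(img, theta):
    """OE(theta) = even_response^2 + odd_response^2 at every pixel."""
    even, odd = gabor_pair(theta)
    return conv2(img, even) ** 2 + conv2(img, odd) ** 2

# Synthetic image with a vertical step edge at column 16.
img = np.zeros((32, 32))
img[:, 16:] = 1.0

# Energy along the edge normal (theta = 0) vs along the edge (theta = pi/2).
e0 = orientation_energy(img, 0.0)
e90 = orientation_energy(img, np.pi / 2)
centre = (e0.shape[0] // 2, e0.shape[1] // 2)
```

The dominant orientation at a pixel would be the theta maximising the energy; on the step edge above, the filter oriented across the edge responds far more strongly than the one oriented along it.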

Video behaviour pattern clustering

Our spectral clustering algorithm has also been applied to solve the video-based behaviour profiling problem in automated CCTV surveillance. Given 24/7 continuously recorded video or online CCTV input, the goal of automatic behaviour profiling is to learn a model that is capable of detecting unseen abnormal behaviour patterns whilst recognising novel instances of expected normal behaviour patterns. To achieve this goal, the natural grouping of behaviour patterns captured in a training data set

Discussion and conclusion

In this paper, we analysed and demonstrated that: (1) not every eigenvector of a data affinity matrix is informative and relevant for clustering; (2) eigenvector selection is critical because using uninformative/irrelevant eigenvectors could lead to poor clustering results; and (3) the corresponding eigenvalues cannot be used for relevant eigenvector selection given a realistic data set. Motivated by the analysis, a novel spectral clustering algorithm was proposed which differs from previous


References (27)

  • A. Blum et al., Selection of relevant features and examples in machine learning, Artif. Intell. (1997)
  • Y. Weiss, Segmentation using eigenvectors: a unifying view
  • J. Shi et al., Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. (2000)
  • S. Yu et al., Multiclass spectral clustering
  • P. Smyth, Clustering sequences with hidden Markov models, Adv. Neural Inf. Process. Syst. (1997)
  • A. Ng et al., On spectral clustering: analysis and an algorithm, Adv. Neural Inf. Process. Syst. (2001)
  • F. Porikli et al., Event detection by eigenvector decomposition using object and frame features
  • L. Zelnik-Manor, P. Perona, Self-tuning spectral clustering, Adv. Neural Inf. Process. Syst. ...
  • M. Fiedler, Algebraic connectivity of graphs, Czech. Math. J. (1973)
  • F. Chung, Number 92 in CBMS Regional Conference Series in Mathematics, American Mathematical Society, Providence, RI, ...
  • J. Dy et al., Unsupervised feature selection applied to content-based retrieval of lung images, IEEE Trans. Pattern Anal. Mach. Intell. (2003)
  • A. Dempster et al., Maximum-likelihood from incomplete data via the EM algorithm, J. Roy. Statist. Soc. B (1977)
  • L.R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proc. IEEE (1989)

    About the Author—TAO XIANG is a Lecturer at the Department of Computer Science, Queen Mary, University of London. Dr Xiang was awarded his Ph.D. in Electrical and Computer Engineering from the National University of Singapore in 2002, which involved research into 3-D Computer Vision and Visual Perception. He also received his B.Sc. degree in Electrical Engineering from Xi’an Jiaotong University in 1995, and his M.Sc. degree in Electronic Engineering from the Communication University of China (CUC) in 1998. His research interests include computer vision, image processing, statistical learning, pattern recognition, machine learning, and data mining. He has been working recently on topics such as spectral clustering, video based behaviour analysis and recognition, and model order selection for Dynamic Bayesian Networks.

    About the Author—SHAOGANG GONG is Professor of Visual Computation at the Department of Computer Science, Queen Mary, University of London and a Member of the UK Computing Research Committee. He heads the Queen Mary Computer Vision Group and has worked in computer vision and pattern recognition for over 20 years, published over 170 papers and a monograph. He twice won the Best Science Prize (1999, 2001) of British Machine Vision Conferences, the Best Paper Award (2001) of IEEE International Workshops on Recognition and Tracking of Faces and Gestures, and the Best Paper Award (2005) of IEE International Symposium on Imaging for Crime Detection and Prevention. He was a recipient of a Queen's Research Scientist Award (1987), a Royal Society Research Fellow (1987, 1988), a GEC-Oxford Fellow (1989), a visiting scientist at Microsoft Research (2001) and Samsung (2003).
