论文标题
使用不完整的cholesky分解的快速内核K-均值聚类
Fast Kernel k-means Clustering Using Incomplete Cholesky Factorization
论文作者
论文摘要
基于内核的聚类算法可以识别和捕获数据集中的非线性结构,从而比线性聚类可以实现更好的性能。但是,计算和存储整个内核矩阵占据很大的内存,以至于基于内核的聚类很难处理大型数据集。在本文中,我们采用不完整的Cholesky分解来加速内核聚类并节省内存空间。使用不完整的Cholesky分解的提议的内核$ K $ -MEANS聚类的关键思想是,我们通过低级别矩阵及其转置的乘积近似整个内核矩阵。然后将线性$ k $ -means聚类应用于低级矩阵的转置列。我们在分析和经验上均表明,所提出的算法的性能类似于内核$ k $ -MEANS-MEANS聚类算法的性能,但是我们的方法可以处理大型数据集。
Kernel-based clustering algorithm can identify and capture the non-linear structure in datasets, and thereby it can achieve better performance than linear clustering. However, computing and storing the entire kernel matrix occupy so large memory that it is difficult for kernel-based clustering to deal with large-scale datasets. In this paper, we employ incomplete Cholesky factorization to accelerate kernel clustering and save memory space. The key idea of the proposed kernel $k$-means clustering using incomplete Cholesky factorization is that we approximate the entire kernel matrix by the product of a low-rank matrix and its transposition. Then linear $k$-means clustering is applied to columns of the transpose of the low-rank matrix. We show both analytically and empirically that the performance of the proposed algorithm is similar to that of the kernel $k$-means clustering algorithm, but our method can deal with large-scale datasets.