Title
Orthogonalization of data via Gromov-Wasserstein type feedback for clustering and visualization
Authors
Abstract
In this paper we propose an adaptive approach for clustering and visualizing data through an orthogonalization process. Starting from a representation of the data points as a Markov process in the diffusion map framework, the method adaptively increases the orthogonality of the clusters by applying a feedback mechanism inspired by the Gromov-Wasserstein distance. This mechanism iteratively increases the spectral gap and refines the orthogonality of the data to achieve a clustering with high specificity. Because the diffusion map framework represents the relations between data points through transition probabilities, the method is robust with respect to the choice of underlying distance, noise in the data, and random initialization. We prove that the method converges globally to a unique fixed point for certain parameter values. We also propose a related approach in which the transition probabilities of the Markov process are required to be doubly stochastic; in this case the method generates a minimizer of a nonconvex optimization problem. We apply the method to cryo-electron microscopy image data from biopharmaceutical manufacturing, where we can confirm biologically relevant insights related to therapeutic efficacy. We consider an example with morphological variations of gene packaging and confirm that the method produces biologically meaningful clustering results consistent with human expert classification.
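The abstract relies on three standard building blocks: a diffusion-map Markov matrix built from pairwise affinities, the spectral gap of that matrix, and a doubly stochastic normalization of the transition probabilities. The sketch below illustrates only these generic ingredients under stated assumptions. It uses a Gaussian kernel with a hypothetical bandwidth `epsilon`, and it uses the classical Sinkhorn-Knopp scaling as one possible way to obtain a doubly stochastic matrix. All function names are hypothetical. The sketch does not reproduce the authors' Gromov-Wasserstein-inspired feedback step.

```python
import numpy as np


def gaussian_kernel(X, epsilon):
    """Affinities K_ij = exp(-||x_i - x_j||^2 / epsilon).

    Assumption: a plain Gaussian kernel; the paper's kernel may differ.
    """
    diff = X[:, None, :] - X[None, :, :]
    return np.exp(-np.sum(diff**2, axis=-1) / epsilon)


def markov_eigenvalues(K):
    """Eigenvalues of the diffusion Markov matrix P = D^{-1} K, descending.

    P is similar to the symmetric matrix D^{-1/2} K D^{-1/2}, so its
    eigenvalues are real and can be computed with eigvalsh.
    """
    d = K.sum(axis=1)
    A = K / np.sqrt(np.outer(d, d))
    return np.sort(np.linalg.eigvalsh(A))[::-1]


def spectral_gap(K, k):
    """Gap lambda_k - lambda_{k+1} (1-indexed) of the Markov matrix."""
    lam = markov_eigenvalues(K)
    return lam[k - 1] - lam[k]


def sinkhorn_knopp(K, n_iter=1000, tol=1e-10):
    """Scale a positive matrix to diag(r) K diag(c) with unit row/column sums.

    Assumption: Sinkhorn-Knopp is used here only as an illustrative way to
    obtain doubly stochastic transition probabilities.
    """
    r = np.ones(K.shape[0])
    c = np.ones(K.shape[1])
    for _ in range(n_iter):
        r = 1.0 / (K @ c)    # rescale rows so they sum to 1
        c = 1.0 / (K.T @ r)  # rescale columns so they sum to 1 exactly
        # Row sums of diag(r) K diag(c) after the column update.
        if np.max(np.abs(r * (K @ c) - 1.0)) < tol:
            break
    return r[:, None] * K * c[None, :]


# Usage: two well-separated synthetic blobs (illustrative data only).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(20, 2)),
    rng.normal(0.0, 0.3, size=(20, 2)) + [10.0, 0.0],
])
K = gaussian_kernel(X, epsilon=1.0)
print(markov_eigenvalues(K)[:3])  # two eigenvalues near 1, then a drop
print(spectral_gap(K, k=2))       # large gap indicates two clusters
W = sinkhorn_knopp(K)             # doubly stochastic version of K
```

With two clusters the Markov matrix is nearly block diagonal, so two eigenvalues lie close to 1 and the gap after the second eigenvalue is large. A method that increases this gap, as the abstract describes, sharpens the separation between clusters.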