Paper Title

Benchmarking Self-Supervised Learning on Diverse Pathology Datasets

Paper Authors

Mingu Kang, Heon Song, Seonwook Park, Donggeun Yoo, Sérgio Pereira

Paper Abstract

Computational pathology can lead to saving human lives, but models are annotation hungry and pathology images are notoriously expensive to annotate. Self-supervised learning (SSL) has been shown to be an effective method for utilizing unlabeled data, and its application to pathology could greatly benefit its downstream tasks. Yet, there are no principled studies that compare SSL methods and discuss how to adapt them for pathology. To address this need, we execute the largest-scale study of SSL pre-training on pathology image data to date. Our study is conducted using 4 representative SSL methods on diverse downstream tasks. We establish that large-scale domain-aligned pre-training in pathology consistently outperforms ImageNet pre-training in standard SSL settings such as linear and fine-tuning evaluations, as well as in low-label regimes. Moreover, we propose a set of domain-specific techniques that we experimentally show lead to a performance boost. Lastly, for the first time, we apply SSL to the challenging task of nuclei instance segmentation and show large and consistent performance improvements under diverse settings.
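For readers unfamiliar with the two evaluation protocols named in the abstract, the sketch below contrasts linear evaluation (backbone frozen, only a new linear head is trained) with fine-tuning (all parameters trainable). It is a minimal illustration assuming a torchvision ResNet-50 backbone; the function name `build_evaluator` and the class count are placeholders for exposition, not code from the paper.

```python
# Minimal sketch (not the paper's code) of linear evaluation vs. fine-tuning,
# assuming the backbone's weights would be loaded from an SSL checkpoint.
import torch
import torch.nn as nn
from torchvision.models import resnet50


def build_evaluator(num_classes: int, mode: str = "linear") -> nn.Module:
    """Return a classifier set up for linear evaluation or full fine-tuning."""
    backbone = resnet50(weights=None)  # load SSL pre-trained weights here
    backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)

    if mode == "linear":
        # Linear evaluation: freeze everything except the new linear head,
        # so only the head's parameters receive gradients.
        for name, param in backbone.named_parameters():
            param.requires_grad = name.startswith("fc.")
    # mode == "finetune": leave all parameters trainable.
    return backbone


# Example usage with a hypothetical 9-class downstream task.
model = build_evaluator(num_classes=9, mode="linear")
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1, momentum=0.9)
```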
