Paper title
Iterative Patch Selection for High-Resolution Image Recognition
Paper authors
Paper abstract
High-resolution images are prevalent in various applications, such as autonomous driving and computer-aided diagnosis. However, training neural networks on such images is computationally challenging and easily leads to out-of-memory errors even on modern GPUs. We propose a simple method, Iterative Patch Selection (IPS), which decouples the memory usage from the input size and thus enables the processing of arbitrarily large images under tight hardware constraints. IPS achieves this by selecting only the most salient patches, which are then aggregated into a global representation for image recognition. For both patch selection and aggregation, a cross-attention based transformer is introduced, which exhibits a close connection to Multiple Instance Learning. Our method demonstrates strong performance and has wide applicability across different domains, training regimes and image sizes while using minimal accelerator memory. For example, we are able to finetune our model on whole-slide images consisting of up to 250k patches (>16 gigapixels) with only 5 GB of GPU VRAM at a batch size of 16.
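The selection-then-aggregation idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that patches are scanned in fixed-size chunks, that each patch is scored by a dot product with a single query vector (a stand-in for the cross-attention scores), and that only the top `memory_size` candidates are kept between chunks. The names `select_patches`, `aggregate`, `embed_fn`, `memory_size` and `chunk_size` are hypothetical.

```python
import numpy as np


def select_patches(patches, embed_fn, query, memory_size, chunk_size):
    """Scan patches chunk by chunk, keeping only the top-scoring ones.

    At most memory_size + chunk_size embeddings are held at any time,
    so peak memory does not grow with the number of patches.
    """
    kept_idx = np.empty(0, dtype=np.int64)
    kept_emb = np.empty((0, query.shape[0]))
    for start in range(0, len(patches), chunk_size):
        stop = min(start + chunk_size, len(patches))
        # Embed only the current chunk (assumed encoder, passed in by the caller).
        new_emb = embed_fn(patches[start:stop])
        cand_idx = np.concatenate([kept_idx, np.arange(start, stop)])
        cand_emb = np.concatenate([kept_emb, new_emb])
        # Simplified saliency score: dot product with the query vector.
        scores = cand_emb @ query
        top = np.argsort(-scores, kind="stable")[:memory_size]
        kept_idx, kept_emb = cand_idx[top], cand_emb[top]
    return kept_idx, kept_emb


def aggregate(kept_emb, query):
    """Attention-style pooling of the selected embeddings into one vector."""
    logits = kept_emb @ query
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ kept_emb
```

Because this simplified score depends only on the patch itself, the chunked scan returns the same patches as scoring all of them at once, while holding far fewer embeddings in memory. In practice, the selection pass would typically run without gradient tracking, with only the selected patches used for the training step.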