Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

Paper Title

Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

Authors

Raphaël Barman, Maud Ehrmann, Simon Clematide, Sofia Ares Oliveira, Frédéric Kaplan

Abstract

The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research efforts that seek to automatically process facsimiles and extract information from them are multiplying, with document layout analysis as an essential first step. Although the identification and categorization of segments of interest in document images have seen significant progress over the last years thanks to deep learning techniques, many challenges remain, among them the use of finer-grained segmentation typologies and the handling of complex, heterogeneous documents such as historical newspapers. Moreover, most approaches consider only visual features and ignore the textual signal. In this context, we introduce a multimodal approach for the semantic segmentation of historical newspapers that combines visual and textual features. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate, among other things, the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show that multimodal models consistently improve on a strong visual baseline and are more robust to high material variance.
