A Medical Information Extraction Workbench to Process German Clinical Text

Paper Title

A Medical Information Extraction Workbench to Process German Clinical Text

Authors

Roland Roller, Laura Seiffe, Ammer Ayach, Sebastian Möller, Oliver Marten, Michael Mikhailov, Christoph Alt, Danilo Schmidt, Fabian Halleck, Marcel Naik, Wiebke Duettmann, Klemens Budde

Abstract

Background: In information extraction and natural language processing, accessible datasets are crucial for reproducing and comparing results. Publicly available implementations and tools can serve as benchmarks and facilitate the development of more complex applications. However, in clinical text processing, accessible datasets are scarce, and so are existing tools. One of the main reasons is the sensitivity of the data. This problem is even more evident for non-English languages. Approach: To address this situation, we introduce a workbench: a collection of German clinical text processing models. The models are trained on a de-identified corpus of German nephrology reports. Result: The presented models provide promising results on in-domain data. Moreover, we show that our models can also be successfully applied to other biomedical text in German. Our workbench is made publicly available so that it can be used out of the box, as a benchmark, or transferred to related problems.
