Paper Title
Developing a general-purpose clinical language inference model from a large corpus of clinical notes
Paper Authors
Abstract
Several biomedical language models have already been developed for clinical language inference. However, these models typically use general-domain vocabularies and are trained on relatively small clinical corpora. We sought to evaluate the impact of using a domain-specific vocabulary and a large clinical training corpus on the performance of such language models in clinical language inference. We trained a Bidirectional Encoder Representations from Transformers (BERT) model on a diverse corpus of 75 million deidentified clinical notes authored at the University of California, San Francisco (UCSF). We evaluated this model on several clinical language inference benchmark tasks: clinical and temporal concept recognition, relation extraction, and medical language inference. We also evaluated the model on two tasks using discharge summaries from UCSF: diagnostic code assignment and therapeutic class inference. Our model performs on par with the best publicly available biomedical language models of comparable size on the public benchmark tasks, and is significantly better than these models in a within-system evaluation on the two tasks using UCSF data. The use of an in-domain vocabulary appears to improve the encoding of longer documents. The use of a large clinical corpus appears to enhance document encoding and inferential accuracy. However, further research is needed to improve abbreviation resolution as well as numerical, temporal, and implicitly causal inference.
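The abstract does not include implementation details. As a minimal sketch, the domain-specific vocabulary step might look like the following, assuming the HuggingFace tokenizers library and a hypothetical plain-text export of the deidentified note corpus (clinical_notes.txt); the paper's actual preprocessing pipeline and vocabulary size are not stated in the abstract.

```python
# Sketch: building a domain-specific WordPiece vocabulary for BERT pretraining.
# Assumes the HuggingFace `tokenizers` library; the corpus file name and the
# vocabulary size below are illustrative assumptions, not values from the paper.
from tokenizers import BertWordPieceTokenizer

tokenizer = BertWordPieceTokenizer(lowercase=True)
tokenizer.train(
    files=["clinical_notes.txt"],  # hypothetical plain-text corpus export
    vocab_size=30522,              # standard BERT-base vocabulary size (assumed)
    min_frequency=2,               # drop very rare subwords
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
)
tokenizer.save_model(".", "clinical")  # writes clinical-vocab.txt
```

The resulting vocabulary file can then be supplied to a standard BERT masked-language-model pretraining setup in place of the general-domain WordPiece vocabulary.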