The Change that Matters in Discourse Parsing: Estimating the Impact of Domain Shift on Parser Error

Paper title

The Change that Matters in Discourse Parsing: Estimating the Impact of Domain Shift on Parser Error

Authors

Katherine Atwell, Anthony Sicilia, Seong Jae Hwang, Malihe Alikhani

Abstract

Discourse analysis allows us to draw inferences about a text document that extend beyond the sentence level. The current performance of discourse models is very low on texts outside the training distribution's coverage, diminishing the practical utility of existing models. There is a need for a measure that can tell us to what extent a model generalizes from the training sample to the test sample when these samples may be drawn from distinct distributions. While this can be estimated via distribution shift, we argue that distribution shift does not directly correlate with the change in the observed error of a classifier (i.e., the error-gap). Thus, we propose to use a statistic from the theoretical domain adaptation literature that can be directly tied to error-gap. We study the bias of this statistic as an estimator of error-gap both theoretically and through a large-scale empirical study of over 2400 experiments on 6 discourse datasets from domains including, but not limited to, news, biomedical texts, TED talks, Reddit posts, and fiction. Our results not only motivate our proposal and help us to understand its limitations, but also provide insight into the properties of discourse models and datasets that improve performance in domain adaptation. For instance, we find that non-news datasets are slightly easier to transfer to than news datasets when the training and test sets are very different. Our code and an associated Python package are available to allow practitioners to make more informed model and dataset choices.
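
The abstract describes error-gap as the change in a classifier's observed error between the training sample and the test sample. A minimal sketch of that quantity follows, assuming it is taken as the absolute difference between the two empirical error rates; the function names `error_rate` and `error_gap` and the toy labels are hypothetical and are not taken from the authors' Python package. The statistic the paper proposes for estimating this gap is not reproduced here.

```python
from typing import Sequence


def error_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of examples whose predicted label differs from the gold label."""
    if len(y_true) != len(y_pred):
        raise ValueError("labels and predictions must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one example")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def error_gap(
    src_true: Sequence[int],
    src_pred: Sequence[int],
    tgt_true: Sequence[int],
    tgt_pred: Sequence[int],
) -> float:
    """Absolute change in observed error from the source (training-domain)
    sample to the target (test-domain) sample.

    Assumption: the gap is measured as |target error - source error|.
    """
    return abs(error_rate(tgt_true, tgt_pred) - error_rate(src_true, src_pred))


# Toy data (made up for illustration): one mistake out of four on the
# source sample, two mistakes out of four on the target sample.
gap = error_gap(
    src_true=[0, 1, 1, 0], src_pred=[0, 1, 0, 0],
    tgt_true=[1, 0, 1, 1], tgt_pred=[0, 0, 0, 1],
)
print(gap)
```

Computing the gap this way requires gold labels on the target sample. A statistic that can be tied to error-gap, as proposed in the abstract, is useful when that direct measurement is not available.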
