Paper title
Compositional Generalization and Natural Language Variation: Can a Semantic Parsing Approach Handle Both?
Paper authors
Paper abstract
Sequence-to-sequence models excel at handling natural language variation, but have been shown to struggle with out-of-distribution compositional generalization. This has motivated new specialized architectures with stronger compositional biases, but most of these approaches have only been evaluated on synthetically-generated datasets, which are not representative of natural language variation. In this work we ask: can we develop a semantic parsing approach that handles both natural language variation and compositional generalization? To better assess this capability, we propose new train and test splits of non-synthetic datasets. We demonstrate that strong existing approaches do not perform well across a broad set of evaluations. We also propose NQG-T5, a hybrid model that combines a high-precision grammar-based approach with a pre-trained sequence-to-sequence model. It outperforms existing approaches across several compositional generalization challenges on non-synthetic data, while also being competitive with the state-of-the-art on standard evaluations. While still far from solving this problem, our study highlights the importance of diverse evaluations and the open challenge of handling both compositional generalization and natural language variation in semantic parsing.