Paper Title

Language Model Decomposition: Quantifying the Dependency and Correlation of Language Models

Authors

Zhang, Hao

Abstract

Pre-trained language models (LMs), such as BERT (Devlin et al., 2018) and its variants, have led to significant improvements on various NLP tasks in recent years. However, a theoretical framework for studying their relationships is still missing. In this paper, we fill this gap by investigating the linear dependency between pre-trained LMs. The linear dependency of LMs is defined analogously to the linear dependency of vectors. We propose Language Model Decomposition (LMD), which represents an LM as a linear combination of other LMs serving as a basis, and derive the closed-form solution. A goodness-of-fit metric for LMD, similar to the coefficient of determination, is defined and used to measure the linear dependency of a set of LMs. In experiments, we find that BERT and eleven (11) BERT-like LMs are 91% linearly dependent. This observation suggests that current state-of-the-art (SOTA) LMs are highly "correlated". To further advance the SOTA, we need more diverse and novel LMs that are less dependent on existing LMs.
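
To make the decomposition idea concrete, here is a minimal sketch, assuming each LM is represented by fixed-size features extracted on a shared probe corpus; the probe size, feature dimension, and random placeholder "embeddings" below are illustrative assumptions, not the paper's actual setup:

```python
# A minimal, illustrative sketch of the idea behind Language Model Decomposition
# (LMD): approximate a target LM's features by a linear combination of basis
# LMs' features, then score the fit with an R^2-like metric. Random arrays
# stand in for real LM features (an assumption for illustration only).
import numpy as np

rng = np.random.default_rng(0)

n, d, k = 5000, 64, 3  # n probe examples, d-dim features, k basis LMs (assumed)

# Placeholders standing in for features extracted from a target LM and from
# each basis LM on a shared probe corpus.
target = rng.normal(size=(n, d))
basis = [rng.normal(size=(n, d)) for _ in range(k)]

# Stack basis features and solve min_W ||target - X W||_F^2. The closed-form
# (normal-equations) solution is W = (X^T X)^{-1} X^T target; lstsq computes
# the same least-squares solution via SVD.
X = np.hstack(basis)                               # shape (n, k*d)
W, *_ = np.linalg.lstsq(X, target, rcond=None)     # shape (k*d, d)
fitted = X @ W

# Goodness of fit, analogous to the coefficient of determination (R^2).
ss_res = np.sum((target - fitted) ** 2)
ss_tot = np.sum((target - target.mean(axis=0)) ** 2)
print(f"R^2-like linear-dependency score: {1.0 - ss_res / ss_tot:.3f}")
```

With real LM features in place of the random placeholders, a score of this kind corresponds to the 91% figure quoted in the abstract, though the paper's exact formulation of LMD may differ from this simplified least-squares setup.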
