Paper Title
HINT: Hypernetwork Instruction Tuning for Efficient Zero- & Few-Shot Generalisation
Paper Authors
Paper Abstract
Recent NLP models have shown the remarkable ability to effectively generalise `zero-shot' to new tasks using only natural language instructions as guidance. However, many of these approaches suffer from high computational costs due to their reliance on concatenating lengthy instructions with every input example, resulting in costly reprocessing of the instruction. To avoid this, we introduce Hypernetworks for INstruction Tuning (HINT), which convert task instructions and examples into parameter-efficient modules inserted into an underlying model using a pretrained text encoder, eliminating the need to include instructions in the model input. The hypernetwork in HINT also produces an encoded instruction, which we concatenate with encoded inputs during decoding to further improve performance. HINT models outperform strong state-of-the-art baselines by over 10% when controlling for compute (measured in FLOPs). By converting instructions into modules, HINT models can effectively disregard the length of instructions and few-shot example inputs in terms of compute usage. As a result, HINT can enhance its performance by up to 25% by incorporating additional few-shot data, while utilizing only up to 5% more compute. This combines the strengths of parameter-efficient fine-tuning and in-context learning.
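To illustrate the general idea the abstract describes — a hypernetwork that encodes a task instruction once and emits a parameter-efficient module for the underlying model, so the instruction never has to be concatenated with each input — here is a minimal sketch. It is not the paper's actual implementation; the class names, adapter structure, dimensions, and mean-pooled instruction encoding are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class HyperNetwork(nn.Module):
    """Maps a pooled instruction encoding to bottleneck-adapter weights (illustrative)."""
    def __init__(self, enc_dim: int, model_dim: int, adapter_dim: int):
        super().__init__()
        self.model_dim = model_dim
        self.adapter_dim = adapter_dim
        # One linear head per generated weight matrix (a simplification).
        self.to_down = nn.Linear(enc_dim, model_dim * adapter_dim)
        self.to_up = nn.Linear(enc_dim, adapter_dim * model_dim)

    def forward(self, instruction_encoding: torch.Tensor):
        # instruction_encoding: (batch, enc_dim), e.g. a pooled output of a pretrained text encoder.
        down = self.to_down(instruction_encoding).view(-1, self.model_dim, self.adapter_dim)
        up = self.to_up(instruction_encoding).view(-1, self.adapter_dim, self.model_dim)
        return down, up

class GeneratedAdapter(nn.Module):
    """Bottleneck adapter whose weights are generated per task rather than trained per task."""
    def forward(self, hidden: torch.Tensor, down: torch.Tensor, up: torch.Tensor):
        # hidden: (batch, seq, model_dim); down/up are the hypernetwork-generated weights.
        return hidden + torch.relu(hidden @ down) @ up

# Usage: encode the instruction once, generate the adapter, then reuse it for every
# input example of that task without reprocessing the instruction.
enc_dim, model_dim, adapter_dim = 512, 512, 64
hypernet = HyperNetwork(enc_dim, model_dim, adapter_dim)
adapter = GeneratedAdapter()

instruction_encoding = torch.randn(1, enc_dim)   # stand-in for the pooled instruction encoding
down, up = hypernet(instruction_encoding)        # parameter-efficient module for this task

example_hidden = torch.randn(1, 16, model_dim)   # hidden states for one input example
adapted = adapter(example_hidden, down, up)      # instruction applied without appearing in the input
```

The cost saving the abstract claims follows from this structure: the instruction (and any few-shot examples) is processed once by the hypernetwork, so per-example compute depends only on the input length, not on the instruction length.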