Paper Title

Plan Optimization to Bilingual Dictionary Induction for Low-Resource Language Families

Authors

Nasution, Arbi Haza; Murakami, Yohei; Ishida, Toru

Abstract

Creating bilingual dictionaries is the first crucial step in enriching low-resource languages. Especially for closely related languages, the constraint-based approach has been shown to be useful for inducing bilingual lexicons from two bilingual dictionaries via a pivot language. However, if no machine-readable dictionaries are available as input, we need to consider manual creation by bilingual native speakers. When the goal is to comprehensively create multiple bilingual dictionaries, it is still difficult, even with several existing machine-readable bilingual dictionaries, to determine the execution order of the constraint-based approach that reduces the total cost. Plan optimization is crucial in composing the order of bilingual dictionary creation, taking into account the methods and their costs. We formalize the plan optimization for creating bilingual dictionaries as a Markov Decision Process (MDP), with the goal of obtaining a more accurate estimate of the most feasible optimal plan with the least total cost before fully implementing the constraint-based bilingual lexicon induction. We model a prior beta distribution of bilingual lexicon induction precision with language similarity and polysemy of the topology as the $\alpha$ and $\beta$ parameters. This distribution is further used to model the cost function and the state transition probability. We estimated the cost of the all-investment plan as a baseline for evaluating the proposed MDP-based approach, with total cost as the evaluation metric. After utilizing the posterior beta distribution from the first batch of experiments to construct the prior beta distribution in the second batch of experiments, the result shows a 61.5% cost reduction compared to the estimated all-investment plan and a 39.4% cost reduction compared to the estimated MDP optimal plan. The MDP-based proposal outperformed the baseline on total cost.
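
The batch-to-batch step described in the abstract, where a posterior beta distribution from one batch becomes the prior for the next, can be illustrated with a standard conjugate Beta update. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class `BetaBelief`, the function `expected_cost`, the cost model, and all numeric values are hypothetical, and the abstract does not specify how language similarity and polysemy are mapped to the two parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) belief over induction precision (hypothetical helper)."""
    alpha: float
    beta: float

    def mean(self) -> float:
        # Expected precision under the current belief.
        return self.alpha / (self.alpha + self.beta)

    def update(self, correct: int, incorrect: int) -> "BetaBelief":
        # Standard conjugate update: evaluated pairs from one batch are added
        # to the parameters, and the result serves as the next batch's prior.
        return BetaBelief(self.alpha + correct, self.beta + incorrect)


def expected_cost(belief: BetaBelief, n_pairs: int,
                  check_cost: float, create_cost: float) -> float:
    # ASSUMED cost model (not from the paper): every induced pair is checked
    # by a native speaker, and each wrong pair is then created manually.
    return n_pairs * (check_cost + (1.0 - belief.mean()) * create_cost)


# Illustrative values only.
prior = BetaBelief(alpha=2.0, beta=2.0)
posterior = prior.update(correct=8, incorrect=2)
print(posterior, round(posterior.mean(), 3))
print(expected_cost(posterior, n_pairs=100, check_cost=1.0, create_cost=4.0))
```

In an MDP formulation, an estimate of this kind could feed both the cost function and the transition probabilities, which is the role the abstract assigns to the beta distribution.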
