Paper title
Choosing the Number of Topics in LDA Models -- A Monte Carlo Comparison of Selection Criteria
Authors
Abstract
Selecting the number of topics in LDA models is considered a difficult task, for which several alternative approaches have been proposed. The performance of the recently developed singular Bayesian information criterion (sBIC) is evaluated and compared with that of alternative model selection criteria. The sBIC is a generalization of the standard BIC that can be applied to singular statistical models. The comparison is based on Monte Carlo simulations and is carried out for several alternative settings that vary the number of topics, the number of documents, and the size of the documents in the corpora. Performance is measured using different criteria that take into account whether the correct number of topics is selected and also whether the relevant topics from the data-generating processes (DGPs) are identified. Practical recommendations for LDA model selection in applications are derived.
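As a rough illustration of the kind of simulation design the abstract describes, the following sketch draws one synthetic corpus from an LDA data-generating process with a chosen number of topics, number of documents, and document size. It is a minimal sketch under stated assumptions, not the authors' setup: the function name `simulate_lda_corpus`, the vocabulary size, and the Dirichlet hyperparameters `alpha` and `eta` are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_lda_corpus(n_topics, n_docs, doc_length, vocab_size=50,
                        alpha=0.5, eta=0.5, seed=0):
    """Draw a document-term count matrix from an LDA DGP.

    All defaults (vocab_size, alpha, eta) are illustrative assumptions,
    not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Topic-word distributions, one row per topic: shape (n_topics, vocab_size)
    topics = rng.dirichlet(np.full(vocab_size, eta), size=n_topics)
    # Document-topic proportions, one row per document: shape (n_docs, n_topics)
    theta = rng.dirichlet(np.full(n_topics, alpha), size=n_docs)
    counts = np.zeros((n_docs, vocab_size), dtype=int)
    for d in range(n_docs):
        # Marginal word distribution of document d (mixture over topics)
        word_probs = theta[d] @ topics
        counts[d] = rng.multinomial(doc_length, word_probs)
    return counts, topics, theta

# One Monte Carlo replication per (hypothetical) design cell.
for k, n_docs, doc_length in [(3, 100, 50), (5, 200, 100)]:
    counts, topics, theta = simulate_lda_corpus(k, n_docs, doc_length)
    print(k, counts.shape, counts.sum())
```

In a full study, each simulated corpus would be fitted with LDA models over a range of candidate topic numbers, and each selection criterion would then be scored on whether it recovers the true number of topics and the topics used to generate the data.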