Paper Title

Towards Sequence Utility Maximization under Utility Occupancy Measure

Authors

Gengsen Huang, Wensheng Gan, Philip S. Yu

Abstract

The discovery of utility-driven patterns is a useful and difficult research topic. It can extract significant and interesting information from specific and varied databases, increasing the value of the services provided. In practice, the measure of utility is often used to demonstrate the importance, profit, or risk of an object or a pattern. Although utility is a flexible criterion for each pattern in a database, it is a more absolute criterion because it neglects utility sharing. As a result, the derived patterns explore only partial and local knowledge from a database. Utility occupancy is a recently proposed model that addresses the problem of mining patterns with high utility but low occupancy. However, existing studies concentrate on itemsets, which do not reveal the temporal relationship of object occurrences. Therefore, this paper works towards sequence utility maximization. We first define utility occupancy on sequence data and formulate the problem of High Utility-Occupancy Sequential Pattern Mining (HUOSPM). Three dimensions, namely frequency, utility, and occupancy, are comprehensively evaluated in HUOSPM. An algorithm called Sequence Utility Maximization with Utility occupancy measure (SUMU) is proposed. Furthermore, two data structures for storing related information about a pattern, Utility-Occupancy-List-Chain (UOL-Chain) and Utility-Occupancy-Table (UO-Table), together with six associated upper bounds, are designed to improve efficiency. Empirical experiments are carried out to evaluate the novel algorithm's efficiency and effectiveness. The influence of different upper bounds and pruning strategies is analyzed and discussed. The comprehensive results suggest that our algorithm is intelligent and effective.
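To make the three dimensions named in the abstract (frequency, utility, occupancy) concrete, here is a minimal Python sketch. It is not the SUMU algorithm and does not use the UOL-Chain or UO-Table structures. It rests on several assumptions that the abstract does not state: each sequence is a list of single `(item, utility)` events with positive utilities, a pattern's utility in a sequence is taken as the maximum over all of its occurrences, and its utility occupancy is the average share of each supporting sequence's total utility. The names `max_pattern_utility` and `evaluate` are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed data model: a sequence is a time-ordered list of (item, utility) pairs.
Sequence = List[Tuple[str, float]]


def max_pattern_utility(pattern: List[str], seq: Sequence) -> Optional[float]:
    """Return the largest utility of any occurrence of `pattern` as a
    subsequence of `seq`, or None if the pattern does not occur."""
    # best[j] = best utility of matching the first j pattern items so far.
    best: List[Optional[float]] = [0.0] + [None] * len(pattern)
    for item, util in seq:
        # Go backwards so one event is not matched to two pattern positions.
        for j in range(len(pattern), 0, -1):
            if item == pattern[j - 1] and best[j - 1] is not None:
                cand = best[j - 1] + util
                if best[j] is None or cand > best[j]:
                    best[j] = cand
    return best[len(pattern)]


def evaluate(pattern: List[str], db: List[Sequence]) -> Tuple[int, float, float]:
    """Return (support, utility, utility occupancy) of `pattern` in `db`,
    under the assumptions stated above."""
    occupancies: List[float] = []
    total_utility = 0.0
    for seq in db:
        u = max_pattern_utility(pattern, seq)
        if u is None:
            continue  # this sequence does not support the pattern
        seq_total = sum(util for _, util in seq)
        occupancies.append(u / seq_total)  # share of the sequence's utility
        total_utility += u
    support = len(occupancies)
    occupancy = sum(occupancies) / support if support else 0.0
    return support, total_utility, occupancy
```

For example, with the toy database `[[("a", 2), ("b", 3), ("a", 5), ("c", 1)], [("b", 4), ("c", 6)], [("a", 1), ("c", 9)]]`, calling `evaluate(["a", "c"], db)` gives a support of 2, a utility of 16, and a utility occupancy of 17/22. A miner in the spirit of HUOSPM would keep a pattern only if both its support and its occupancy meet user-chosen thresholds.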
