Paper Title
ETRI-Activity3D: A Large-Scale RGB-D Dataset for Robots to Recognize Daily Activities of the Elderly
Authors
Abstract
Deep learning, on which many modern algorithms are built, is well known to be data-hungry. In particular, datasets appropriate for the intended application are difficult to obtain. To cope with this situation, we introduce a new dataset called ETRI-Activity3D, focusing on the daily activities of the elderly as seen from a robot's viewpoint. The major characteristics of the new dataset are as follows: 1) practical action categories selected from close observation of the daily lives of the elderly; 2) realistic data collection that reflects the robot's working environment and service situations; and 3) a large-scale dataset that overcomes the limitations of current 3D activity analysis benchmark datasets. The proposed dataset contains 112,620 samples, including RGB videos, depth maps, and skeleton sequences. During data acquisition, 100 subjects were asked to perform 55 daily activities. Additionally, we propose a novel network called the four-stream adaptive CNN (FSA-CNN). The proposed FSA-CNN has three main properties: robustness to spatio-temporal variations, an input-adaptive activation function, and an extension of the conventional two-stream approach. In the experiments, we confirmed the superiority of the proposed FSA-CNN on both NTU RGB+D and ETRI-Activity3D. Further, the domain difference between the two age groups was verified experimentally. Finally, the extension of FSA-CNN to handle multimodal data was investigated.
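The abstract specifies neither the exact stream composition nor the form of the input-adaptive activation in FSA-CNN, so the sketch below is only an illustrative reading of those two ideas, not the authors' implementation. It assumes a skeleton input of 25 joints x 3 coordinates (75 channels, a common Kinect-style layout), four 1D-CNN streams over a pose/motion pair for an original and a spatio-temporally transformed copy of the sequence, and a hypothetical sigmoid-gated activation whose response depends on the input. All of these choices, along with the names InputAdaptiveActivation and FourStreamFusion, are assumptions made purely for illustration; late fusion by concatenating pooled stream features simply mirrors the conventional two-stream recipe that the abstract says FSA-CNN extends.

```python
import torch
import torch.nn as nn


class InputAdaptiveActivation(nn.Module):
    """Hypothetical input-adaptive activation: a learned sigmoid gate scales each
    response depending on the input itself (not the paper's exact formulation)."""
    def __init__(self, channels):
        super().__init__()
        # Per-channel parameters controlling how strongly the gate reacts to the input.
        self.alpha = nn.Parameter(torch.ones(1, channels, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1))

    def forward(self, x):                         # x: (batch, channels, time)
        gate = torch.sigmoid(self.alpha * x + self.beta)
        return x * gate                           # input-dependent scaling


class FourStreamFusion(nn.Module):
    """Minimal four-stream sketch: 1D-CNN branches over pose and frame-to-frame
    motion, for the original and a spatio-temporally transformed sequence."""
    def __init__(self, in_channels=75, num_classes=55, feat=128):
        super().__init__()

        def branch():
            return nn.Sequential(
                nn.Conv1d(in_channels, feat, kernel_size=3, padding=1),
                InputAdaptiveActivation(feat),
                nn.AdaptiveAvgPool1d(1),
            )

        self.streams = nn.ModuleList([branch() for _ in range(4)])
        self.classifier = nn.Linear(4 * feat, num_classes)

    def forward(self, pose, pose_aug):            # each: (batch, 75, T) = 25 joints x 3 coords
        motion = pose[:, :, 1:] - pose[:, :, :-1]
        motion_aug = pose_aug[:, :, 1:] - pose_aug[:, :, :-1]
        inputs = [pose, motion, pose_aug, motion_aug]
        feats = [s(x).flatten(1) for s, x in zip(self.streams, inputs)]
        return self.classifier(torch.cat(feats, dim=1))


if __name__ == "__main__":
    model = FourStreamFusion()
    pose = torch.randn(2, 75, 64)                 # 2 clips, 25 joints x 3 coords, 64 frames
    logits = model(pose, pose)                    # 2nd argument would be a transformed copy
    print(logits.shape)                           # torch.Size([2, 55])
```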