3D Siamese Transformer Network for Single Object Tracking on Point Clouds

Paper Title

3D Siamese Transformer Network for Single Object Tracking on Point Clouds

Authors

Le Hui, Lingpeng Wang, Linghua Tang, Kaihao Lan, Jin Xie, Jian Yang

Abstract

Siamese network based trackers formulate 3D single object tracking as cross-correlation learning between point features of a template and a search area. Due to the large appearance variation between the template and search area during tracking, how to learn the robust cross correlation between them for identifying the potential target in the search area is still a challenging problem. In this paper, we explicitly use Transformer to form a 3D Siamese Transformer network for learning robust cross correlation between the template and the search area of point clouds. Specifically, we develop a Siamese point Transformer network to learn shape context information of the target. Its encoder uses self-attention to capture non-local information of point clouds to characterize the shape information of the object, and the decoder utilizes cross-attention to upsample discriminative point features. After that, we develop an iterative coarse-to-fine correlation network to learn the robust cross correlation between the template and the search area. It formulates the cross-feature augmentation to associate the template with the potential target in the search area via cross attention. To further enhance the potential target, it employs the ego-feature augmentation that applies self-attention to the local k-NN graph of the feature space to aggregate target features. Experiments on the KITTI, nuScenes, and Waymo datasets show that our method achieves state-of-the-art performance on the 3D single object tracking task.
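The two augmentation steps named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention with no learned projections, a residual connection after each step, and a simple loop that alternates the two steps. It does not model the multi-scale (coarse-to-fine) part of the correlation network. All function names (`attend`, `cross_feature_augmentation`, `ego_feature_augmentation`, `iterate_augmentation`) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def cross_feature_augmentation(search_feats, template_feats):
    """Cross-attention: each search-area point attends over all template points.
    The residual addition is an assumption of this sketch."""
    out = []
    for q in search_feats:
        ctx = attend(q, template_feats, template_feats)
        out.append([a + b for a, b in zip(q, ctx)])
    return out


def knn_indices(feats, i, k):
    """Indices of the k nearest neighbours of point i in feature space
    (squared Euclidean distance; the point itself is included)."""
    dists = [(sum((a - b) ** 2 for a, b in zip(feats[i], f)), j)
             for j, f in enumerate(feats)]
    return [j for _, j in sorted(dists)[:k]]


def ego_feature_augmentation(feats, k):
    """Self-attention restricted to each point's local k-NN graph in feature space."""
    out = []
    for i, q in enumerate(feats):
        nbrs = [feats[j] for j in knn_indices(feats, i, k)]
        ctx = attend(q, nbrs, nbrs)
        out.append([a + b for a, b in zip(q, ctx)])
    return out


def iterate_augmentation(search_feats, template_feats, k=2, iterations=2):
    """Alternate the two steps a fixed number of times (assumed schedule)."""
    for _ in range(iterations):
        search_feats = cross_feature_augmentation(search_feats, template_feats)
        search_feats = ego_feature_augmentation(search_feats, k)
    return search_feats
```

Calling `iterate_augmentation(search, template)` with two lists of equal-length feature vectors returns one augmented feature vector per search-area point. In a real tracker these would feed a localization head, which this sketch omits.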
