Paper Reading [TPAMI-2022] Tensor Representations for Action Recognition

Paper search (studyai.com)

Search for the paper: Tensor Representations for Action Recognition

Search for the paper: http://www.studyai.com/search/whole-site/?q=Tensor+Representations+for+Action+Recognition

Keywords

Tensors; Kernel; Three-dimensional displays; Skeleton; Correlation; Optical imaging; Higher order statistics; CNN; 3D skeletons; action recognition; aggregation; kernels; higher-order tensors; HOSVD; power normalization

Machine vision

Fine-grained vision; action recognition; 3D human body; time and space; multi-modal perception

Abstract

Human actions in video sequences are characterized by the complex interplay between spatial features and their temporal dynamics.

In this paper, we propose novel tensor representations for compactly capturing such higher-order relationships between visual features for the task of action recognition.

We propose two tensor-based feature representations, viz. (i) sequence compatibility kernel (SCK) and (ii) dynamics compatibility kernel (DCK).

SCK builds on the spatio-temporal correlations between features, whereas DCK explicitly models the action dynamics of a sequence.
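To make "higher-order correlations between features" concrete, here is a minimal sketch of a generic third-order aggregation over frames. It is not the paper's SCK formulation, which the abstract does not spell out; the function name `third_order_aggregate` and the random per-frame features are assumptions made purely for illustration.

```python
import numpy as np

def third_order_aggregate(features):
    """Average the third-order outer product x (x) x (x) x over all frames.

    `features` has shape (num_frames, dim); the result has shape (dim, dim, dim).
    Hypothetical helper for illustration, not the paper's SCK.
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    # Entry (i, j, k) holds the mean of x[i] * x[j] * x[k] across frames.
    return np.einsum("ti,tj,tk->ijk", features, features, features) / num_frames

rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 4))  # 50 frames of made-up 4-dim features
tensor = third_order_aggregate(frames)
print(tensor.shape)
```

The resulting tensor is symmetric under any permutation of its three indices, which is what allows such co-occurrence statistics to be stored compactly.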

We also explore a generalization of SCK, coined SCK $\oplus$, that operates on subsequences to capture the local-global interplay of correlations, and which can incorporate multi-modal inputs, e.g., skeleton 3D body-joints and per-frame classifier scores obtained from deep learning models trained on videos.

We introduce a linearization of these kernels that leads to compact and fast descriptors.
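The idea of kernel linearization can be illustrated with a different, well-known technique: random Fourier features for the RBF kernel. This is a swapped-in example, not the linearization used in the paper, and the names `random_fourier_features`, `num_features`, and `sigma` are assumptions. The point is only that an explicit feature map lets a dot product stand in for a kernel evaluation.

```python
import numpy as np

def rbf_kernel(x, y, sigma):
    """Exact RBF kernel exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def random_fourier_features(points, num_features, sigma, rng):
    """Map each row of `points` to features whose dot products approximate the RBF kernel."""
    points = np.atleast_2d(points)
    dim = points.shape[1]
    # Frequencies are drawn from the Fourier transform of the RBF kernel.
    weights = rng.standard_normal((dim, num_features)) / sigma
    offsets = rng.uniform(0.0, 2.0 * np.pi, num_features)
    return np.sqrt(2.0 / num_features) * np.cos(points @ weights + offsets)

rng = np.random.default_rng(0)
points = rng.standard_normal((2, 3))  # two made-up 3-dim inputs
phi = random_fourier_features(points, num_features=5000, sigma=1.5, rng=rng)

approx = float(phi[0] @ phi[1])                               # linearized kernel
exact = float(rbf_kernel(points[0], points[1], sigma=1.5))    # true kernel
print(approx, exact)
```

Once the kernel is replaced by an explicit feature map, descriptors can be aggregated and compared with plain linear algebra, which is where the speed and compactness come from.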

We provide experiments on (i) 3D skeleton action sequences, (ii) fine-grained video sequences, and (iii) standard non-fine-grained videos.

As our final representations are tensors that capture higher-order relationships of features, they relate to co-occurrences for robust fine-grained recognition (Lin, 2017), (Koniusz, 2018).

We use higher-order tensors and so-called Eigenvalue Power Normalization (EPN), which have long been speculated to perform spectral detection of higher-order occurrences (Koniusz, 2013), (Koniusz, 2017), thus detecting fine-grained relationships of features rather than merely counting features in action sequences.
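As a rough illustration of the spectral idea, the sketch below applies power normalization to the eigenvalues of a second-order (matrix) descriptor. The paper works with higher-order tensors (the keywords mention HOSVD), so this is only the simpler matrix analogue; the function name `eigenvalue_power_normalize` and the exponent `gamma=0.5` are assumptions.

```python
import numpy as np

def eigenvalue_power_normalize(matrix, gamma=0.5):
    """Raise the eigenvalues of a symmetric PSD matrix to the power `gamma`.

    Second-order analogue only; hypothetical helper for illustration.
    """
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)
    # Clip tiny negative eigenvalues caused by floating-point error.
    eigenvalues = np.clip(eigenvalues, 0.0, None)
    return (eigenvectors * eigenvalues ** gamma) @ eigenvectors.T

rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 4))        # made-up per-frame features
autocorr = frames.T @ frames / len(frames)   # second-order descriptor
normalized = eigenvalue_power_normalize(autocorr, gamma=0.5)
print(normalized.shape)
```

With `gamma` below 1, large eigenvalues are compressed relative to small ones, so the descriptor responds more to whether a correlation pattern is present than to how often it occurs.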

We prove that a tensor of order $r$, built from $Z_*$ dimensional features, coupled with EPN indeed detects if at least one higher-order occurrence is ‘projected’ into one of its $\binom{Z_*}{r}$ subspaces of dimension $r$ represented by the tensor, thus forming a Tensor Power Normalization metric endowed with $\binom{Z_*}{r}$ such ‘detectors’…
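To get a feel for the count $\binom{Z_*}{r}$, the sketch below enumerates the ways of choosing $r$ out of $Z_*$ coordinate axes. Reading each subspace as one such choice of axes is an interpretation of the count, not something the abstract states; the helper `index_subspaces` and the values 5 and 3 are made up for illustration.

```python
from itertools import combinations
from math import comb

def index_subspaces(feature_dim, order):
    """List every choice of `order` axes out of `feature_dim` axes (hypothetical helper)."""
    return list(combinations(range(feature_dim), order))

subspaces = index_subspaces(5, 3)   # Z_* = 5, r = 3
print(len(subspaces), comb(5, 3))   # 10 10
print(subspaces[0], subspaces[-1])  # (0, 1, 2) (2, 3, 4)
```

The count grows quickly with the feature dimension, for example `comb(20, 3)` is 1140, which gives a sense of how many such ‘detectors’ a single tensor can carry.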

Authors

Piotr Koniusz, Lei Wang, Anoop Cherian
