[CVPR 2022 Papers with Open-Source Code: Directory]
Backbone
CLIP
GAN
NAS
NeRF
Visual Transformer
Vision-Language
Self-supervised Learning
Data Augmentation
Object Detection
Visual Tracking
Semantic Segmentation
Instance Segmentation
Few-Shot Segmentation
Video Understanding
Image Editing
Low-level Vision
Super-Resolution
3D Point Cloud
3D Object Detection
3D Semantic Segmentation
3D Object Tracking
3D Human Pose Estimation
3D Semantic Scene Completion
3D Reconstruction
Camouflaged Object Detection
Depth Estimation
Stereo Matching
Lane Detection
Image Inpainting
Crowd Counting
Medical Image
Scene Graph Generation
Style Transfer
Weakly Supervised Object Localization
Hyperspectral Image Reconstruction
Watermarking
Datasets
New Tasks
Others
Backbone
A ConvNet for the 2020s
Paper: https://arxiv.org/abs/2201.03545
Code: https://github.com/facebookresearch/ConvNeXt
Chinese-language explainer: https://mp.weixin.qq.com/s/Xg5wPYExnvTqRo6s5-2cAw
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
Paper: https://arxiv.org/abs/2203.06717
Code: https://github.com/megvii-research/RepLKNet
Code2: https://github.com/DingXiaoH/RepLKNet-pytorch
Chinese-language explainer: https://mp.weixin.qq.com/s/_qXyIQut-JRW6VvsjaQlFg
MPViT: Multi-Path Vision Transformer for Dense Prediction
Paper: https://arxiv.org/abs/2112.11010
Code: https://github.com/youngwanLEE/MPViT
Chinese-language explainer: https://mp.weixin.qq.com/s/Q9-crEOz5IYzZaNoq8oXfg
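The backbones above (ConvNeXt, RepLKNet, MPViT) are meant to slot into existing recognition pipelines as drop-in feature extractors. As a rough, non-authoritative sketch of how such a backbone is typically consumed downstream, the snippet below loads an ImageNet-pretrained ConvNeXt through the timm model registry; the registry name convnext_tiny, the pretrained flag, and the 768-dim pooled output are assumptions about a recent timm release, and the official repositories linked above ship their own loaders.

import torch
import timm  # assumption: a timm release that includes the ConvNeXt family

# Build an ImageNet-pretrained ConvNeXt-T and use it as a pooled feature extractor.
model = timm.create_model("convnext_tiny", pretrained=True, num_classes=0)
model.eval()

x = torch.randn(1, 3, 224, 224)   # dummy image batch
with torch.no_grad():
    features = model(x)           # globally pooled backbone features
print(features.shape)             # expected (1, 768) for ConvNeXt-T

Swapping in RepLKNet or MPViT follows the same feature-extractor pattern, just with a different registry name or a repo-specific loader.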
CLIP
HairCLIP: Design Your Hair by Text and Reference Image
Paper: https://arxiv.org/abs/2112.05142
Code: https://github.com/wty-ustc/HairCLIP
PointCLIP: Point Cloud Understanding by CLIP
Paper: https://arxiv.org/abs/2112.02413
Code: https://github.com/ZrrSkywalker/PointCLIP
Blended Diffusion for Text-driven Editing of Natural Images
Paper: https://arxiv.org/abs/2111.14818
Code: https://github.com/omriav/blended-diffusion
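HairCLIP, PointCLIP, and Blended Diffusion all steer their respective tasks with CLIP's joint image-text embedding space. The sketch below shows only that shared underlying mechanism, zero-shot image-text matching with the openai/CLIP package; the checkpoint name ViT-B/32 is a common default, and photo.jpg plus the hair-style prompts are placeholder inputs, not anything prescribed by these papers.

import torch
import clip                      # assumption: pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)   # placeholder image
text = clip.tokenize(["a photo of long curly hair", "a photo of short hair"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    similarity = (image_features @ text_features.T).softmax(dim=-1)   # cosine similarity per prompt

print(similarity)                # probabilities over the text prompts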
GAN
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
Homepage: https://semanticstylegan.github.io/
Paper: https://arxiv.org/abs/2112.02236
Demo: https://semanticstylegan.github.io/videos/demo.mp4
Style Transformer for Image Inversion and Editing
Paper: https://arxiv.org/abs/2203.07932
Code: https://github.com/sapphire497/style-transformer
NAS
β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search
Paper: https://arxiv.org/abs/2203.01665
Code: https://github.com/Sunshine-Ye/Beta-DARTS
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
Paper: https://arxiv.org/abs/2111.15362
Code: None
NeRF
Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields
Homepage: https://jonbarron.info/mipnerf360/
Paper: https://arxiv.org/abs/2111.12077
Demo: https://youtu.be/YStDS2-Ln1s
Point-NeRF: Point-based Neural Radiance Fields
Homepage: https://xharlie.github.io/projects/project_sites/pointnerf/
Paper: https://arxiv.org/abs/2201.08845
Code: https://github.com/Xharlie/point-nerf
NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images
Paper: https://arxiv.org/abs/2111.13679
Homepage: https://bmild.github.io/rawnerf/
Demo: https://www.youtube.com/watch?v=JtBS4KBcKVc
Urban Radiance Fields
Homepage: https://urban-radiance-fields.github.io/
Paper: https://arxiv.org/abs/2111.14643
Demo: https://youtu.be/qGlq5DZT6uc
Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation
Paper: https://arxiv.org/abs/2202.13162
Code: https://github.com/HexagonPrime/Pix2NeRF
HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video
Homepage: https://grail.cs.washington.edu/projects/humannerf/
Paper: https://arxiv.org/abs/2201.04127
Demo: https://youtu.be/GM-RoZEymmw
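All of the NeRF variants above share the same volume-rendering core: a ray's color is alpha-composited from per-sample densities and colors, weighted by the accumulated transmittance. Below is a minimal sketch of that compositing step only; the 64-sample ray and the spacing values are arbitrary toy inputs, and each paper adds its own sampling, parameterization, and conditioning on top.

import torch

def render_ray(sigmas, colors, deltas):
    # sigmas: (N,) volume densities, colors: (N, 3) RGB, deltas: (N,) spacing between samples
    alphas = 1.0 - torch.exp(-sigmas * deltas)        # per-sample opacity
    # transmittance T_i: probability the ray reaches sample i without being absorbed
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alphas + 1e-10]), dim=0)[:-1]
    weights = trans * alphas                          # contribution of each sample
    rgb = (weights[:, None] * colors).sum(dim=0)      # composited ray color
    return rgb, weights

sigmas = torch.rand(64)                # toy ray with 64 samples
colors = torch.rand(64, 3)
deltas = torch.full((64,), 0.03)
rgb, weights = render_ray(sigmas, colors, deltas)
print(rgb, weights.sum())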
Visual Transformer
Backbone
MPViT: Multi-Path Vision Transformer for Dense Prediction
Paper: https://arxiv.org/abs/2112.11010
Code: https://github.com/youngwanLEE/MPViT
Applications
Language-based Video Editing via Multi-Modal Multi-Level Transformer
Paper: https://arxiv.org/abs/2104.01122
Code: None
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
Paper: https://arxiv.org/abs/2203.00859
Code: None
Embracing Single Stride 3D Object Detector with Sparse Transformer
Paper: https://arxiv.org/abs/2112.06375
Code: https://github.com/TuSimple/SST
Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
Paper: https://arxiv.org/abs/2203.02891
Code: https://github.com/xulianuwa/MCTformer
Spatio-temporal Relation Modeling for Few-shot Action Recognition
Paper: https://arxiv.org/abs/2112.05132
Code: https://github.com/Anirudh257/strm
Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction
Paper: https://arxiv.org/abs/2111.07910
Code: https://github.com/caiyuanhao1998/MST
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
Homepage: https://point-bert.ivg-research.xyz/
Paper: https://arxiv.org/abs/2111.14819
Code: https://github.com/lulutang0608/Point-BERT
GroupViT: Semantic Segmentation Emerges from Text Supervision
Homepage: https://jerryxu.net/GroupViT/
Paper: https://arxiv.org/abs/2202.11094
Demo: https://youtu.be/DtJsWIUTW-Y
Restormer: Efficient Transformer for High-Resolution Image Restoration
Paper: https://arxiv.org/abs/2111.09881
Code: https://github.com/swz30/Restormer
Splicing ViT Features for Semantic Appearance Transfer
Homepage: https://splice-vit.github.io/
Paper: https://arxiv.org/abs/2201.00424
Code: https://github.com/omerbt/Splice
Self-supervised Video Transformer
Homepage: https://kahnchana.github.io/svt/
Paper: https://arxiv.org/abs/2112.01514
Code: https://github.com/kahnchana/svt
Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers
Paper: https://arxiv.org/abs/2203.02664
Code: https://github.com/rulixiang/afa
Accelerating DETR Convergence via Semantic-Aligned Matching
Paper: https://arxiv.org/abs/2203.06883
Code: https://github.com/ZhangGongjie/SAM-DETR
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
Paper: https://arxiv.org/abs/2203.01305
Code: https://github.com/FengLi-ust/DN-DETR
Chinese-language explainer: https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w
Style Transformer for Image Inversion and Editing
Paper: https://arxiv.org/abs/2203.07932
Code: https://github.com/sapphire497/style-transformer
MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
Paper: https://arxiv.org/abs/2203.10981
Code: https://github.com/kuanchihhuang/MonoDTR
Mask Transfiner for High-Quality Instance Segmentation
Paper: https://arxiv.org/abs/2111.13673
Code: https://github.com/SysCV/transfiner
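Every entry in this section ultimately rests on scaled dot-product attention. For orientation only, here is a minimal sketch of that operation; the 197-token toy input assumes a ViT-B/16-style tokenization of a 224x224 image (196 patches plus a class token) and is not tied to any specific paper above.

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, heads, tokens, head_dim)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5   # (batch, heads, tokens, tokens)
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 8, 197, 64)    # 8 heads of width 64
out = scaled_dot_product_attention(q, k, v)
print(out.shape)                          # torch.Size([2, 8, 197, 64])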
Vision-Language
Conditional Prompt Learning for Vision-Language Models
Paper: https://arxiv.org/abs/2203.05557
Code: https://github.com/KaiyangZhou/CoOp
Self-supervised Learning
UniVIP: A Unified Framework for Self-Supervised Visual Pre-training
Paper: https://arxiv.org/abs/2203.06965
Code: None
Crafting Better Contrastive Views for Siamese Representation Learning
Paper: https://arxiv.org/abs/2202.03278
Code: https://github.com/xyupeng/ContrastiveCrop
Chinese-language explainer: https://mp.weixin.qq.com/s/VTP9D5f7KG9vg30U9kVI2A
HCSC: Hierarchical Contrastive Selective Coding
Homepage: https://github.com/gyfastas/HCSC
Paper: https://arxiv.org/abs/2202.00455
Chinese-language explainer: https://mp.weixin.qq.com/s/jkYR8mYp-e645qk8kfPNKQ
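ContrastiveCrop and HCSC both work within the contrastive-learning framework, whose core training signal is an InfoNCE-style loss over two augmented views of the same images. The sketch below is a simplified, one-directional version with in-batch negatives; batch size, embedding dimension, and temperature are arbitrary, and the papers' actual objectives add their own components (better crop selection, hierarchical prototypes).

import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    # z1, z2: (N, D) embeddings of two augmented views of the same N images
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / temperature      # (N, N) cosine similarities
    labels = torch.arange(z1.size(0))     # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

z1, z2 = torch.randn(256, 128), torch.randn(256, 128)
print(info_nce(z1, z2))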
Data Augmentation
TeachAugment: Data Augmentation Optimization Using Teacher Knowledge
Paper: https://arxiv.org/abs/2202.12513
Code: https://github.com/DensoITLab/TeachAugment
AlignMix: Improving representation by interpolating aligned features
Paper: https://arxiv.org/abs/2103.15375
Code: None
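AlignMix builds on mixup-style interpolation (it aligns features before mixing, which is not reproduced here). For reference, a sketch of the classic input-space mixup baseline; the alpha value and tensor shapes are arbitrary.

import torch

def mixup(images, labels, alpha=0.2):
    # images: (N, C, H, W), labels: (N,) integer class ids
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(images.size(0))
    mixed = lam * images + (1.0 - lam) * images[perm]
    # train with: lam * CE(pred, labels) + (1 - lam) * CE(pred, labels[perm])
    return mixed, labels, labels[perm], lam

x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
mixed, y_a, y_b, lam = mixup(x, y)
print(mixed.shape, float(lam))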
Object Detection
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
Paper: https://arxiv.org/abs/2203.01305
Code: https://github.com/FengLi-ust/DN-DETR
Chinese-language explainer: https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w
Accelerating DETR Convergence via Semantic-Aligned Matching
Paper: https://arxiv.org/abs/2203.06883
Code: https://github.com/ZhangGongjie/SAM-DETR
Localization Distillation for Dense Object Detection
Paper: https://arxiv.org/abs/2102.12252
Code: https://github.com/HikariTJU/LD
Chinese-language explainer: https://mp.weixin.qq.com/s/dxss8RjJH283h6IbPCT9vg
Focal and Global Knowledge Distillation for Detectors
Paper: https://arxiv.org/abs/2111.11837
Code: https://github.com/yzd-v/FGD
Chinese-language explainer: https://mp.weixin.qq.com/s/yDkreTudC8JL2V2ETsADwQ
A Dual Weighting Label Assignment Scheme for Object Detection
Paper: https://arxiv.org/abs/2203.09730
Code: https://github.com/strongwolf/DW
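DN-DETR and SAM-DETR both target the slow convergence caused by DETR's bipartite matching between object queries and ground-truth boxes. The matching step itself is a Hungarian assignment over a cost matrix; the generic sketch below uses a random cost matrix, whereas DETR-style training mixes classification and box-regression costs, and the query/object counts are arbitrary.

import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[i, j]: cost of assigning object query i to ground-truth box j
cost = np.random.rand(100, 7)              # 100 queries, 7 ground-truth objects
query_idx, gt_idx = linear_sum_assignment(cost)
print(list(zip(query_idx, gt_idx)))        # one-to-one matching minimizing total cost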
Visual Tracking
Correlation-Aware Deep Tracking
Paper: https://arxiv.org/abs/2203.01666
Code: None
TCTrack: Temporal Contexts for Aerial Tracking
Paper: https://arxiv.org/abs/2203.01885
Code: https://github.com/vision4robotics/TCTrack
Semantic Segmentation
Weakly Supervised Semantic Segmentation
Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
Paper: https://arxiv.org/abs/2203.00962
Code: https://github.com/zhaozhengChen/ReCAM
Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
Paper: https://arxiv.org/abs/2203.02891
Code: https://github.com/xulianuwa/MCTformer
Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers
Paper: https://arxiv.org/abs/2203.02664
Code: https://github.com/rulixiang/afa
Semi-Supervised Semantic Segmentation
ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation
Paper: https://arxiv.org/abs/2106.05095
Code: https://github.com/LiheYoung/ST-PlusPlus
Chinese-language explainer: https://mp.weixin.qq.com/s/knSnlebdtEnmrkChGM_0CA
Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels
Homepage: https://haochen-wang409.github.io/U2PL/
Paper: https://arxiv.org/abs/2203.03884
Code: https://github.com/Haochen-Wang409/U2PL
Chinese-language explainer: https://mp.weixin.qq.com/s/-08olqE7np8A1XQzt6HAgQ
Unsupervised Semantic Segmentation
GroupViT: Semantic Segmentation Emerges from Text Supervision
Homepage: https://jerryxu.net/GroupViT/
Paper: https://arxiv.org/abs/2202.11094
Demo: https://youtu.be/DtJsWIUTW-Y
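Whatever the supervision regime, the segmentation methods above are compared by mean intersection-over-union. A minimal sketch of mIoU computed from a confusion matrix follows; the 21-class setting mimics PASCAL VOC, and the usual ignore-index handling is omitted.

import numpy as np

def mean_iou(pred, gt, num_classes):
    # pred, gt: integer label maps of identical shape
    conf = np.bincount(num_classes * gt.flatten() + pred.flatten(),
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    return float(np.mean(inter / np.maximum(union, 1)))

pred = np.random.randint(0, 21, (512, 512))
gt = np.random.randint(0, 21, (512, 512))
print(mean_iou(pred, gt, num_classes=21))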
Instance Segmentation
E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation
Paper: https://arxiv.org/abs/2203.04074
Code: https://github.com/zhang-tao-whu/e2ec
Mask Transfiner for High-Quality Instance Segmentation
Paper: https://arxiv.org/abs/2111.13673
Code: https://github.com/SysCV/transfiner
Self-Supervised Instance Segmentation
FreeSOLO: Learning to Segment Objects without Annotations
Paper: https://arxiv.org/abs/2202.12181
Code: None
Video Instance Segmentation
Efficient Video Instance Segmentation via Tracklet Query and Proposal
Homepage: https://jialianwu.com/projects/EfficientVIS.html
Paper: https://arxiv.org/abs/2203.01853
Demo: https://youtu.be/sSPMzgtMKCE
Few-Shot Segmentation
Learning What Not to Segment: A New Perspective on Few-Shot Segmentation
Paper: https://arxiv.org/abs/2203.07615
Code: https://github.com/chunbolang/BAM
Video Understanding
Self-supervised Video Transformer
Homepage: https://kahnchana.github.io/svt/
Paper: https://arxiv.org/abs/2112.01514
Code: https://github.com/kahnchana/svt
Action Recognition
Spatio-temporal Relation Modeling for Few-shot Action Recognition
Paper: https://arxiv.org/abs/2112.05132
Code: https://github.com/Anirudh257/strm
Action Detection
End-to-End Semi-Supervised Learning for Video Action Detection
Paper: https://arxiv.org/abs/2203.04251
Code: None
Image Editing
Style Transformer for Image Inversion and Editing
Paper: https://arxiv.org/abs/2203.07932
Code: https://github.com/sapphire497/style-transformer
Blended Diffusion for Text-driven Editing of Natural Images
Paper: https://arxiv.org/abs/2111.14818
Code: https://github.com/omriav/blended-diffusion
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
Homepage: https://semanticstylegan.github.io/
Paper: https://arxiv.org/abs/2112.02236
Demo: https://semanticstylegan.github.io/videos/demo.mp4
Low-level Vision
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
Paper: https://arxiv.org/abs/2111.15362
Code: None
Restormer: Efficient Transformer for High-Resolution Image Restoration
Paper: https://arxiv.org/abs/2111.09881
Code: https://github.com/swz30/Restormer
Super-Resolution
Image Super-Resolution
Learning the Degradation Distribution for Blind Image Super-Resolution
Paper: https://arxiv.org/abs/2203.04962
Code: https://github.com/greatlog/UnpairedSR
Video Super-Resolution
BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment
Paper: https://arxiv.org/abs/2104.13371
Code: https://github.com/open-mmlab/mmediting
Code: https://github.com/ckkelvinchan/BasicVSR_PlusPlus
Chinese-language explainer: https://mp.weixin.qq.com/s/HZTwYfphixyLHxlbCAxx4g
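Super-resolution results are typically reported as PSNR against the ground-truth high-resolution frame. A minimal sketch of that metric follows; many papers additionally convert to the Y channel and crop borders before measuring, which is omitted here, and the tensors below are random placeholders.

import torch

def psnr(pred, target, max_val=1.0):
    # pred, target: image tensors with values in [0, max_val]
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

sr = torch.rand(1, 3, 256, 256)
hr = torch.rand(1, 3, 256, 256)
print(psnr(sr, hr))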
3D Point Cloud
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
Homepage: https://point-bert.ivg-research.xyz/
Paper: https://arxiv.org/abs/2111.14819
Code: https://github.com/lulutang0608/Point-BERT
A Unified Query-based Paradigm for Point Cloud Understanding
Paper: https://arxiv.org/abs/2203.01252
Code: None
CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
Paper: https://arxiv.org/abs/2203.00680
Code: https://github.com/MohamedAfham/CrossPoint
PointCLIP: Point Cloud Understanding by CLIP
Paper: https://arxiv.org/abs/2112.02413
Code: https://github.com/ZrrSkywalker/PointCLIP
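Point-BERT, CrossPoint, and PointCLIP all consume fixed-size point sets, and a common way to subsample a raw cloud is farthest point sampling. The NumPy sketch below is a generic version of that routine, not code from any of these repositories; the cloud size and sample count are arbitrary.

import numpy as np

def farthest_point_sampling(points, k):
    # points: (N, 3) array; returns indices of k points spread across the cloud
    n = points.shape[0]
    chosen = np.zeros(k, dtype=np.int64)
    dist = np.full(n, np.inf)
    chosen[0] = np.random.randint(n)
    for i in range(1, k):
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[i - 1]], axis=1))
        chosen[i] = int(np.argmax(dist))
    return chosen

cloud = np.random.rand(2048, 3)
idx = farthest_point_sampling(cloud, 256)
print(cloud[idx].shape)   # (256, 3)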
3D Object Detection
Embracing Single Stride 3D Object Detector with Sparse Transformer
Paper: https://arxiv.org/abs/2112.06375
Code: https://github.com/TuSimple/SST
Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes
Paper: https://arxiv.org/abs/2011.12001
Code: https://github.com/qq456cvb/CanonicalVoting
MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
Paper: https://arxiv.org/abs/2203.10981
Code: https://github.com/kuanchihhuang/MonoDTR
3D Semantic Segmentation
Scribble-Supervised LiDAR Semantic Segmentation
Paper: https://arxiv.org/abs/2203.08537
Dataset: https://github.com/ouenal/scribblekitti
3D Object Tracking
Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds
Paper: https://arxiv.org/abs/2203.01730
Code: https://github.com/Ghostish/Open3DSOT
3D Human Pose Estimation
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
Paper: https://arxiv.org/abs/2111.12707
Code: https://github.com/Vegetebird/MHFormer
Chinese-language explainer: https://zhuanlan.zhihu.com/p/439459426
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
Paper: https://arxiv.org/abs/2203.00859
Code: None
3D Semantic Scene Completion
MonoScene: Monocular 3D Semantic Scene Completion
Paper: https://arxiv.org/abs/2112.00726
Code: https://github.com/cv-rits/MonoScene
3D Reconstruction
BANMo: Building Animatable 3D Neural Models from Many Casual Videos
Homepage: https://banmo-www.github.io/
Paper: https://arxiv.org/abs/2112.12761
Code: https://github.com/facebookresearch/banmo
Chinese-language explainer: https://mp.weixin.qq.com/s/NMHP8-xWwrX40vpGx55Qew
Camouflaged Object Detection
Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection
Paper: https://arxiv.org/abs/2203.02688
Code: https://github.com/lartpang/ZoomNet
Depth Estimation
Monocular Depth Estimation
NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation
Paper: https://arxiv.org/abs/2203.01502
Code: None
OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion
Paper: https://arxiv.org/abs/2203.00838
Code: None
Toward Practical Self-Supervised Monocular Indoor Depth Estimation
Paper: https://arxiv.org/abs/2112.02306
Code: None
Stereo Matching
ACVNet: Attention Concatenation Volume for Accurate and Efficient Stereo Matching
Paper: https://arxiv.org/abs/2203.02146
Code: https://github.com/gangweiX/ACVNet
Lane Detection
Rethinking Efficient Lane Detection via Curve Modeling
Paper: https://arxiv.org/abs/2203.02431
Code: https://github.com/voldemortX/pytorch-auto-drive
Demo: https://user-images.githubusercontent.com/32259501/148680744-a18793cd-f437-461f-8c3a-b909c9931709.mp4
Image Inpainting
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
Paper: https://arxiv.org/abs/2203.00867
Code: https://github.com/DQiaole/ZITS_inpainting
Crowd Counting
Leveraging Self-Supervision for Cross-Domain Crowd Counting
Paper: https://arxiv.org/abs/2103.16291
Code: None
Medical Image
BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation
Paper: https://arxiv.org/abs/2203.02533
Code: None
Scene Graph Generation
SGTR: End-to-end Scene Graph Generation with Transformer
Paper: https://arxiv.org/abs/2112.12970
Code: None
Style Transfer
StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions
Homepage: https://lukashoel.github.io/stylemesh/
Paper: https://arxiv.org/abs/2112.01530
Code: https://github.com/lukasHoel/stylemesh
Demo: https://www.youtube.com/watch?v=ZqgiTLcNcks
Weakly Supervised Object Localization
Weakly Supervised Object Localization as Domain Adaption
Paper: https://arxiv.org/abs/2203.01714
Code: https://github.com/zh460045050/DA-WSOL_CVPR2022
Hyperspectral Image Reconstruction
Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction
Paper: https://arxiv.org/abs/2111.07910
Code: https://github.com/caiyuanhao1998/MST
Watermarking
Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings
Paper: https://arxiv.org/abs/2104.13450
Code: None
Datasets
It's About Time: Analog Clock Reading in the Wild
Homepage: https://charigyang.github.io/abouttime/
Paper: https://arxiv.org/abs/2111.09162
Code: https://github.com/charigyang/itsabouttime
Demo: https://youtu.be/cbiMACA6dRc
Toward Practical Self-Supervised Monocular Indoor Depth Estimation
Paper: https://arxiv.org/abs/2112.02306
Code: None
Kubric: A scalable dataset generator
Paper: https://arxiv.org/abs/2203.03570
Code: https://github.com/google-research/kubric
Scribble-Supervised LiDAR Semantic Segmentation
Paper: https://arxiv.org/abs/2203.08537
Dataset: https://github.com/ouenal/scribblekitti
New Tasks
Language-based Video Editing via Multi-Modal Multi-Level Transformer
Paper: https://arxiv.org/abs/2104.01122
Code: None
It's About Time: Analog Clock Reading in the Wild
Homepage: https://charigyang.github.io/abouttime/
Paper: https://arxiv.org/abs/2111.09162
Code: https://github.com/charigyang/itsabouttime
Demo: https://youtu.be/cbiMACA6dRc
Splicing ViT Features for Semantic Appearance Transfer
Homepage: https://splice-vit.github.io/
Paper: https://arxiv.org/abs/2201.00424
Code: https://github.com/omerbt/Splice
Others
Kubric: A scalable dataset generator
Paper: https://arxiv.org/abs/2203.03570
Code: https://github.com/google-research/kubric