The CUDA __CUDA_ARCH__ Macro

__CUDA_ARCH__ is a macro defined by NVCC.

Section 5.7.4, Virtual Architecture Macros, describes it:

The architecture identification macro __CUDA_ARCH__ is assigned a three-digit value string xy0 (ending in a literal 0) during each nvcc compilation stage 1 that compiles for compute_xy.
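As a small illustration of the xy0 format, the sketch below splits such a value back into its major and minor compute capability numbers. The helper names arch_major and arch_minor are hypothetical and not part of CUDA; the code is plain C++ that also compiles inside a .cu file.

```cpp
// Split a __CUDA_ARCH__-style value (xy0) into compute capability parts.
// arch_major / arch_minor are hypothetical helper names, not a CUDA API.
constexpr int arch_major(int arch) { return arch / 100; }        // 530 -> 5
constexpr int arch_minor(int arch) { return (arch / 10) % 10; }  // 530 -> 3
```

For example, a pass compiling for compute_53 sees the value 530, which these helpers split into 5 and 3.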

This macro can be used in the implementation of GPU functions for determining the virtual architecture for which it is currently being compiled. The host code (the non-GPU code) must not depend on it.
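A common way to respect this rule is to test whether the macro is defined inside the body of a function that is compiled for both host and device. The following is a minimal sketch, not taken from the NVCC documentation: HOST_DEVICE and compiled_arch are hypothetical names, and the fallback definition of HOST_DEVICE is only there so the file also builds with a plain C++ compiler.

```cpp
// Under nvcc, mark the function for both host and device compilation.
// Under a plain C++ compiler, the qualifier expands to nothing.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// compiled_arch is a hypothetical helper name.
HOST_DEVICE inline int compiled_arch() {
#ifdef __CUDA_ARCH__
    return __CUDA_ARCH__;  // device pass, e.g. 800 when compiling for compute_80
#else
    return 0;              // host pass: the macro is not defined here
#endif
}
```

Only the function body differs between the passes; the signature stays the same, so host code calling compiled_arch() does not depend on the macro's value.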

The architecture list macro __CUDA_ARCH_LIST__ is a list of comma-separated __CUDA_ARCH__ values for each of the virtual architectures specified in the compiler invocation. The list is sorted in numerically ascending order.

The macro __CUDA_ARCH_LIST__ is defined when compiling C, C++ and CUDA source files.
The macro is defined only during compilation, so its value is not visible in a code editor. Do not try to define it yourself.

For example, the following nvcc compilation command line will define __CUDA_ARCH_LIST__ as 500,530,800:

nvcc x.cu \
--generate-code arch=compute_80,code=sm_80 \
--generate-code arch=compute_50,code=sm_52 \
--generate-code arch=compute_50,code=sm_50 \
--generate-code arch=compute_53,code=sm_53
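Because the list expands to comma-separated integers, one possible use is as an array initializer. The sketch below assumes that use. TARGET_ARCHS, kTargetArchs and the two helper functions are hypothetical names, and the placeholder value 0 is used only when the file is not compiled by nvcc, so that __CUDA_ARCH_LIST__ itself is never defined by hand.

```cpp
#include <cstddef>

// Forward the nvcc-provided list when available; otherwise use a
// placeholder so the file still builds with a plain C++ compiler.
#ifdef __CUDA_ARCH_LIST__
#define TARGET_ARCHS __CUDA_ARCH_LIST__
#else
#define TARGET_ARCHS 0
#endif

// The list is sorted in ascending order, so the first entry is the
// lowest virtual architecture and the last entry is the highest.
static const int kTargetArchs[] = { TARGET_ARCHS };
static const std::size_t kNumTargetArchs =
    sizeof(kTargetArchs) / sizeof(kTargetArchs[0]);

inline int lowest_target_arch()  { return kTargetArchs[0]; }
inline int highest_target_arch() { return kTargetArchs[kNumTargetArchs - 1]; }
```

With the command line above, the array would hold 500, 530 and 800.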

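The architecture is set with the nvcc -arch option.
In Visual Studio, if several Code Generation entries are configured (that is, several -gencode=arch=compute_xx,code=sm_xx options), the highest architecture version is the one that takes effect. Note that each target is compiled separately with its own __CUDA_ARCH__ value, so the value seen at run time comes from the code that is loaded for the GPU in use.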
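When several targets are given, nvcc compiles the device code once per virtual architecture, and each pass sees its own __CUDA_ARCH__ value. The sketch below shows how a function could pick a different constant in each pass. The name block_tile_size and the numbers it returns are purely illustrative assumptions, not tuned values, and HOST_DEVICE is a hypothetical macro that lets the file build without nvcc too.

```cpp
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper: choose a constant per compilation pass.
HOST_DEVICE inline int block_tile_size() {
#if !defined(__CUDA_ARCH__)
    return 1;    // host pass: macro not defined
#elif __CUDA_ARCH__ >= 800
    return 64;   // passes for compute_80 and newer
#elif __CUDA_ARCH__ >= 530
    return 32;   // passes for compute_53 up to, but not including, compute_80
#else
    return 16;   // older virtual architectures
#endif
}
```

Since the host and each device pass can get different results, a value like this is best used only inside device code rather than shared between host and device.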

For more technical details, see the NVIDIA CUDA Compiler Driver NVCC documentation.

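For how to use the macro, see section 14.5.2.1, __CUDA_ARCH__, of the CUDA 12.0 documentation. The section number differs in older versions: in the CUDA 8.0 documentation it is E.3.2.1, __CUDA_ARCH__.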

To print the value of __CUDA_ARCH__ and see it for yourself, you can do this:

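#include <stdio.h>

// Each thread prints the architecture value this kernel was compiled for.
__global__ void Mykernel()
{
    printf("%d\n", __CUDA_ARCH__);
}

int main()
{
    // Launch 1 block of 5 threads, so the value is printed 5 times.
    Mykernel<<<1, 5>>>();
    // Wait for the kernel to finish so the device-side printf output is flushed.
    cudaDeviceSynchronize();
    return 0;
}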
