Prerequisites:
1. Set up CUDA 10.2
CUDA download: https://developer.nvidia.com/cuda-toolkit-archive
cuDNN download: https://developer.nvidia.com/rdp/cudnn-archive
For installation steps, refer to this article
2. Install Visual Studio 2019
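Environment setup
1. Download
Download page: https://arrayfire.com/binaries/
2. Install:
Installation is straightforward: run the .exe and follow the installer's prompts step by step. When the installer asks whether to add ArrayFire to the user environment variables, be sure to accept. Otherwise Visual Studio may fail to find the libraries when you build or run your program.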
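To confirm that the installer really registered the environment variable, a small program can read it back. The sketch below assumes the variable is called AF_PATH, which is the name I recall the ArrayFire Windows installer using; check the actual name on your machine. The helper names env_or_empty and af_subdir are made up for this illustration.

```cpp
#include <cstdlib>
#include <string>

// Return the value of an environment variable, or "" if it is not set.
std::string env_or_empty(const char* name) {
    const char* value = std::getenv(name);
    return value ? std::string(value) : std::string();
}

// Join an install root and a subdirectory using a Windows path separator.
// Returns "" when the root is empty (i.e. the variable was not set).
std::string af_subdir(const std::string& root, const std::string& sub) {
    if (root.empty()) return std::string();
    const char last = root.back();
    const bool has_sep = (last == '\\' || last == '/');
    return has_sep ? root + sub : root + "\\" + sub;
}
```

Calling af_subdir(env_or_empty("AF_PATH"), "include") from main and printing the result gives the directory to use for the include path. An empty string means the variable is not visible to the process.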
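3. Visual Studio configuration
I used VS2019. Open the project in VS2019 and go to Properties -> VC++ Directories. Under Include Directories, enter the path to the installed library headers; then, under Library Directories, enter the path to the installed .lib files:
Include Directories: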
D:\ArrayFire\v3\include
Library Directories:
D:\ArrayFire\v3\lib
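Next, under Linker -> Input -> Additional Dependencies, enter the names of the libraries to link against. The list below is what I used; af.lib is the unified backend, and as far as I know linking a single backend library (for example only afcuda.lib) is also sufficient: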
af.lib
afcpu.lib
afcuda.lib
afopencl.lib
libiomp5md.lib
mkl_core_dll.lib
mkl_intel_thread_dll.lib
mkl_rt.lib
Copy the DLLs from the lib directory into the project directory.
Test code:
#include <arrayfire.h>
#include <math.h>
#include <stdio.h>
#include <cstdlib>
#include <string>

using namespace af;

// create a small wrapper to benchmark
static array A;  // populated before each timing
static void fn() {
    array B = matmul(A, A);  // matrix multiply
}

int main(int argc, char** argv) {
    double peak = 0;
    try {
        int device = argc > 1 ? atoi(argv[1]) : 0;
        setDevice(device);
        const std::string dtype(argc > 2 ? argv[2] : "f32");
        const af_dtype dt = (dtype == "f16" ? f16 : f32);
        if (dt == f16)
            printf("Device %d isHalfAvailable ? %s\n", device,
                   isHalfAvailable(device) ? "yes" : "no");
        info();
        printf("Benchmark N-by-N matrix multiply at %s \n", dtype.c_str());
        for (int n = 128; n <= 2048; n += 128) {
            printf("%4d x %4d: ", n, n);
            A = constant(1, n, n, dt);
            double time = timeit(fn);  // time in seconds
            double gflops = 2.0 * powf(n, 3) / (time * 1e9);
            if (gflops > peak) peak = gflops;
            printf(" %4.0f Gflops\n", gflops);
            fflush(stdout);
        }
    }
    catch (af::exception& e) {
        fprintf(stderr, "%s\n", e.what());
        throw;
    }
    printf(" ### peak %g GFLOPS\n", peak);
    return 0;
}
Output:
ArrayFire v3.8.1 (CUDA, 64-bit Windows, build 823e8e39)
Platform: CUDA Runtime 10.2, Driver: 10020
[0] GeForce GTX 1650, 4096 MB, CUDA Compute 7.5
Benchmark N-by-N matrix multiply at f32
128 x 128: 21 Gflops
256 x 256: 70 Gflops
384 x 384: 171 Gflops
512 x 512: 206 Gflops
640 x 640: 206 Gflops
768 x 768: 257 Gflops
896 x 896: 265 Gflops
1024 x 1024: 283 Gflops
1152 x 1152: 268 Gflops
1280 x 1280: 277 Gflops
1408 x 1408: 266 Gflops
1536 x 1536: 269 Gflops
1664 x 1664: 261 Gflops
1792 x 1792: 259 Gflops
1920 x 1920: 277 Gflops
2048 x 2048: 287 Gflops
### peak 286.816 GFLOPS
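The Gflops column follows from the formula in the test code: an N-by-N matrix multiply is counted as 2·N³ floating-point operations, divided by the measured time in seconds. Below is a minimal standalone sketch of that arithmetic with no ArrayFire dependency. The helper names are hypothetical, and it uses double arithmetic instead of powf.

```cpp
#include <cstdint>

// Floating-point operations counted for an n-by-n matrix multiply:
// n*n output elements, each needing about n multiplies and n adds.
double matmul_flops(std::int64_t n) {
    const double d = static_cast<double>(n);
    return 2.0 * d * d * d;
}

// Throughput in GFLOPS given the matrix size and elapsed time in seconds.
double matmul_gflops(std::int64_t n, double seconds) {
    return matmul_flops(n) / (seconds * 1e9);
}

// Inverse: elapsed seconds implied by a reported GFLOPS figure.
double seconds_from_gflops(std::int64_t n, double gflops) {
    return matmul_flops(n) / (gflops * 1e9);
}
```

For the last row of the table, seconds_from_gflops(2048, 287) works out to roughly 0.06 s per multiply.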
References:
https://arrayfire.org/docs/using_on_windows.htm
https://www.cnblogs.com/xuelanga000/p/13286896.html
https://arrayfire.org/docs/benchmarks_2blas_8cpp-example.htm