Demo code:
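from mpi4py import MPI
import cupy as cp
comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
# Allocate the send and receive buffers on the GPU.
sendbuf = cp.arange(10, dtype='i')
recvbuf = cp.empty_like(sendbuf)
# mpi4py accesses GPU buffers through the CUDA Array Interface.
assert hasattr(sendbuf, '__cuda_array_interface__')
assert hasattr(recvbuf, '__cuda_array_interface__')
# Make sure the GPU work that fills sendbuf has finished before MPI reads it.
cp.cuda.get_current_stream().synchronize()
# Sum across all ranks (the default op is MPI.SUM); every rank receives the result.
comm.Allreduce(sendbuf, recvbuf)
# Every rank sent the same data, so the sum equals sendbuf * size.
assert cp.allclose(recvbuf, sendbuf*size)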
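Why the final assertion in the demo holds can be sketched without MPI or a GPU. The snippet below is only an illustrative simulation in plain Python: simulate_allreduce_sum is a hypothetical helper (not part of mpi4py), and the rank count of 4 is an assumption. It mimics a sum Allreduce, where every rank contributes its buffer and every rank receives the element-wise sum.

```python
# Pure-Python simulation of a sum Allreduce; no MPI or GPU is required.
# `simulate_allreduce_sum` is a hypothetical helper, not an mpi4py function.

def simulate_allreduce_sum(send_buffers):
    """Return the list of buffers each rank would receive from a sum Allreduce."""
    # Element-wise sum over the buffers contributed by all ranks.
    reduced = [sum(values) for values in zip(*send_buffers)]
    # Every rank receives its own copy of the same reduced buffer.
    return [list(reduced) for _ in send_buffers]


size = 4                    # assumed number of ranks
sendbuf = list(range(10))   # same contents as cp.arange(10) on every rank
recv_buffers = simulate_allreduce_sum([sendbuf] * size)

# Each rank sent identical data, so each element is multiplied by `size`.
expected = [value * size for value in sendbuf]
assert all(recvbuf == expected for recvbuf in recv_buffers)
```

Because every rank sends the same array, the reduced result is simply that array scaled by the number of ranks, which is what the demo checks with cp.allclose.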
Using an Anaconda environment:
Install mpi4py:
conda install -c conda-forge mpi4py openmpi
Message printed after installation:
For Linux 64, Open MPI is built with CUDA awareness but this support is disabled by default.
To enable it, please set the environmental variable OMPI_MCA_opal_cuda_support=true before
launching your MPI processes. Equivalently, you can set the MCA parameter in the command line:
mpiexec --mca opal_cuda_support 1 ...
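This message means that the Open MPI build installed from conda-forge together with mpi4py is CUDA-aware, but that support is disabled by default. To enable it at run time, either set the environment variable OMPI_MCA_opal_cuda_support=true before launching the MPI processes, or pass the equivalent MCA parameter on the command line: mpiexec --mca opal_cuda_support 1. The two forms are equivalent, so only one of them is needed.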
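The environment-variable form of this setting can also be applied from a small Python launcher instead of the shell. The following is a minimal sketch, not an official recipe: build_cuda_aware_launch is a hypothetical helper name, x.py and the process count of 4 are taken from this note's demo, and the standard -n option is used here to give the total number of processes.

```python
import os


def build_cuda_aware_launch(script, nprocs):
    """Return (command, environment) for launching `script` under mpiexec.

    Hypothetical helper: it only prepares the launch, it does not run it.
    """
    env = dict(os.environ)
    # Environment-variable form of the setting named in the conda message.
    env["OMPI_MCA_opal_cuda_support"] = "true"
    command = ["mpiexec", "-n", str(nprocs), "python", script]
    return command, env


command, env = build_cuda_aware_launch("x.py", 4)
print(" ".join(command))

# To actually launch (requires Open MPI's mpiexec on PATH):
# import subprocess
# subprocess.run(command, env=env, check=True)
```

Copying os.environ rather than modifying it in place keeps the setting local to the launched processes and leaves the parent Python process unchanged.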
Install cupy:
conda install -c conda-forge cupy cudnn cutensor nccl
Run the demo code:
mpiexec --mca opal_cuda_support 1 -N 4 python x.py
==================================================