I. Preparation
1 Add the Aliyun package mirrors
curl -o /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
sed -i -e '/mirrors.cloud.aliyuncs.com/d' -e '/mirrors.aliyuncs.com/d' /etc/yum.repos.d/CentOS-Base.repo
2 Install the base packages
yum -y install apr autoconf automake bash bash-completion bind-utils bzip2 bzip2-devel chrony cmake coreutils curl curl-devel dbus dbus-libs dhcp-common dos2unix e2fsprogs e2fsprogs-devel file file-libs freetype freetype-devel gcc gcc-c++ gdb glib2 glib2-devel glibc glibc-devel gmp gmp-devel gnupg iotop kernel kernel-devel kernel-doc kernel-firmware kernel-headers krb5-devel libaio-devel libcurl libcurl-devel libevent libevent-devel libffi-devel libidn libidn-devel libjpeg libjpeg-devel libmcrypt libmcrypt-devel libpng libpng-devel libxml2 libxml2-devel libxslt libxslt-devel libzip libzip-devel lrzsz lsof make microcode_ctl mysql mysql-devel ncurses ncurses-devel net-snmp net-snmp-libs net-snmp-utils net-tools nfs-utils nss nss-sysinit nss-tools openldap-clients openldap-devel openssh openssh-clients openssh-server openssl openssl-devel patch policycoreutils polkit procps readline-devel rpm rpm-build rpm-libs rsync sos sshpass strace sysstat tar tmux tree unzip uuid uuid-devel vim wget yum-utils zip zlib* jq
3 Enable time synchronization
systemctl start chronyd && systemctl enable chronyd
4 Reboot
reboot
5 Update all packages
yum update -y
6 Reboot again
reboot
II. Install the GPU driver
1 Disable the nouveau driver that ships with the system
# Write the blacklist configuration
echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/blacklist.conf
# Back up the existing initramfs image
cp /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
# Rebuild the initramfs image
sudo dracut --force
# Reboot
reboot
# Check whether nouveau is loaded; empty output means it was disabled successfully
lsmod | grep nouveau
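The manual check above can also be scripted. The following is a minimal sketch, not part of the original procedure: the function name nouveau_disabled is hypothetical, and it assumes lsmod-style text (module name in the first column) arrives on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: inspect lsmod-style text on stdin.
# Returns 0 when nouveau is absent, 1 when it is still loaded.
nouveau_disabled() {
    # lsmod prints the module name in the first column, so match it exactly.
    if awk '$1 == "nouveau" { found = 1 } END { exit !found }'; then
        echo "nouveau is still loaded"
        return 1
    fi
    echo "nouveau is disabled"
    return 0
}

# Demo with canned input so the sketch runs anywhere.
printf 'Module Size Used by\nnouveau 1869689 0\n' | nouveau_disabled || true
```

On a real host the call would be `lsmod | nouveau_disabled`, and the return code can gate the driver installation in a script.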
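2 Install DKMS
DKMS stands for Dynamic Kernel Module Support. It maintains drivers that live outside the kernel tree and automatically rebuilds their modules when the kernel version changes.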
yum -y install dkms
3 Copy the driver installer
If you have not downloaded it in advance, get it from the official NVIDIA driver download page.
cp NVIDIA-Linux-x86_64-418.226.00.run /data/
4 Run the installer
sudo sh NVIDIA-Linux-x86_64-418.226.00.run -no-x-check -no-nouveau-check -no-opengl-files
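# -no-x-check        skip the check for a running X server
# -no-nouveau-check  skip the check for a loaded nouveau driver
# -no-opengl-files   install only the driver files, not the OpenGL files
5 Follow the installer prompts, answering yes / OK at each step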
6 Verify the installation
nvidia-smi
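7 Output like the following indicates a successful installation (this sample shows driver 410.129; your output should report the driver version you installed)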
Wed Jul 7 11:11:33 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.129 Driver Version: 410.129 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:41:00.0 Off | 0 |
| N/A 94C P0 36W / 70W | 0MiB / 15079MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
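To confirm that the running driver is the one that was just installed, the version in the nvidia-smi header can be compared with the version embedded in the installer file name. This is a rough sketch under assumptions: the helper names smi_driver_version and installer_version are hypothetical, and the header line is assumed to contain "Driver Version: X.Y" as in the sample above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the driver version from nvidia-smi text on stdin.
smi_driver_version() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: derive the version from an installer path such as
# /data/NVIDIA-Linux-x86_64-418.226.00.run
installer_version() {
    local name=${1##*/}               # drop the directory part
    name=${name#NVIDIA-Linux-x86_64-} # drop the fixed prefix
    echo "${name%.run}"               # drop the .run suffix
}

# Demo using the header line from the sample output above.
expected=$(installer_version /data/NVIDIA-Linux-x86_64-418.226.00.run)
reported=$(echo '| NVIDIA-SMI 410.129 Driver Version: 410.129 CUDA Version: 10.0 |' | smi_driver_version)
if [ "$expected" = "$reported" ]; then
    echo "driver matches: $reported"
else
    echo "driver mismatch: installer is $expected, nvidia-smi reports $reported"
fi
```

On a real host, pipe the live output instead: `nvidia-smi | smi_driver_version`. Alternatively, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` prints the version directly.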
8 Verify that the GPU is detected
lspci | grep -i nvidia
41:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
8.1 If lspci reports "command not found", install it with the following command
yum install -y pciutils
III. Download the GCC source, then build and install the upgrade
1 Build and install
cd /data/
wget https://mirrors.tuna.tsinghua.edu.cn/gnu/gcc/gcc-8.5.0/gcc-8.5.0.tar.gz
tar -xvf gcc-8.5.0.tar.gz
cd gcc-8.5.0
./contrib/download_prerequisites
mkdir build
cd build
../configure --enable-checking=release --enable-languages=c,c++ --disable-multilib
make -j 16
make install
2 Point the libstdc++ symlink at the new library
cp /usr/local/lib64/libstdc++.so.6.0.25 /lib64
cd /lib64
rm -rf libstdc++.so.6
ln -s libstdc++.so.6.0.25 libstdc++.so.6
3 Check the version
gcc -v
4 Output like the following indicates a successful installation
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/8.5.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-checking=release --enable-languages=c,c++ --disable-multilib
Thread model: posix
gcc version 8.5.0 (GCC)
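A script that depends on the upgraded compiler can check the version instead of relying on a visual inspection. The sketch below is illustrative only: gcc_version and version_ge are hypothetical helper names, and the comparison assumes GNU sort with the -V option (present in coreutils on CentOS 7).

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the version out of "gcc -v" style text on stdin.
gcc_version() {
    sed -n 's/^gcc version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed when version $1 is greater than or equal to $2.
# Relies on GNU sort -V for version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Demo using the last line of the sample output above.
found=$(echo 'gcc version 8.5.0 (GCC)' | gcc_version)
if version_ge "$found" 8.5.0; then
    echo "gcc $found is new enough"
else
    echo "gcc $found is older than 8.5.0"
fi
```

Because gcc -v writes to stderr, the live form is `gcc -v 2>&1 | gcc_version`.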
IV. Install NVIDIA CUDA
1 Confirm that nouveau is disabled
If the following command prints nothing, nouveau is already disabled
[root@localhost opt]# lsmod | grep nouveau
2 Set the default boot target to multi-user (no graphical session)
systemctl set-default multi-user.target
3 Download the CUDA installer
You can also download it offline from the official CUDA download page and copy it to the server
wget https://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.243_418.87.00_linux.run
4 Run the installer
sudo sh cuda_10.1.243_418.87.00_linux.run
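5 The installer UI appears: type accept on the first screen, then choose Install on the second. Because the GPU driver was already installed earlier, consider deselecting the bundled Driver component before choosing Install.
6 Add CUDA to the environment variables
6.0 Open the profile file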
vim /etc/profile
6.1 Add the following four lines at the top of the file
Press i, paste the four lines below, press Esc, then type :wq to save and exit
PATH=$PATH:/usr/local/cuda-10.1/bin/
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.1/lib64/
export PATH
export LD_LIBRARY_PATH
6.2 Reload the file so the changes take effect
source /etc/profile
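The four lines above append unconditionally, so sourcing /etc/profile more than once repeats the entries, and an empty LD_LIBRARY_PATH ends up with a leading colon. A possible alternative, shown here only as a sketch, appends each directory at most once. The function name append_once is hypothetical, and the paths are the ones used in this guide, written without the trailing slash.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the colon-separated list $1 with directory $2
# appended, unless $2 is already an element of the list.
append_once() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;          # already present, unchanged
        *)        printf '%s\n' "${1:+$1:}$2" ;; # add a colon only if $1 is non-empty
    esac
}

PATH=$(append_once "$PATH" /usr/local/cuda-10.1/bin)
LD_LIBRARY_PATH=$(append_once "${LD_LIBRARY_PATH:-}" /usr/local/cuda-10.1/lib64)
export PATH LD_LIBRARY_PATH
```

Placed in /etc/profile instead of the four plain assignments, this keeps both variables stable no matter how often the file is sourced.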
7 Verify the installation
The following command should print the installed CUDA compiler version
nvcc -V
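V. Install NVIDIA cuDNN
1 Download cuDNN
Download the cuDNN build that matches your CUDA version from the official cuDNN download page (an NVIDIA account is required).
2 Upload and extract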
cd /data/
tar xzvf cudnn-10.1-linux-x64-v7.6.5.32.tgz
cp cuda/include/cudnn.h /usr/local/cuda/include
cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
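One way to confirm that the header was copied correctly is to read the version macros from cudnn.h. This sketch assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, as the cuDNN 7.x headers do; the function name cudnn_version is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print MAJOR.MINOR.PATCH from the cudnn.h file given as $1.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    ' "$1"
}

# Only query the header if it exists, so the sketch is safe to run anywhere.
if [ -r /usr/local/cuda/include/cudnn.h ]; then
    cudnn_version /usr/local/cuda/include/cudnn.h
fi
```

For the archive used in this guide, the expected result is 7.6.5.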
VI. Install Docker
1 Remove old versions
See the official Docker installation guide for reference
sudo yum remove docker \
docker-client \
docker-client-latest \
docker-common \
docker-latest \
docker-latest-logrotate \
docker-logrotate \
docker-engine
2 Add the Docker repository
sudo yum install -y yum-utils
sudo yum-config-manager \
--add-repo \
https://download.docker.com/linux/centos/docker-ce.repo
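3 Configure the repositories (optional)
The nightly and test repositories are disabled by default and are not needed for a stable install. The commands below enable them; to disable them again, replace --enable with --disable.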
sudo yum-config-manager --enable docker-ce-nightly
sudo yum-config-manager --enable docker-ce-test
4 Install the latest Docker Engine
sudo yum install docker-ce docker-ce-cli containerd.io
5 Start Docker
sudo systemctl start docker
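6 Verify that Docker was installed successfully
If the command below prints a "Hello from Docker!" message, the installation succeeded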
sudo docker run hello-world
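VII. Install nvidia-docker
See the official installation guide for reference
Stock Docker did not originally support GPU acceleration, so NVIDIA built a separate Docker add-on that gives containers access to the GPU
1 Install the dependencies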
sudo dnf install -y tar bzip2 make automake gcc gcc-c++ vim pciutils elfutils-libelf-devel libglvnd-devel iptables
1.1 The command may fail with the following error
sudo: dnf: command not found
In that case, run the following command and then repeat the dependency installation above
yum install dnf
2 Install Docker CE
sudo yum-config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
sudo yum repolist -v
sudo yum install -y https://download.docker.com/linux/centos/7/x86_64/stable/Packages/containerd.io-1.4.3-3.1.el7.x86_64.rpm
sudo yum install docker-ce -y
sudo systemctl --now enable docker
sudo docker run --rm hello-world
3 Add the nvidia-docker repository, install nvidia-docker2, and run a GPU test container
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
sudo yum clean expire-cache
sudo yum install -y nvidia-docker2
sudo systemctl restart docker
sudo docker run --rm --gpus all nvidia/cuda:10.1-base nvidia-smi
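If the test container fails, it can help to check whether Docker actually registered the NVIDIA runtime. The sketch below assumes that `docker info` prints a "Runtimes:" line listing the runtime names and that nvidia-docker2 registers a runtime called nvidia; the function name has_nvidia_runtime is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "docker info" style text on stdin and succeed
# when the Runtimes line lists a runtime named nvidia.
has_nvidia_runtime() {
    awk '
        /^[[:space:]]*Runtimes:/ {
            for (i = 2; i <= NF; i++) if ($i == "nvidia") found = 1
        }
        END { exit !found }
    '
}

# Demo with canned input so the sketch runs without Docker installed.
if printf ' Runtimes: nvidia runc\n' | has_nvidia_runtime; then
    echo "nvidia runtime is registered"
else
    echo "nvidia runtime is missing"
fi
```

On a real host the call would be `sudo docker info 2>/dev/null | has_nvidia_runtime`.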