Environment
- Host: VMware ESXi, 6.7.0, 13006603
- Server model: SYS-4028GR-TR
- Virtual machine: CentOS 7 (64-bit), 8 cores, 16 GB RAM, 40 GB disk, 1 NVIDIA TITAN X
Prepare the virtual machine
Add the GPU to the virtual machine
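- Log in to the vCenter web UI and open the virtual machine's page
- With the virtual machine powered off: Actions > Edit Settings
- On the Virtual Hardware tab:
  - Add Device > PCI Device > select the GPU you need
  - For the video card, choose auto-detect settings
  - Under CPU, clear the hardware virtualization checkbox
- Start the virtual machine, open the web console, log in as cooker, and check whether the listed PCI devices include the GPU; if they do, the GPU was added successfully: lspci | grep NVI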
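The lspci check in the last step can be wrapped in a small helper that gives a clear pass or fail result. This is a minimal sketch: `has_nvidia_gpu` is a hypothetical name, and it assumes the card's description contains the vendor string "NVIDIA", as the grep pattern above implies.

```shell
# has_nvidia_gpu: read lspci-style text on stdin and succeed
# if any line mentions NVIDIA (case-insensitive).
has_nvidia_gpu() {
    grep -qi 'nvidia'
}

# Typical use inside the guest (lspci comes from the pciutils package):
#   lspci | has_nvidia_gpu && echo "GPU visible" || echo "no NVIDIA device found"
```

Reading from stdin keeps the helper usable on saved lspci output as well as on a live system.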
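Configure the IP address and hostname
- Open the virtual machine through the web console
- Change the IP address and hostname in the settings UI: nmtui
- Restart networking as root: service network restart
- Run ifconfig to confirm the IP address changed, or check in vCenter
- Reboot, then check the hostname: reboot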
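Install the GPU driver in the virtual machine
- Download the driver: wget https://cn.download.nvidia.cn/XFree86/Linux-x86_64/525.60.11/NVIDIA-Linux-x86_64-525.60.11.run
- Make it executable: chmod +x NVIDIA-Linux-x86_64-525.60.11.run
- Before running the installer:
  - Disable nouveau, the default GPU driver (an open-source driver with relatively poor performance; the official NVIDIA driver replaces it):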
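vi /usr/lib/modprobe.d/blacklist-nouveau.conf
# 1. Append the following lines (comments go on their own lines, because modprobe.d only ignores lines that start with #)
# Keep the kernel from loading the nouveau module
blacklist nouveau
# Turn off nouveau kernel mode setting in case the module is loaded anyway
options nouveau modeset=0
# 2. Regenerate the initial RAM filesystem (initramfs)
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
dracut /boot/initramfs-$(uname -r).img $(uname -r)
reboot
# 3. Check that nouveau is disabled
lsmod | grep nouveau # no output means nouveau is disabled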
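A plain grep over lsmod output also matches lines where nouveau only appears in the "Used by" column. The sketch below checks the module-name column instead. `nouveau_loaded` is a hypothetical helper name, and it assumes the usual lsmod layout with the module name in the first column.

```shell
# nouveau_loaded: read lsmod-style text on stdin and succeed only
# if a module named exactly "nouveau" appears in the first column.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Typical use after the reboot:
#   if lsmod | nouveau_loaded; then echo "nouveau is still loaded"; else echo "nouveau is disabled"; fi
```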
- Run the installer with --kernel-source-path set to the kernel source directory: ./NVIDIA-Linux-x86_64-525.60.11.run --kernel-source-path=/usr/src/kernels/$(uname -r)
- Error 1: Unable to find the kernel source tree for the currently running kernel. Please make sure you have
installed the kernel source files for your kernel and that they are properly configured; on Red Hat
Linux systems, for example, be sure you have the 'kernel-source' or 'kernel-devel' RPM installed.
If you know the correct kernel source files are installed, you may specify the kernel source path
with the '--kernel-source-path' command line option.
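- Check whether kernel-devel is installed; if it is, check whether its version differs from the running kernel, and install the kernel-devel that matches the running kernel
- Kernel upgrade tutorial: https://blog.csdn.net/evane1890/article/details/108777303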
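yum list kernel -q # show installed and available kernel packages
yum update -y kernel # upgrade to the newest kernel
reboot
rpm -q kernel # list the kernels that are now installed
# Remove the old kernels (these versions are from this machine; use the ones that rpm -q kernel reports)
yum -y remove kernel-3.10.0-1062.el7.x86_64 kernel-3.10.0-1062.4.1.el7.x86_64 kernel-3.10.0-1127.el7.x86_64 kernel-3.10.0-1127.18.2.el7.x86_64
yum clean all
yum update
reboot
yum install kernel-devel kernel-headers -y # build files and headers for the updated kernel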
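Before rerunning the installer, it may help to confirm that an installed kernel-devel package matches the running kernel exactly. This is a minimal sketch: `devel_matches_kernel` is a hypothetical helper, and it assumes `rpm -q kernel-devel` prints one package per line in the form kernel-devel-<release>, where <release> is what `uname -r` reports.

```shell
# devel_matches_kernel RELEASE: read `rpm -q kernel-devel` output on stdin
# and succeed if one line is exactly kernel-devel-RELEASE.
devel_matches_kernel() {
    grep -qxF -- "kernel-devel-$1"
}

# Typical use:
#   rpm -q kernel-devel | devel_matches_kernel "$(uname -r)" \
#       && echo "kernel-devel matches" || echo "kernel-devel mismatch"
```

If the check fails, installing the exact match with yum install kernel-devel-$(uname -r) is one option, provided the configured repositories still carry that version.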
- Verify the installation: nvidia-smi runs without errors and lists the GPU
nvidia-smi -pm 1 # enable persistence mode
vi /etc/rc.d/rc.local # append the line "nvidia-smi -pm 1" so it is applied at boot
chmod +x /etc/rc.d/rc.local
reboot
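The rc.local edit can also be done without an editor, and in a way that is safe to repeat. The following is a sketch only; `append_once` is a hypothetical helper that adds a line to a file only when an identical line is not already present.

```shell
# append_once FILE LINE: append LINE to FILE unless FILE already
# contains an identical whole line.
append_once() {
    touch "$1" || return 1
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Typical use, as root:
#   append_once /etc/rc.d/rc.local 'nvidia-smi -pm 1'
#   chmod +x /etc/rc.d/rc.local
```

Because the helper compares whole lines, running the setup twice does not leave duplicate entries in rc.local.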
Run stable-diffusion-bentoml
Install BentoML (requires Python 3.7 or later)
- Build Python 3.7 or later from source
- Building Python: https://cloud.tencent.com/developer/article/1822337
- Building OpenSSL: https://blog.csdn.net/chendongpu/article/details/109766210
- Problems encountered while building OpenSSL: https://blog.csdn.net/sd4493091/article/details/122220902
sudo su
yum install -y zlib zlib-devel openssl-devel sqlite-devel bzip2-devel libffi libffi-devel gcc gcc-c++
# 1. Build OpenSSL first, because building Python depends on a newer OpenSSL
cd /home/cooker
wget https://www.openssl.org/source/openssl-3.0.7.tar.gz
tar -zxvf openssl-3.0.7.tar.gz
# The OpenSSL 3.0 build needs the Perl module IPC::Cmd; install it from the CPAN shell
yum install -y perl-CPAN
perl -MCPAN -e shell
cpan[1]> install IPC/Cmd.pm
cd openssl-3.0.7/
./config --prefix=/usr/local/openssl shared zlib
make && make install
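# Check whether /usr/local/openssl contains lib; if it does not, look for lib64 and use lib64 in the path below
# Single quotes keep $LD_LIBRARY_PATH unexpanded until the profile is sourced
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/openssl/lib64' >> $HOME/.bash_profile
source $HOME/.bash_profile
/usr/local/openssl/bin/openssl version # confirm the install completed and is in effect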
# 2. Python
cd /home/cooker
wget https://www.python.org/ftp/python/3.9.16/Python-3.9.16.tgz
tar -zxvf Python-3.9.16.tgz
cd Python-3.9.16/
./configure --prefix=/usr/local/python --with-openssl=/usr/local/openssl # point at the OpenSSL directory
make
# If make reports no _ssl error, run make install; otherwise check the OpenSSL variables in the Makefile and correct them if needed
# OPENSSL_INCLUDES=-I/usr/local/openssl/include
# OPENSSL_LIBS=-lssl -lcrypto
# OPENSSL_LDFLAGS=-L/usr/local/openssl/lib64
make install
/usr/local/python/bin/python3.9
>>> import _ssl # no error means the build and install succeeded
# Add /usr/local/python/bin to PATH in /root/.bashrc
- /usr/local/python/bin/pip3.9 install "bentoml>=1.0.5"
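The lib versus lib64 note in the OpenSSL step can be resolved by a small lookup instead of a manual check. This is a sketch under assumptions: `openssl_libdir` is a hypothetical helper, and it assumes a shared build installs libssl.so in one of those two directories under the prefix.

```shell
# openssl_libdir PREFIX: print the directory under PREFIX that holds
# libssl.so, trying lib64 first and then lib; fail if neither has it.
openssl_libdir() {
    for _dir in "$1/lib64" "$1/lib"; do
        if [ -e "$_dir/libssl.so" ]; then
            printf '%s\n' "$_dir"
            return 0
        fi
    done
    return 1
}

# Typical use:
#   libdir=$(openssl_libdir /usr/local/openssl) && echo "OpenSSL libraries are in $libdir"
```

The printed directory is the one to use in LD_LIBRARY_PATH and in OPENSSL_LDFLAGS if the Makefile needs correcting.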
Run stable-diffusion-bentoml
yum install -y git
git clone https://github.com/bentoml/stable-diffusion-bentoml.git
pip3.9 install torch transformers diffusers ftfy pydantic # install the Python dependencies
cd stable-diffusion-bentoml/fp32
curl https://s3.us-west-2.amazonaws.com/bentoml.com/stable_diffusion_bentoml/sd_model_v1_4.tgz | tar zxf - -C models/
BENTOML_CONFIG=configuration.yaml bentoml serve service:svc --production
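Once the server is up, a quick request can confirm that it works end to end. The sketch below rests on assumptions not stated in these notes: that the service listens on BentoML's default port 3000 and exposes a /txt2img endpoint that accepts a JSON body with a prompt field and returns an image, as the upstream repository's README describes. `txt2img_payload` and `request_txt2img` are hypothetical helper names.

```shell
# txt2img_payload PROMPT: print a JSON request body for PROMPT.
# Only backslashes and double quotes are escaped, so use single-line prompts.
txt2img_payload() {
    _escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
    printf '{"prompt":"%s"}\n' "$_escaped"
}

# request_txt2img PROMPT OUTFILE: POST the prompt and save the response body.
# Assumed endpoint: http://127.0.0.1:3000/txt2img
request_txt2img() {
    curl -sS --fail -X POST http://127.0.0.1:3000/txt2img \
        -H 'Content-Type: application/json' \
        -d "$(txt2img_payload "$1")" \
        --output "$2"
}

# Typical use, from a second terminal while the server is running:
#   request_txt2img 'View of a cyberpunk city' output.jpg
```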