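While studying the book 《深度学习推荐系统》 (Deep Learning Recommender Systems), I used the test program referenced in the DIEN paper as a hands-on exercise to deepen my understanding, and built a complete model training and testing environment on my macOS work laptop.
Building the Environment
My current macOS development environment is as follows: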
- macOS Monterey 12.1
- Xcode Command Line Tools(xcode-select version 2392)
- iTerm2
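Step 1: Set up the base environment
Install Homebrew
Installing from the official Homebrew website is the first choice. If the installation takes too long or the download fails, try the following:
# 1. Download the install script
wget https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh
# 2. In the script, replace the following two variables with a mirror hosted in mainland China
HOMEBREW_BREW_DEFAULT_GIT_REMOTE="https://mirrors.aliyun.com/homebrew/brew.git"
HOMEBREW_CORE_DEFAULT_GIT_REMOTE="https://mirrors.aliyun.com/homebrew/homebrew-core.git"
# 3. Install Homebrew
bash install.sh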
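If Homebrew is already installed but "brew update" fails, switch to the Alibaba Cloud (Aliyun) Homebrew mirror. For the steps, see the Homebrew mirror page on the Alibaba open source mirror site (阿里巴巴开源镜像站).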
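The variable replacement in step 2 can also be scripted instead of edited by hand. The following is a minimal sketch, assuming each variable appears in install.sh as a `NAME="value"` assignment at the start of a line (optionally indented); the function name `patch_install_script` is my own and not part of Homebrew.

```python
import re

# Mirror URLs copied from step 2 above.
ALIYUN_REMOTES = {
    "HOMEBREW_BREW_DEFAULT_GIT_REMOTE": "https://mirrors.aliyun.com/homebrew/brew.git",
    "HOMEBREW_CORE_DEFAULT_GIT_REMOTE": "https://mirrors.aliyun.com/homebrew/homebrew-core.git",
}


def patch_install_script(text, remotes=ALIYUN_REMOTES):
    """Return `text` with each NAME="..." assignment pointed at the given mirror."""
    for name, url in remotes.items():
        # Assumption: the variable is assigned as NAME="value" at the start of a line.
        pattern = r'(?m)^([ \t]*%s=)"[^"]*"' % re.escape(name)
        # A function replacement avoids any backslash handling in the URL.
        text = re.sub(pattern, lambda m, url=url: m.group(1) + '"' + url + '"', text)
    return text
```

To use it, read install.sh into a string, pass it through `patch_install_script`, write the result back, and then run `bash install.sh` as before.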
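Install Pyenv
For background, see the official pyenv site.
# 1. Install pyenv
brew install pyenv
brew install pyenv-virtualenv
# 2. Add the pyenv settings to "~/.bash_profile"
echo 'export PATH="$(pyenv root)/shims:$PATH"' >> ~/.bash_profile
echo 'eval "$(pyenv init -)"' >> ~/.bash_profile
echo 'eval "$(pyenv virtualenv-init -)"' >> ~/.bash_profile
# 3. Reload the profile so the settings take effect in the current shell
source ~/.bash_profile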
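A quick way to confirm that the profile change took effect is to check that the pyenv shims directory comes before the system directories on PATH. A minimal sketch, assuming pyenv lives under `~/.pyenv` and that `/usr/bin` is the system directory to compare against; `shims_precede` is a hypothetical helper name:

```python
import os


def shims_precede(path_value, shims_dir, system_dir="/usr/bin"):
    """True if shims_dir is on PATH and comes before system_dir (or system_dir is absent)."""
    entries = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]
    shims = os.path.normpath(shims_dir)
    system = os.path.normpath(system_dir)
    if shims not in entries:
        return False
    if system not in entries:
        return True
    return entries.index(shims) < entries.index(system)
```

In a new terminal, `shims_precede(os.environ["PATH"], os.path.expanduser("~/.pyenv/shims"))` should return True if the setup worked.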
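Step 2: Build the development environment
TF1 and TF2 depend on quite different Python versions. To make it easy to try several Python versions (for example, Python 2.7.18 vs. 3.9.10), build the Python environments under Pyenv, and use virtualenv to assign a different Python version to each directory.
Install multiple Python versions
# If downloading the source packages from python.org is too slow, cache them locally first.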
cd ~/.pyenv/
mkdir cache
cd cache
wget https://www.python.org/ftp/python/2.7.18/Python-2.7.18.tar.xz
wget https://www.python.org/ftp/python/3.9.10/Python-3.9.10.tar.xz
# Install Python
pyenv install 2.7.18
pyenv install 3.9.10
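The local cache works because the tarball names follow a fixed pattern. As a small sketch, the URL and the cache location for any version can be derived as below; the URL pattern is taken from the wget commands above, `~/.pyenv` is assumed to be the pyenv root, and the helper names are mine.

```python
import os


def python_tarball_url(version):
    """Source tarball URL, following the pattern of the wget commands above."""
    return "https://www.python.org/ftp/python/{0}/Python-{0}.tar.xz".format(version)


def cache_path(version, pyenv_root="~/.pyenv"):
    """Where the tarball is expected to sit so the installer can reuse it."""
    root = os.path.expanduser(pyenv_root)
    return os.path.join(root, "cache", "Python-{0}.tar.xz".format(version))


def missing_tarballs(versions, pyenv_root="~/.pyenv"):
    """Return the versions whose tarball is not yet present in the cache."""
    return [v for v in versions if not os.path.isfile(cache_path(v, pyenv_root))]
```

For example, `missing_tarballs(["2.7.18", "3.9.10"])` lists the versions that still need to be downloaded before running `pyenv install`.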
Pyenv compiles Python from source during installation, and the build may fail with errors such as:
python-build: use zlib from xcode sdk
BUILD FAILED (OS X 12.1 using python-build 20180424)
clang: error: unsupported option '-V -Wno-objc-signed-char-bool-implicit-int-conversion'
clang: error: unknown argument '-qversion'; did you mean '--version'?
clang: error: invalid version number in 'MACOSX_DEPLOYMENT_TARGET=12.1'
make: *** No targets specified and no makefile found. Stop.
This is related to the installed version of the Xcode Command Line Tools. Reinstall them with the following steps:
sudo rm -rf /Library/Developer/CommandLineTools
xcode-select --install
References:
- BUILD FAILED (OS X 11.0.1 using python-build 20180424) · Issue #1738 · pyenv/pyenv · GitHub
- Technical Note TN2339: Building from the Command Line with Xcode FAQ
- Home · pyenv/pyenv Wiki · GitHub
- Common build problems · pyenv/pyenv Wiki · GitHub
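Set up a virtual environment
Taking Python 2.7.18 as an example, the my-py2-workshop directory is the working directory.
# 1. Create the virtual environment
pyenv virtualenv 2.7.18 my-py2
# 2. Create the working directory
mkdir ~/my-py2-workshop
# 3. Set the application-specific virtual environment for the working directory
cd ~/my-py2-workshop
pyenv local my-py2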
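`pyenv local` records the chosen environment in a `.python-version` file in the working directory, and pyenv looks for that file in the current directory and then in its parents. The sketch below imitates only that lookup, to show why subdirectories such as dien-master inherit the setting; it ignores the `PYENV_VERSION` variable and the global fallback, and `find_local_version` is a hypothetical name.

```python
import io
import os


def find_local_version(start_dir):
    """Walk upward from start_dir; return (version_name, file_path) or (None, None)."""
    current = os.path.abspath(start_dir)
    while True:
        candidate = os.path.join(current, ".python-version")
        if os.path.isfile(candidate):
            with io.open(candidate, encoding="utf-8") as handle:
                return handle.read().strip(), candidate
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None, None
        current = parent
```

Called from anywhere under `~/my-py2-workshop`, it should report `my-py2`, which matches what `pyenv version` prints there.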
References:
- Creating virtual environments with Pyenv – Rob Allen's DevNotes
- How to use pyenv to run multiple versions of Python on a Mac | Opensource.com
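Step 3: Finish the Python development environment
After the steps above, the my-py2-workshop directory has a Python 2.7.18 environment. To reduce the script adaptation needed later, make sure pip is upgraded to version 20.3.4, the last pip release that supports Python 2.7. If the installed version is lower, run: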
pip install --upgrade pip
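If pip installs are too slow, try switching to a mirror in mainland China, for example the index linked as "Simple Index".
This completes the development environment setup. Python 3.9.10 is set up the same way, so the steps are not repeated here.
Summary: inside the my-py2-workshop directory, Python 2.7.18 is used. Other directories keep using the system default Python version.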
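The pip version requirement from this step can be checked programmatically. A minimal sketch, assuming purely numeric dotted version strings; the helper names are mine. Comparing integer tuples matters here, because a plain string comparison would rank "9.0.1" above "20.3.4".

```python
def version_tuple(text):
    """'20.3.4' -> (20, 3, 4); assumes a purely numeric dotted version string."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_upgrade(installed, required="20.3.4"):
    """True if the installed pip is older than the required version."""
    return version_tuple(installed) < version_tuple(required)
```

Inside the virtual environment, `import pip; needs_upgrade(pip.__version__)` indicates whether `pip install --upgrade pip` is still needed.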
Training and Testing
Step 1: Download the source code
# 1. Switch to the working directory
cd ~/my-py2-workshop
# 2. Download the ZIP file directly (1f314d1 on 18 Jan 2019, commit 1f314d16aa1700ee02777e6163fb8ca94e3d2810)
wget https://github.com/mouna99/dien/archive/refs/heads/master.zip
unzip master.zip
Step 2: Prepare the training data
# 1. Enter the DIEN directory
cd ~/my-py2-workshop/dien-master
# 2. To shorten the wait, use only part of the data
tar -jxvf data1.tar.gz
head -n100000 data1/reviews-info > reviews-info
tar -jxvf data2.tar.gz
mv data2/item-info .
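Notes:
- README.md has detailed steps; "method 2" is used here to prepare the data.
- In practice, only the two data files reviews-info and item-info are required; the later steps generate the other files that depend on them.
- For the content format of the Amazon product data, see Amazon review data.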
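The `head -n100000` truncation used above can be reproduced in Python when the sample size needs to vary between experiments. A minimal sketch; `take_head` is a hypothetical helper, and it copies bytes unchanged so it makes no assumption about the file's encoding or column layout.

```python
import io
import itertools


def take_head(src_path, dst_path, limit=100000):
    """Copy the first `limit` lines of src_path to dst_path; return the lines written."""
    written = 0
    with io.open(src_path, "rb") as src, io.open(dst_path, "wb") as dst:
        # islice stops after `limit` lines without reading the rest of the file.
        for line in itertools.islice(src, limit):
            dst.write(line)
            written += 1
    return written
```

`take_head("data1/reviews-info", "reviews-info")` mirrors the shell command; a smaller `limit` shortens the later steps further.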
Following prepare_data.sh, run the Python scripts below:
# Comment out lines 98 and 99 of process_data.py so that process_meta() and process_reviews() are not executed.
python script/process_data.py
# Then run the following steps in order
python script/local_aggretor.py
python script/split_by_user.py
python script/generate_voc.py
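Since each script consumes files produced by the previous one, it helps to stop at the first failure rather than continue with stale inputs. The four commands above can be wrapped in a small runner; this is a sketch under the assumption that it is started from the dien-master directory, and `run_pipeline` is my own name, not something from the DIEN repository.

```python
import subprocess

# Script order copied from the commands above.
PIPELINE = [
    "script/process_data.py",
    "script/local_aggretor.py",
    "script/split_by_user.py",
    "script/generate_voc.py",
]


def run_pipeline(scripts=PIPELINE, python="python", runner=subprocess.check_call):
    """Run each script in order; check_call raises on the first non-zero exit code."""
    completed = []
    for script in scripts:
        runner([python, script])
        completed.append(script)
    return completed
```

The `runner` parameter exists only so the order can be verified without executing anything; in normal use `run_pipeline()` is enough.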
Step 3: Train the model
Make sure the following modules are installed:
pip install numpy==1.16.6
pip install tensorflow==1.15.0
pip install protobuf==3.17.3
pip install keras==2.8.0
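The TensorFlow 1.4 version mentioned in README.md is too old, and setting up an environment for it is very difficult. TensorFlow 1.15.0, from the last TF1 release line, is used instead; its official documentation is also easier to consult at https://www.tensorflow.org/versions/r1.15/api_docs .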
python script/train.py train DNN
Some errors may appear along the way; each has a simple fix.
# Error 1
Traceback (most recent call last):
File "script/train.py", line 4, in <module>
from model import *
File "script/model.py", line 6, in <module>
from rnn import dynamic_rnn
File "script/rnn.py", line 45, in <module>
_like_rnncell = rnn_cell_impl._like_rnncell
AttributeError: 'module' object has no attribute '_like_rnncell'
# Fix: change line 45 of script/rnn.py
45 "_like_rnncell = rnn_cell_impl._like_rnncell" --> "_like_rnncell = rnn_cell_impl.assert_like_rnncell"
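One caveat on this alias, as far as I can tell: in TensorFlow 1.15 `assert_like_rnncell` expects two arguments (a cell name and the cell) and raises a TypeError instead of returning a boolean, whereas the older `_like_rnncell(cell)` returned True or False. The alias is enough to get past the import, but any code path that still calls `_like_rnncell(cell)` may need attention. A hedged alternative is a local duck-typing predicate; the function name and the attribute list below are my assumptions, not code from the DIEN repository or from TensorFlow.

```python
def like_rnncell(cell):
    """Duck-typing check: does `cell` expose what an RNN cell is expected to have?"""
    # Assumed attribute list: sizes, a zero_state factory, and callability.
    required = ("output_size", "state_size", "zero_state")
    return all(hasattr(cell, name) for name in required) and callable(cell)
```

Assigning `_like_rnncell = like_rnncell` in script/rnn.py would keep the one-argument, boolean-returning behavior that the rest of the file was written against.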
# Error 2
Traceback (most recent call last):
File "script/train.py", line 4, in <module>
from model import *
File "script/model.py", line 7, in <module>
from utils import *
File "script/utils.py", line 3, in <module>
from tensorflow.python.ops.rnn_cell_impl import _Linear
ImportError: cannot import name _Linear
# Fix: change line 3 of script/utils.py
3 "from tensorflow.python.ops.rnn_cell_impl import _Linear" --> "from tensorflow.contrib.rnn.python.ops.core_rnn_cell import _Linear"
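When a symbol has moved between library versions, as `_Linear` did here, a fallback import keeps the script working on both the old and the new location. The helper below is a generic sketch of that idea using only the standard library; `import_first` is a hypothetical name, and the TensorFlow paths in the comment are the two from the fix above.

```python
import importlib


def import_first(candidates):
    """Return the first attribute that imports successfully from (module, attr) pairs."""
    errors = []
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            errors.append("%s.%s: %s" % (module_name, attr, exc))
    raise ImportError("no candidate could be imported: " + "; ".join(errors))


# Intended use in script/utils.py (requires TensorFlow, so left as a comment):
# _Linear = import_first([
#     ("tensorflow.python.ops.rnn_cell_impl", "_Linear"),
#     ("tensorflow.contrib.rnn.python.ops.core_rnn_cell", "_Linear"),
# ])
```

For a single symbol, a plain `try: ... except ImportError: ...` around the two import statements does the same job with less machinery.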
# Error 3
Traceback (most recent call last):
File "script/train.py", line 4, in <module>
from model import *
File "script/model.py", line 7, in <module>
from utils import *
File "script/utils.py", line 9, in <module>
from keras import backend as K
File "~/.pyenv/versions/my-py2/lib/python2.7/site-packages/keras/__init__.py", line 22, in <module>
from keras import distribute
File "~/.pyenv/versions/my-py2/lib/python2.7/site-packages/keras/distribute/__init__.py", line 18, in <module>
from keras.distribute import sidecar_evaluator
File "~/.pyenv/versions/my-py2/lib/python2.7/site-packages/keras/distribute/sidecar_evaluator.py", line 180
f'No checkpoints appear to be found after {_CHECKPOINT_TIMEOUT_SEC} '
# Fix: delete line 9 of script/utils.py
9 "from keras import backend as K"
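The traceback for this error stops at an f-string inside the installed keras package. F-strings were introduced in Python 3.6, so the Python 2.7 interpreter cannot even parse that file, which is why removing the import sidesteps the problem. A small probe, as a sketch, shows how to test whether the running interpreter accepts the syntax; `supports_fstrings` is a hypothetical name.

```python
import sys


def supports_fstrings():
    """True if this interpreter can parse f-string literals (Python 3.6 and later)."""
    try:
        # compile() only parses the snippet; nothing is executed.
        compile("f'{1}'", "<fstring-probe>", "eval")
    except SyntaxError:
        return False
    return True
```

Running this with the my-py2 interpreter should return False, matching the failure shown in the traceback.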
Step 4: Test the model
python script/train.py test DNN
References
Although Python 2.7 is gradually leaving the stage, it is still an indispensable base environment.
The official documentation helps speed up learning:
- TensorFlow 1.15 https://www.tensorflow.org/versions/r1.15/api_docs/python/tf
- NumPy Reference (NumPy v1.16 Manual)
- pdb, The Python Debugger (Python 2.7.18 documentation)