1. Problem: while installing dependencies and packages during the Docker image build, apt failed with the following errors:
- E: Unable to locate package libcudnn7
- E: Version '2.7.8-1+cuda11.0' for 'libnccl2' was not found
- E: Version '2.7.8-1+cuda11.0' for 'libnccl-dev' was not found
Dockerfile snippet that triggers the errors:
- FROM nvidia/cuda:11.0.3-devel-ubuntu18.04
-
- ENV PROJECT=permatrack
- # ENV PYTORCH_VERSION=1.4
- # ENV TORCHVISION_VERSION=0.5.0
- ENV PYTORCH_VERSION=1.7
- ENV TORCHVISION_VERSION=0.8.0
- ENV CUDNN_VERSION=8.0.5.39+cuda11.0
- ENV NCCL_VERSION=2.7.8-1+cuda11.0
- ENV TRT_VERSION=7.2.3
- ENV LC_ALL=C.UTF-8
- ENV LANG=C.UTF-8
-
-
- RUN apt-get update && apt-get install -y --allow-downgrades --allow-change-held-packages --no-install-recommends \
- build-essential \
- cmake \
- g++-4.8 \
- git \
- curl \
- docker.io \
- vim \
- wget \
- ca-certificates \
- libcudnn7=${CUDNN_VERSION} \
- libnccl2=${NCCL_VERSION} \
- libnccl-dev=${NCCL_VERSION} \
- libjpeg-dev \
- libpng-dev \
- python${PYTHON_VERSION} \
- python${PYTHON_VERSION}-dev \
- python3-tk \
- librdmacm1 \
- libibverbs1 \
- libgtk2.0-dev \
- unzip \
- bzip2 \
- htop \
- gnuplot \
- ffmpeg
2. Fix: do not install cuDNN and NCCL explicitly, i.e. comment out the following three lines:
- libcudnn7=${CUDNN_VERSION} \
- libnccl2=${NCCL_VERSION} \
- libnccl-dev=${NCCL_VERSION} \
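For reference, a trimmed sketch of the install step with those three lines commented out. It assumes the same base image as above and lists only a few of the original packages. Docker strips comment lines inside a continued RUN instruction, so the commented entries should not break the line continuation.

```dockerfile
FROM nvidia/cuda:11.0.3-devel-ubuntu18.04

# Trimmed sketch: only a few of the original packages are listed here.
RUN apt-get update && apt-get install -y --allow-downgrades --allow-change-held-packages --no-install-recommends \
    build-essential \
    ca-certificates \
    # libcudnn7=${CUDNN_VERSION} \
    # libnccl2=${NCCL_VERSION} \
    # libnccl-dev=${NCCL_VERSION} \
    libjpeg-dev \
    libpng-dev
```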
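Suspicion: newer NVIDIA CUDA images already include cuDNN and NCCL, whereas the Dockerfile in the repository below installs them explicitly, probably because it was written against an older base image. That original base image can no longer be found, so the CUDA base image also had to be changed to a newer one. Repository: GitHub - TRI-ML/permatrack: Implementation for Learning to Track with Object Permanence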
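One way to check this suspicion is to ask the base image which cuDNN/NCCL packages it already contains and which versions apt can see. The throwaway Dockerfile below is a sketch for inspection only and is not part of the permatrack build. `libcudnn8` is included in the query on the assumption that cuDNN 8 is packaged under that name rather than `libcudnn7`.

```dockerfile
FROM nvidia/cuda:11.0.3-devel-ubuntu18.04

# List cuDNN/NCCL packages that are already installed in the base image.
RUN dpkg -l | grep -E 'libcudnn|libnccl' || echo "no cuDNN/NCCL packages installed"

# Show the versions apt can offer; packages missing from the configured
# repositories produce no version rows. "|| true" keeps the build going either way.
RUN apt-get update && (apt-cache madison libcudnn7 libcudnn8 libnccl2 libnccl-dev || true)
```

With BuildKit, pass `--progress=plain` to `docker build` to see the command output. If a matching version shows up in the list, pinning `NCCL_VERSION` (or the cuDNN package name and version) to it may let the original lines resolve instead of being commented out.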
A useful blog post on installing NVIDIA CUDA images and working with containers: Ubuntu上从CUDA开始构建深度学习镜像 - 八十八键的宇宙 (yuxinzhao.net)