
One possible cause of the "packages have unmet dependencies" error when installing CUDA

I wanted to try the newest release first, so I installed the latest CUDA, 11.4:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.4.1/local_installers/cuda-repo-ubuntu1804-11-4-local_11.4.1-470.57.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-11-4-local_11.4.1-470.57.02-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1804-11-4-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

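When I went to install PyTorch, however, I found that even the latest release, 1.9, only supports up to CUDA 11.1. PyTorch itself installed without problems, but any code that used CUDA failed with the following error: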

GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
>>> print(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/python3/lib/python3.9/site-packages/torch/tensor.py", line 193, in __repr__
    return torch._tensor_str._str(self)
  File "/usr/local/python3/lib/python3.9/site-packages/torch/_tensor_str.py", line 383, in _str
    return _str_intern(self)
  File "/usr/local/python3/lib/python3.9/site-packages/torch/_tensor_str.py", line 358, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/usr/local/python3/lib/python3.9/site-packages/torch/_tensor_str.py", line 242, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/usr/local/python3/lib/python3.9/site-packages/torch/_tensor_str.py", line 90, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: CUDA error: no kernel image is available for execution on the device
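
The warning above boils down to a membership test: the GPU reports sm_86, and the installed PyTorch build lists only sm_37, sm_50, sm_60 and sm_70. A minimal sketch of that check follows. The helper name `arch_supported` is hypothetical, and the exact-match rule is a deliberate simplification; the real compatibility rules may be more lenient than a strict string match.

```shell
#!/bin/sh
# arch_supported ARCH_LIST CAPABILITY  (hypothetical helper, not part of PyTorch)
# Succeeds when CAPABILITY (e.g. sm_86) appears as a whole word in the
# space-separated ARCH_LIST. Exact match only: a simplifying assumption.
arch_supported() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Values copied from the warning text above.
if arch_supported "sm_37 sm_50 sm_60 sm_70" "sm_86"; then
    echo "sm_86: supported by this build"
else
    echo "sm_86: not in this build's arch list"
fi
```

On a machine with PyTorch installed, the two inputs can probably be obtained from `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()` instead of being copied from the warning.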

So I removed CUDA 11.4 with the following commands:

sudo apt-get --purge remove "cuda*"
sudo apt-get --purge remove "*nvidia*470"

Then I installed CUDA 11.1.1 with the following commands:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.1.1/local_installers/cuda-repo-ubuntu1804-11-1-local_11.1.1-455.32.00-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-11-1-local_11.1.1-455.32.00-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1804-11-1-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

This failed with the following error:

You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 cuda : Depends: cuda-11-1 (>= 11.1.1) but it is not going to be installed
 nvidia-dkms-470 : Depends: nvidia-kernel-source-470 but it is not going to be installed
                   Depends: nvidia-kernel-common-470 (= 470.57.02-0ubuntu1) but it is not going to be installed
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).

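I removed and reinstalled several times with the same result. Then I started wondering why apt kept reporting

cuda : Depends: cuda-11-1 (>= 11.1.1) but it is not going to be installed

Perhaps the CUDA 11.4 packages had not been removed completely. So I ran: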

sudo apt-get --purge remove "cuda*"
sudo apt-get --purge remove "*nvidia*"

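This removed everything related to CUDA and the NVIDIA driver (nvidia-docker2 was removed along with it by mistake). After that, installing CUDA 11.1.1 succeeded without problems.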
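
Before running a broad purge like the one above, it can help to list which CUDA and NVIDIA packages dpkg still knows about. The list shows both what was left behind by the earlier removal and what a wildcard such as "*nvidia*" is likely to take with it (nvidia-docker2, for example). The sketch below filters `dpkg -l` output; the function name `leftover_gpu_packages` is hypothetical, and the pattern `cuda|nvidia` is an assumption about how the relevant packages are named.

```shell
#!/bin/sh
# leftover_gpu_packages (hypothetical helper): read `dpkg -l` output on stdin
# and print "<status> <package>" for every package whose name contains
# "cuda" or "nvidia". Status "ii" means installed; "rc" means removed with
# configuration files still present.
leftover_gpu_packages() {
    awk '$1 ~ /^[a-z][a-zA-Z]R?$/ && $2 ~ /cuda|nvidia/ { print $1, $2 }'
}

# Only query dpkg where it exists (Debian/Ubuntu systems).
if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | leftover_gpu_packages
fi
```

An empty result suggests nothing matching is left; any line whose name ends in 470 after the first purge would point to remains of the CUDA 11.4 driver stack.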

Because nvidia-docker2 had been removed by mistake as well, it had to be reinstalled:

sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker

If you do not reboot after installing CUDA so that the new driver takes effect, starting an NVIDIA container with docker run --gpus all ... or nvidia-docker run ... fails with:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver/library version mismatch\\\\n\\\"\"": unknown.

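This error usually means that the GPU driver is not installed, or that it was installed but has not taken effect yet. In the second case the kernel module that is still loaded typically belongs to the old driver while the user-space libraries are already the new version, and a reboot resolves it.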
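
One way to confirm a "driver/library version mismatch" before rebooting is to compare the version of the NVIDIA kernel module that is currently loaded with the version of the module installed on disk. The sketch below assumes the loaded version can be read from /proc/driver/nvidia/version and the on-disk version from `modinfo -F version nvidia`; both are assumptions about a typical Linux driver installation, and the helper name `nvrm_version` is hypothetical.

```shell
#!/bin/sh
# nvrm_version (hypothetical helper): read text in the format of
# /proc/driver/nvidia/version on stdin and print the first dotted version
# number found on its "NVRM version" line.
nvrm_version() {
    grep 'NVRM version' | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n1
}

# Run the comparison only where the NVIDIA module is loaded and modinfo exists.
if [ -r /proc/driver/nvidia/version ] && command -v modinfo >/dev/null 2>&1; then
    loaded=$(nvrm_version < /proc/driver/nvidia/version)
    on_disk=$(modinfo -F version nvidia 2>/dev/null || true)
    if [ -n "$on_disk" ] && [ "$loaded" != "$on_disk" ]; then
        echo "loaded module $loaded differs from installed $on_disk: reboot needed"
    else
        echo "loaded module version: $loaded"
    fi
fi
```

If the two versions differ, the old module is still in memory, which matches the situation described above where CUDA was installed but the machine was not rebooted.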
