
Fixing an Ubuntu desktop that fails to start after a newer NVIDIA driver is installed

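After running apt-get upgrade on Ubuntu and rebooting, a PC with an RTX 3090 GPU no longer brought up the desktop. Fixing it meant hitting several pitfalls, which are recorded here for future reference.

My first idea was to go back to an older GPU driver, so I removed the installed versions: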

     sudo apt-get --purge remove "cuda*"
     sudo apt-get --purge remove "*nvidia*"

Then I installed an older CUDA 10 deb package, but even after a reboot it did not work; nvidia-smi always failed with:

     Failed to initialize NVML: Driver/library version mismatch
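
This message generally means that the NVIDIA kernel module currently loaded and the user-space libraries come from different driver versions. The following is a minimal sketch for inspecting the kernel side, assuming the usual /proc/driver/nvidia/version file and the modinfo tool are present; the helper name nvrm_version is made up for illustration, and the pattern assumes the classic "Kernel Module  <version>" layout of that file.

```shell
#!/bin/sh
# Hypothetical helper: extract the driver version from the "NVRM version" line
# that the loaded NVIDIA kernel module exposes in /proc/driver/nvidia/version.
# Assumes the classic "... Kernel Module  <version>  <date>" layout.
nvrm_version() {
    sed -n 's/^NVRM version:.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p'
}

# Version of the module that is loaded right now (if any).
if [ -r /proc/driver/nvidia/version ]; then
    echo "loaded module : $(nvrm_version < /proc/driver/nvidia/version)"
fi

# Version of the module installed on disk for the running kernel (if any).
if command -v modinfo >/dev/null 2>&1; then
    echo "on-disk module: $(modinfo -F version nvidia 2>/dev/null || echo 'not found')"
fi
```

If the two versions differ, the module in memory is probably left over from the previous driver, and a reboot or a reload of the nvidia modules is usually needed before nvidia-smi works again.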

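The likely cause was a mismatch with the Linux kernel version currently in use: installing the deb package directly does not work, and the kernel modules (.ko files) have to be built from source against the running kernel. So I switched to installing from the run file: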
 

     wget https://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run
     chmod +x cuda_10.2.89_440.33.01_linux.run
     ./cuda_10.2.89_440.33.01_linux.run
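
Because the run file compiles the kernel modules locally, the headers for the running kernel have to be installed first. A small sketch of that check, assuming the usual /lib/modules/<release>/build location; the helper name kernel_build_dir is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: path where the build tree (headers) for a given
# kernel release is expected to be found.
kernel_build_dir() {
    echo "/lib/modules/$1/build"
}

dir=$(kernel_build_dir "$(uname -r)")
if [ -d "$dir" ]; then
    echo "headers found: $dir"
else
    echo "headers missing; on Ubuntu try: sudo apt-get install linux-headers-$(uname -r)"
fi
```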

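The installation succeeded, but after a reboot the desktop still would not come up, and switching to a text console showed errors.

Installing an older driver instead failed at the very end of every attempt with: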

    ERROR: Unable to load the 'nvidia-drm' kernel module

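None of the fixes suggested online helped: disabling Secure Boot in the BIOS, upgrading the kernel, resolving mismatches between the kernel and its source version, and so on. Finally I tried NVIDIA-Linux-x86_64-450.80.02.run, the run file for driver version 450.80.02 that ships with CUDA 11.0, and it installed on the first attempt. This suggests that a fairly new GPU needs a fairly new driver; older drivers cannot even be installed, let alone run.

Since the driver has to correspond to CUDA 11 or later anyway, it is better to install CUDA 11.1.1 directly (RTX 30 series GPUs appear to need 11.1.1 or later to work properly). For now, though, it is best to avoid the newest CUDA 11.3 or 11.4, because tools such as PyTorch do not support them yet. Blindly installing the latest version is not a good idea; good enough is enough.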
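
To check whether an installed driver is at least as new as a version known to work, the version strings can be compared. This is a minimal sketch: the helper name version_ge is made up, it relies on GNU sort -V, and the threshold 450.80.02 is only the version that happened to install on this machine, not an official minimum.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 >= version $2 (GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: the oldest driver that installed successfully in this article.
MIN_DRIVER="450.80.02"

if command -v nvidia-smi >/dev/null 2>&1; then
    current=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if version_ge "$current" "$MIN_DRIVER"; then
        echo "driver $current is new enough"
    else
        echo "driver $current is older than $MIN_DRIVER"
    fi
fi
```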

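With the driver version settled, the gdm desktop still did not appear after boot. Some posts online say gdm3 does not work well with the latest NVIDIA drivers, so I installed the lightdm display manager and the Unity desktop: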

     sudo apt-get install lightdm unity

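During installation, confirm lightdm rather than gdm3 as the default display manager (to switch later, run dpkg-reconfigure lightdm). After a reboot the desktop still did not come up: the Ubuntu logo kept animating, but the desktop never appeared.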
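
Which display manager is currently the default can be read from /etc/X11/default-display-manager, the same file that the lightdm systemd unit checks before starting. A small sketch, with default_dm as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the name of the display manager recorded in the
# given file (prints an empty line if the file is missing).
default_dm() {
    basename "$(cat "$1" 2>/dev/null)" 2>/dev/null
}

default_dm /etc/X11/default-display-manager
```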

 

Checking the status:

    root@ubuntu-rtx3090:~# systemctl status lightdm
    ● lightdm.service - Light Display Manager
       Loaded: loaded (/lib/systemd/system/lightdm.service; indirect; vendor preset: enabled)
       Active: failed (Result: exit-code) since Wed 2021-08-25 19:15:17 CST; 7min ago
         Docs: man:lightdm(1)
      Process: 1246 ExecStart=/usr/sbin/lightdm (code=exited, status=1/FAILURE)
      Process: 1243 ExecStartPre=/bin/sh -c [ "$(basename $(cat /etc/X11/default-display-manager 2>/dev/null))" = "lightdm" ] (code=exited, status=0/SUCCESS)
     Main PID: 1246 (code=exited, status=1/FAILURE)

    8月 25 19:15:17 ubuntu-rtx3090 systemd[1]: lightdm.service: Service hold-off time over, scheduling restart.
    8月 25 19:15:17 ubuntu-rtx3090 systemd[1]: lightdm.service: Scheduled restart job, restart counter is at 5.
    8月 25 19:15:17 ubuntu-rtx3090 systemd[1]: Stopped Light Display Manager.
    8月 25 19:15:17 ubuntu-rtx3090 systemd[1]: lightdm.service: Start request repeated too quickly.
    8月 25 19:15:17 ubuntu-rtx3090 systemd[1]: lightdm.service: Failed with result 'exit-code'.
    8月 25 19:15:17 ubuntu-rtx3090 systemd[1]: Failed to start Light Display Manager.

    apt policy lightdm
    lightdm:
      Installed: 1.26.0-0ubuntu1
      Candidate: 1.26.0-0ubuntu1
      Version table:
     *** 1.26.0-0ubuntu1 500
            500 http://mirrors.aliyun.com/ubuntu bionic/universe amd64 Packages
            100 /var/lib/dpkg/status

    root@ubuntu-rtx3090:~# lightdm --test-mode --debug
    Failed to load configuration from /etc/lightdm/lightdm.conf: Key file does not start with a group
    root@ubuntu-rtx3090:~# lightdm --show-config
    Failed to load configuration from /etc/lightdm/lightdm.conf: Key file does not start with a group

 

The message "Failed to load configuration from /etc/lightdm/lightdm.conf: Key file does not start with a group" points to a problem in /etc/lightdm/lightdm.conf. Opening the file showed that it contained a single line:

     greeter-session=unity-greeter

The file is only valid once the Seat group header is added:

    [Seat:*]
    greeter-session=unity-greeter
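
The same mistake can be caught before restarting the service. The sketch below only checks that the first meaningful line of a key file is a [group] header; it is a rough approximation of the parser's rule, not a full validator, and the helper name starts_with_group is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the first non-blank, non-comment line of the
# file is a [group] header (an empty file is treated as acceptable).
starts_with_group() {
    first=$(grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | head -n 1)
    case "$first" in
        "")     return 0 ;;
        \[*\]*) return 0 ;;
        *)      return 1 ;;
    esac
}

conf=/etc/lightdm/lightdm.conf
if [ -f "$conf" ] && ! starts_with_group "$conf"; then
    echo "$conf: first setting is not inside a [group]"
fi
```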

Running lightdm --show-config again now produces normal output:

    root@ubuntu-rtx3090:~# lightdm --show-config
       [Seat:*]
    A  allow-guest=false
    C  greeter-wrapper=/usr/lib/lightdm/lightdm-greeter-session
    D  guest-wrapper=/usr/lib/lightdm/lightdm-guest-session
    G  user-session=unity
    F  greeter-show-manual-login=true
    I  greeter-session=unity-greeter
    F  all-guest=false
    H  xserver-command=X -core

       [LightDM]
    B  backup-logs=false

    Sources:
    A  /usr/share/lightdm/lightdm.conf.d/50-disable-guest.conf
    B  /usr/share/lightdm/lightdm.conf.d/50-disable-log-backup.conf
    C  /usr/share/lightdm/lightdm.conf.d/50-greeter-wrapper.conf
    D  /usr/share/lightdm/lightdm.conf.d/50-guest-wrapper.conf
    E  /usr/share/lightdm/lightdm.conf.d/50-ubuntu.conf
    F  /usr/share/lightdm/lightdm.conf.d/50-unity-greeter.conf
    G  /usr/share/lightdm/lightdm.conf.d/50-unity.conf
    H  /usr/share/lightdm/lightdm.conf.d/50-xserver-command.conf
    I  /etc/lightdm/lightdm.conf

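The output also shows the priority among lightdm's configuration files: /etc/lightdm/lightdm.conf has the highest priority, and its settings override those in all earlier files, because lightdm reads the files in the order A through I.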
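
The "later file wins" rule can be illustrated with a deliberately simplified model that ignores section headers and just reports the last value seen for a key. The helper name effective_value is hypothetical, and this is not how lightdm itself is implemented; lightdm --show-config remains the authoritative view.

```shell
#!/bin/sh
# Hypothetical helper: print the value a key ends up with when the given files
# are read in order and later files override earlier ones.
# Simplification: section headers such as [Seat:*] are ignored.
effective_value() {
    key=$1; shift
    cat "$@" 2>/dev/null | sed -n "s/^$key=//p" | tail -n 1
}

effective_value greeter-session \
    /usr/share/lightdm/lightdm.conf.d/*.conf \
    /etc/lightdm/lightdm.conf
```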

Restarting lightdm with sudo systemctl restart lightdm then shows the service running normally:

    root@ubuntu-rtx3090:~# systemctl status lightdm
    ● lightdm.service - Light Display Manager
       Loaded: loaded (/lib/systemd/system/lightdm.service; indirect; vendor preset: enabled)
       Active: active (running) since Wed 2021-08-25 19:50:22 CST; 3min 1s ago
         Docs: man:lightdm(1)
      Process: 1088 ExecStartPre=/bin/sh -c [ "$(basename $(cat /etc/X11/default-display-manager 2>/dev/null))" = "lightdm" ] (code=exited, status=0/SUCCESS)
     Main PID: 1096 (lightdm)
        Tasks: 6 (limit: 4915)
       CGroup: /system.slice/lightdm.service
        ├─1096 /usr/sbin/lightdm
        ├─1115 /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch
        └─1564 lightdm --session-child 12 19

    8月 25 19:50:21 ubuntu-rtx3090 systemd[1]: Starting Light Display Manager...
    8月 25 19:50:22 ubuntu-rtx3090 systemd[1]: Started Light Display Manager.
    8月 25 19:50:23 ubuntu-rtx3090 lightdm[1220]: pam_kwallet(lightdm-greeter:setcred): (null): pam_sm_setcred
    8月 25 19:50:23 ubuntu-rtx3090 lightdm[1220]: pam_kwallet5(lightdm-greeter:setcred): (null): pam_sm_setcred
    8月 25 19:50:23 ubuntu-rtx3090 lightdm[1220]: pam_unix(lightdm-greeter:session): session opened for user lightdm by (uid=0)
    8月 25 19:50:23 ubuntu-rtx3090 lightdm[1220]: pam_kwallet(lightdm-greeter:session): (null): pam_sm_open_session
    8月 25 19:50:23 ubuntu-rtx3090 lightdm[1220]: pam_kwallet(lightdm-greeter:session): pam_kwallet: open_session called without kwallet_key
    8月 25 19:50:23 ubuntu-rtx3090 lightdm[1220]: pam_kwallet5(lightdm-greeter:session): (null): pam_sm_open_session
    8月 25 19:50:23 ubuntu-rtx3090 lightdm[1220]: pam_kwallet5(lightdm-greeter:session): pam_kwallet5: open_session called without kwallet5_key

    root@ubuntu-rtx3090:~# lightdm --test-mode --debug
    [+0.00s] DEBUG: Logging to /var/log/lightdm/lightdm.log
    [+0.00s] DEBUG: Starting Light Display Manager 1.26.0, UID=0 PID=2573
    [+0.00s] DEBUG: Loading configuration dirs from /var/lib/snapd/desktop/lightdm/lightdm.conf.d
    [+0.00s] DEBUG: Loading configuration dirs from /usr/share/lightdm/lightdm.conf.d
    [+0.00s] DEBUG: Loading configuration from /usr/share/lightdm/lightdm.conf.d/50-disable-guest.conf
    [+0.00s] DEBUG: Loading configuration from /usr/share/lightdm/lightdm.conf.d/50-disable-log-backup.conf
    [+0.00s] DEBUG: Loading configuration from /usr/share/lightdm/lightdm.conf.d/50-greeter-wrapper.conf
    [+0.00s] DEBUG: Loading configuration from /usr/share/lightdm/lightdm.conf.d/50-guest-wrapper.conf
    [+0.00s] DEBUG: Loading configuration from /usr/share/lightdm/lightdm.conf.d/50-ubuntu.conf
    [+0.00s] DEBUG: Loading configuration from /usr/share/lightdm/lightdm.conf.d/50-unity-greeter.conf
    [+0.00s] DEBUG:   [Seat:*] contains unknown option all-guest
    [+0.00s] DEBUG: Loading configuration from /usr/share/lightdm/lightdm.conf.d/50-unity.conf
    [+0.00s] DEBUG: Loading configuration from /usr/share/lightdm/lightdm.conf.d/50-xserver-command.conf
    [+0.00s] DEBUG: Loading configuration dirs from /usr/local/share/lightdm/lightdm.conf.d
    [+0.00s] DEBUG: Loading configuration dirs from /etc/xdg/lightdm/lightdm.conf.d
    [+0.00s] DEBUG: Loading configuration from /etc/lightdm/lightdm.conf
    [+0.00s] DEBUG: Registered seat module local
    [+0.00s] DEBUG: Registered seat module xremote
    [+0.00s] DEBUG: Registered seat module unity
    [+0.00s] DEBUG: Using D-Bus name org.freedesktop.DisplayManager
    [+0.01s] DEBUG: Monitoring logind for seats
    [+0.01s] DEBUG: New seat added from logind: seat0
    [+0.01s] DEBUG: Seat seat0: Loading properties from config section Seat:*
    [+0.01s] DEBUG: Seat seat0: Starting
    [+0.01s] DEBUG: Seat seat0: Creating greeter session
    [+0.01s] DEBUG: Seat seat0: Creating display server of type x
    [+0.01s] DEBUG: Using VT 7
    [+0.01s] DEBUG: Seat seat0: Starting local X display on VT 7
    [+0.01s] DEBUG: XServer 1: Logging to /var/log/lightdm/x-1.log
    [+0.01s] DEBUG: XServer 1: Writing X server authority to /var/run/lightdm/root/:1
    [+0.01s] DEBUG: XServer 1: Launching X Server
    [+0.01s] DEBUG: Launching process 2578: /usr/bin/X -core :1 -seat seat0 -auth /var/run/lightdm/root/:1 -nolisten tcp vt7 -novtswitch
    [+0.01s] DEBUG: XServer 1: Waiting for ready signal from X server :1
    [+0.01s] DEBUG: Acquired bus name org.freedesktop.DisplayManager
    [+0.01s] DEBUG: Registering seat with bus path /org/freedesktop/DisplayManager/Seat0
    [+0.01s] DEBUG: Loading users from org.freedesktop.Accounts
    [+0.01s] DEBUG: User /org/freedesktop/Accounts/User1000 added
    Failed to use bus name org.freedesktop.DisplayManager, do you have appropriate permissions?
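
When the service is running but nothing appears on screen, the X server logs are a reasonable next place to look. The sketch below filters for lines that Xorg marks with "(EE)". The log paths are assumptions: /var/log/lightdm/x-0.log follows the x-1.log naming seen in the debug output above, and /var/log/Xorg.0.log is the usual Xorg location. The helper name xorg_errors is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: keep only lines containing Xorg's "(EE)" error marker.
# Note: the legend line at the top of an Xorg log also contains "(EE)".
xorg_errors() {
    grep -F '(EE)'
}

# Assumed log locations; adjust to whatever exists on the machine.
for log in /var/log/lightdm/x-0.log /var/log/Xorg.0.log; do
    if [ -r "$log" ]; then
        echo "== $log"
        xorg_errors < "$log" || echo "(no error lines)"
    fi
done
```

The service's own messages for the current boot can be read with journalctl -u lightdm -b.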

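However, the unity-greeter login screen still did not appear. With gdm3 as the display manager, systemctl status gdm3 likewise showed the service starting normally, yet the greeter login window never showed up. With lightdm the screen simply stayed stuck at the same point.

After a lot of tinkering, including installing lightdm-gtk-greeter, configuring it in lightdm.conf and forcing greeter-show-manual-login=true, the login screen still did not appear:

    [Seat:*]
    greeter-session=lightdm-gtk-greeter
    greeter-show-manual-login=true
    allow-guest=false

My guess was that the greeter windows of both gdm3 and lightdm fail to display under the latest GPU driver and Linux kernel. So what happens if the login is skipped and the system logs in automatically? I added one line to /etc/lightdm/lightdm.conf (my login user name is ubuntu):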

     autologin-user=ubuntu

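After another reboot, the long-missed Unity desktop finally appeared!

Experiments showed that it makes no difference whether the following settings are present: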

     autologin-guest=false
     autologin-user-timeout=0
     autologin-session=lightdm-autologin
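
The autologin change can also be scripted. This is a minimal sketch that assumes the file contains only a single [Seat:*] section, as it does in this article, and that GNU sed is available; set_conf_key is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: set key=value in a LightDM config file that contains a
# single [Seat:*] section. Replaces the key if present, appends it otherwise.
set_conf_key() {
    file=$1; key=$2; value=$3
    if grep -q "^$key=" "$file"; then
        sed -i "s|^$key=.*|$key=$value|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example (run as root; "ubuntu" is the login name used in this article):
#   set_conf_key /etc/lightdm/lightdm.conf autologin-user ubuntu
```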

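Because troubleshooting may require upgrading the kernel, here is an appendix on installing and removing specific kernel versions, with the related commands: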

    uname -r
    lsb_release -a

    # List the kernel images that are currently installed
    dpkg --get-selections | grep linux-image

    # List the kernel image versions available from the configured repositories
    apt-cache search linux | grep linux-image

    # Install a specific version of the kernel image and kernel headers
    apt-get install linux-headers-5.4.0-81-generic linux-image-5.4.0-81-generic

    Building module:
    cleaning build area...
    'make' -j24 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.4.0-81-generic IGNORE_CC_MISMATCH='' modules.....
    Signing module:
     - /var/lib/dkms/nvidia/450.80.02/5.4.0-81-generic/x86_64/module/nvidia-modeset.ko
     - /var/lib/dkms/nvidia/450.80.02/5.4.0-81-generic/x86_64/module/nvidia-drm.ko
     - /var/lib/dkms/nvidia/450.80.02/5.4.0-81-generic/x86_64/module/nvidia.ko
     - /var/lib/dkms/nvidia/450.80.02/5.4.0-81-generic/x86_64/module/nvidia-uvm.ko
    Secure Boot not enabled on this system.
    cleaning build area...

    DKMS: build completed.

    nvidia.ko:
    Running module version sanity check.
     - Original module
       - No original module exists within this kernel
     - Installation
       - Installing to /lib/modules/5.4.0-81-generic/updates/dkms/

    nvidia-uvm.ko:
    Running module version sanity check.
     - Original module
       - No original module exists within this kernel
     - Installation
       - Installing to /lib/modules/5.4.0-81-generic/updates/dkms/

    nvidia-modeset.ko:
    Running module version sanity check.
     - Original module
       - No original module exists within this kernel
     - Installation
       - Installing to /lib/modules/5.4.0-81-generic/updates/dkms/

    nvidia-drm.ko:
    Running module version sanity check.
     - Original module
       - No original module exists within this kernel
     - Installation
       - Installing to /lib/modules/5.4.0-81-generic/updates/dkms/

    depmod...

    DKMS: install completed.
       ...done.
    Processing triggers for linux-image-5.4.0-81-generic (5.4.0-81.91~18.04.1) ...
    /etc/kernel/postinst.d/dkms:
     * dkms: running auto installation service for kernel 5.4.0-81-generic
       ...done.
    /etc/kernel/postinst.d/initramfs-tools:
    update-initramfs: Generating /boot/initrd.img-5.4.0-81-generic
    /etc/kernel/postinst.d/zz-update-grub:
    Sourcing file `/etc/default/grub'      ### update-grub runs automatically
    Generating grub configuration file ...
    Found linux image: /boot/vmlinuz-5.4.0-81-generic
    Found initrd image: /boot/initrd.img-5.4.0-81-generic
    Found linux image: /boot/vmlinuz-5.4.0-72-generic
    Found initrd image: /boot/initrd.img-5.4.0-72-generic
    Found linux image: /boot/vmlinuz-5.4.0-53-generic
    Found initrd image: /boot/initrd.img-5.4.0-53-generic
    Adding boot menu entry for EFI firmware configuration
    done

    # List the kernels in the current boot menu
    grep menuentry /boot/grub/grub.cfg

    # Change the kernel boot order: a newly installed latest version becomes
    # the default automatically; to boot an older version, edit the GRUB configuration
    vi /etc/default/grub

    # Apply the configuration
    update-grub
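
To pick an older kernel in /etc/default/grub, the exact menu entry titles are needed. A small sketch that extracts them from a grub.cfg-style file, assuming the titles are single-quoted as in Ubuntu's generated file; grub_entries is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the title of every menuentry in a grub.cfg-style
# file. Assumes titles are wrapped in single quotes.
grub_entries() {
    awk -F"'" '/^[[:space:]]*menuentry /{print $2}' "$1"
}

if [ -r /boot/grub/grub.cfg ]; then
    grub_entries /boot/grub/grub.cfg
fi
```

On Ubuntu the older kernels normally sit inside the "Advanced options for Ubuntu" submenu, so a GRUB_DEFAULT value would typically take the form "Advanced options for Ubuntu>Ubuntu, with Linux 5.4.0-72-generic", followed by update-grub. Check the titles printed on the actual machine rather than relying on this example.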
