Error / Symptoms
A CentOS or RHEL Linux VM is deployed and can be connected to via a remote protocol, but once the NVIDIA vGPU drivers are installed in the VM and it is rebooted, the display fails and the error "Oh no! Something has gone wrong!" is shown when connecting remotely.
One possible cause of this error is that SELinux is blocking some aspects of the NVIDIA driver. To check whether this is the case, use SSH to access the VM, log in as root, and check the messages file (/var/log/messages on RHEL and CentOS) for errors like the one below.
Feb 25 22:51:10 linuxws setroubleshoot: SELinux is preventing /usr/libexec/gnome-session-check-accelerated-helper from getattr access on the chr_file /dev/nvidiactl.
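As a rough sketch of the check described above, the messages file can be filtered for SELinux denials that mention NVIDIA device nodes. The function name find_nvidia_denials is a hypothetical helper for illustration, and the default path assumes setroubleshoot messages are written to /var/log/messages.

```shell
# Hypothetical helper: print SELinux denial messages that mention NVIDIA.
# Assumption: setroubleshoot logs to /var/log/messages unless another
# file is passed as the first argument.
find_nvidia_denials() {
    log_file="${1:-/var/log/messages}"
    # Keep only setroubleshoot denial lines, then only those naming NVIDIA.
    grep 'SELinux is preventing' "$log_file" | grep -i 'nvidia'
}
```

Run as root, calling find_nvidia_denials with no argument reads the default file. No output suggests that the display failure may have a different cause.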
This can happen if SELinux is in enforcing mode. SELinux is a Linux Kernel security module and users should understand its role before changing permissions. See: https://en.wikipedia.org/wiki/Security-Enhanced_Linux.
For some customers it may be acceptable to change their SELinux permissions.
To check current SELinux mode, run "sestatus":
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: permissive
Mode from config file: permissive
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 28
The "Current mode" and "Mode from config file" values should be "permissive" to allow the NVIDIA drivers to function.
Note: To switch SELinux to permissive mode without a reboot (but not permanently), you can use "setenforce":
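To read the mode without scanning the full report, the "Current mode" field can be extracted from the sestatus output. This is a minimal sketch; current_selinux_mode is a hypothetical helper name, and it assumes the "Current mode:" label appears exactly as in the output above.

```shell
# Hypothetical helper: read sestatus-style output on stdin and print
# the value of the "Current mode" field (e.g. enforcing or permissive).
current_selinux_mode() {
    # Split each line on the colon and any spaces that follow it.
    awk -F': *' '/^Current mode:/ { print $2 }'
}
```

For example, "sestatus | current_selinux_mode" prints only the mode. The standard "getenforce" command reports the same information as Enforcing, Permissive, or Disabled.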
usage: setenforce [ Enforcing | Permissive | 1 | 0 ]
# setenforce 0
To permanently change SELinux settings
If you see this issue you can modify the SELinux behaviour by editing its configuration.
$ nano /etc/sysconfig/selinux
If you didn't log in as root, you'll need to prefix the commands with sudo.
You should then see the contents of the config file.
Set the SELINUX= line to either disabled or permissive. Permissive has the advantage that SELinux still logs any issues.
Once you've done that, press Ctrl+X, then Y to save the changes, and then Enter to overwrite the old file. You can then reboot.
Connecting from then on will take you into the desktop
Where you can confirm the vGPU is attached using
A video overviewing the steps outlined in this article has been provided by NVIDIA engineering: https://youtu.be/rMAvAB_-Z_Y
When editing config files, users should follow best practice and take a backup. Users do this at their own risk and NVIDIA bears no liability or support for errors made during this process.
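As one way to combine the backup with the edit itself, the sketch below copies the config file and then rewrites only the SELINUX= line. The function name set_selinux_mode and the .bak suffix are illustrative assumptions, not part of any NVIDIA or Red Hat tooling. It writes the result back through the original path rather than replacing the file, so that a symbolic link such as /etc/sysconfig/selinux keeps pointing at its target.

```shell
# Hypothetical helper: back up an SELinux config file, then set SELINUX=<mode>.
# Usage (as root): set_selinux_mode /etc/sysconfig/selinux permissive
set_selinux_mode() {
    config_file="$1"
    new_mode="$2"
    # Accept only the three values the SELINUX= setting supports.
    case "$new_mode" in
        enforcing|permissive|disabled) ;;
        *) echo "unsupported mode: $new_mode" >&2; return 1 ;;
    esac
    # Keep a copy of the current contents (the .bak suffix is an assumption).
    cp -p "$config_file" "$config_file.bak" || return 1
    # Rewrite only the SELINUX= line; SELINUXTYPE= and comments are untouched.
    sed "s/^SELINUX=.*/SELINUX=$new_mode/" "$config_file.bak" > "$config_file"
}
```

The change to the config file takes effect at the next reboot, as with the manual edit described above.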
Customers for whom having SELinux in enforcing mode is essential need to contact their OS vendor and request that the version of SELinux in use permit the NVIDIA drivers. This issue is documented in the NVIDIA vGPU driver release notes. NVIDIA is tracking this issue under the reference #200167868.
Customers needing to contact Red Hat should quote RHEL Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1322283 when asking for a fix.
NVIDIA GRID GPU enabled VMs using GRID cards such as K1, K2, M6, M60, M10
Linux OSs observed to be affected include CentOS 7.0 and Red Hat Enterprise Linux (RHEL) 7.2, but other versions and OSs may exhibit the same behavior.