NVIDIA GRID vGPU drivers will fail to load when used with XenServer 7.0 on systems with >512GB of RAM.

Answer ID 4249
Updated 10/25/2016 02:50 PM

Symptoms or errors

NVIDIA GRID vGPU drivers fail to load on XenServer 7.0 hosts with more than 512GB of RAM. Users may experience symptoms such as:

· Error messages such as "NVIDIA Installer cannot continue" and "The graphics driver could not find compatible graphics hardware".

· Within a VM, "error 43" may be reported on the display adapter after the NVIDIA drivers are installed.

Root Cause and Debug

With the release of XenServer 7.0, Citrix changed how XenServer uses IOMMU addressing. XenServer 6.5 and earlier had IOMMU addressing disabled by default, so some customers may encounter this issue only after upgrading to XenServer 7.0.

Workaround / Solution

· For systems with between 512GB and 1TB of RAM, vGPU requires a workaround that sets the IOMMU addressing behavior to dom0-passthrough:

o Command line: /opt/xensource/libexec/xen-cmdline --set-xen iommu=dom0-passthrough

· Or edit the bootloader configuration (/boot/grub/grub.cfg) so that it contains:

o iommu=dom0-passthrough

· For systems with more than 1TB of RAM, the workaround does not fix the issue (users see runtime failures). NVIDIA GRID cards on Citrix XenServer hosts with more than 1TB of RAM are unsupported by Citrix and NVIDIA. Users seeking support for such a system are advised to contact Citrix and NVIDIA support, quoting XenServer engineering reference NVIDIA-436 or Citrix support reference SR680982224.
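
The memory thresholds above can be sketched as a small shell helper. This is only an illustration: the function name classify_host_ram is hypothetical, and treating exactly 512GB as unaffected and exactly 1TB as covered by the workaround is an assumption, since the article does not state how the boundaries are handled.

```shell
#!/bin/sh
# Hypothetical helper: map host RAM (whole GB) to the guidance in this article.
# Boundary handling (512GB unaffected, 1024GB still "workaround") is an assumption.
classify_host_ram() {
    ram_gb=$1
    if [ "$ram_gb" -le 512 ]; then
        echo "not-affected"   # at or below 512GB: issue not expected
    elif [ "$ram_gb" -le 1024 ]; then
        echo "workaround"     # 512GB to 1TB: apply iommu=dom0-passthrough
    else
        echo "unsupported"    # above 1TB: workaround does not help
    fi
}

classify_host_ram 768   # prints "workaround"
```

On a XenServer host the input value would have to come from the host's total memory rather than from dom0's own memory; how it is obtained is left out of this sketch.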

Other links

· Unsupported users wishing to discuss this issue further are encouraged to use the NVIDIA GRID support forums: http://gridforums.nvidia.com

· Most NVIDIA GPUs are limited to 1TB (40-bit) addressing: http://us.download.nvidia.com/XFree86/Linux-x86/349.12/README/addressingcapabilities.html
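
For reference, the arithmetic behind a 40-bit addressing limit can be checked with a short shell sketch. The function name addressable_gb is hypothetical, and the sketch assumes a shell with 64-bit integer arithmetic, as is typical on 64-bit Linux. It shows only the size of a 40-bit address space; it is not a statement of how the driver or XenServer allocates addresses.

```shell
#!/bin/sh
# Hypothetical helper: addressable range in GB for a given address width,
# computed as 2^bits / 2^30. Assumes 64-bit shell arithmetic.
addressable_gb() {
    bits=$1
    echo $(( (1 << bits) / (1 << 30) ))
}

addressable_gb 40   # prints 1024 (GB), i.e. 1TB
```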

Applicable Products

NVIDIA GRID GPU cards, including Kepler and Maxwell cards such as K1, K2, M6, M60 and M10

Citrix XenServer 7.0 (current release). This may change in subsequent releases, as NVIDIA and Citrix are actively investigating options to resolve this issue without the need for a workaround.

Users of Dell R720 and R730 servers are advised to ensure their BIOS is up to date, as an additional issue in older BIOS versions may result in similar symptoms. See: http://nvidia.custhelp.com/app/answers/detail/a_id/4163/~/nvidia-grid-vgpu-on-dell-r730-/-r720-servers,-on-upgrade-to-citrix-xenserver
