NVIDIA GRID vGPU drivers will fail to load when used with XenServer 7.0 on systems with >512GB of RAM. Users may experience symptoms such as:
· Error messages such as "NVIDIA Installer cannot continue" and "The graphics driver could not find compatible graphics hardware".
· Within a VM, "error 43" may be reported on the display adapter (after the NVIDIA drivers are installed).
Root Cause and Debug
With the release of XenServer 7.0, Citrix changed XenServer's use of IOMMU addressing. XenServer 6.5 and earlier had IOMMU addressing disabled by default, so some customers may encounter this issue only after upgrading to XenServer 7.0.
· For systems with between 512GB and 1TB of RAM, vGPU requires a workaround that configures IOMMU addressing to dom0-passthrough, applied either:
o From the command line: /opt/xensource/libexec/xen-cmdline --set-dom0 iommu=dom0-passthrough
o Or by editing the bootloader configuration (/boot/grub/grub.cfg) to contain the same iommu=dom0-passthrough option.
· For systems with >1TB of RAM, the workaround does not fix the issue (users see runtime failures). The use of NVIDIA GRID cards with Citrix XenServer on systems with >1TB of RAM is unsupported by Citrix and NVIDIA. Users seeking support for such a system are advised to contact Citrix and NVIDIA support, quoting XenServer engineering reference NVIDIA-436 or Citrix support reference SR680982224.
· Unsupported users wishing to discuss this issue further are encouraged to use the NVIDIA GRID support forums: http://gridforums.nvidia.com
· Most NVIDIA GPUs are limited to 1TB (40-bit) addressing: http://us.download.nvidia.com/XFree86/Linux-x86/349.12/README/addressingcapabilities.html
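The thresholds above can be summarised in a small shell sketch. It is illustrative only and not part of any NVIDIA or Citrix tooling: the function name classify_host_ram and the variables ADDR_BITS and LIMIT_GIB are hypothetical, the host RAM size is assumed to be supplied by the caller in whole GiB, and the article's "GB" and "TB" are assumed to mean binary units. It also shows why a 40-bit address limit corresponds to 1TB (2^40 bytes).

```shell
#!/bin/sh
# Hypothetical helper, not an NVIDIA or Citrix tool.
# Assumption: "GB"/"TB" in this article are read as binary units (GiB/TiB).

ADDR_BITS=40
# 2^40 bytes divided by 2^30 bytes per GiB = 1024 GiB, i.e. 1 TiB.
LIMIT_GIB=$(( (1 << ADDR_BITS) / (1024 * 1024 * 1024) ))

# classify_host_ram RAM_GIB
# Prints which case from this article applies to a host with RAM_GIB of RAM:
#   not-affected : 512 GiB or less
#   workaround   : more than 512 GiB, up to the 40-bit limit (1 TiB)
#   unsupported  : more than the 40-bit limit
classify_host_ram() {
    ram_gib=$1
    if [ "$ram_gib" -le 512 ]; then
        echo "not-affected"
    elif [ "$ram_gib" -le "$LIMIT_GIB" ]; then
        echo "workaround"
    else
        echo "unsupported"
    fi
}
```

For example, classify_host_ram 768 prints "workaround", which corresponds to the dom0-passthrough setting described above, while classify_host_ram 1536 prints "unsupported". How the installed RAM figure is obtained on a given host is deliberately left out of this sketch.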
NVIDIA GRID GPU cards, including Kepler and Maxwell cards, e.g. K1, K2, M6, M60, M10.
Citrix XenServer 7.0 (current release). This may change in subsequent releases, as NVIDIA and Citrix are actively investigating options to resolve this issue without the need for a workaround.
Users of Dell R720 and R730 servers are advised to ensure their BIOS is up to date, as an additional issue in older BIOS versions may result in similar symptoms. See: http://nvidia.custhelp.com/app/answers/detail/a_id/4163/~/nvidia-grid-vgpu-on-dell-r730-/-r720-servers,-on-upgrade-to-citrix-xenserver