A VM running older NVIDIA drivers, such as those from a previous vGPU release, will fail to initialize vGPU when booted on a XenServer platform running the current release of GRID Virtual GPU Manager.
In this scenario, the VM boots in standard VGA mode with reduced resolution and color depth. The NVIDIA GRID GPU is present in Windows Device Manager but displays a warning sign, and a device status of “Windows has stopped this device because it has reported problems. (Code 43)”.
Depending on the versions of drivers in use, the XenServer /var/log/messages file may contain the following information about the error:
An error message:
vmiop_log: error: Unable to fetch Guest NVIDIA driver information
A report of a version mismatch between guest and host drivers:
vmiop_log: error: Guest VGX version(1.1) and Host VGX version(1.2) do not match
A report of a signature mismatch:
vmiop_log: error: VGPU message signature mismatch.
Install the latest NVIDIA vGPU release drivers in the VM.
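The three log signatures listed above can be searched for in one pass. The following is a minimal sketch, not an NVIDIA-provided tool: the function name classify_vgpu_log and the labels it prints are hypothetical, and the patterns are taken from the messages quoted in this section.

```shell
#!/bin/sh
# Sketch: report which of the vmiop_log signatures from this section
# appear in a log file. Function name and output labels are hypothetical.
classify_vgpu_log() {
    log_file="$1"
    if grep -q "vmiop_log: error: Unable to fetch Guest NVIDIA driver information" "$log_file"; then
        echo "guest-driver-info-unavailable"
    fi
    if grep -q "vmiop_log: error: Guest VGX version.*Host VGX version.*do not match" "$log_file"; then
        echo "version-mismatch"
    fi
    if grep -q "vmiop_log: error: VGPU message signature mismatch" "$log_file"; then
        echo "signature-mismatch"
    fi
}

# Typical use on the XenServer host (log path taken from the text above):
#   classify_vgpu_log /var/log/messages
```

Any label printed points to the same resolution: update the guest driver to the current vGPU release.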
GRID K2, Tesla M60, and Tesla M6 support ECC (error correcting code) for improved data integrity. If ECC is enabled, virtual GPU fails to start. The following error is logged in /var/log/messages:
vmiop_log: error: Initialization: VGX not supported with ECC Enabled.
Virtual GPU is not currently supported with ECC active. GRID K2 cards, and Tesla M60 and M6 cards in graphics mode, ship with ECC disabled by default, but ECC can subsequently be enabled using nvidia-smi.
Use nvidia-smi to list the status of all GPUs, and check for ECC noted as enabled on GPUs. Change the ECC status to off on a specific GPU by executing the following command:
nvidia-smi -i id -e 0
id is the index of the GPU as reported by nvidia-smi.
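On hosts with several GPUs, the check and the per-GPU command can be combined. The sketch below assumes that the installed nvidia-smi supports the --query-gpu option with the ecc.mode.current field, which may not hold for every driver release; the function name ecc_disable_commands is hypothetical. It only prints the commands so that they can be reviewed before being run.

```shell
#!/bin/sh
# Sketch: read "index, ecc-mode" CSV lines on stdin and print one
# "nvidia-smi -i <index> -e 0" command for each GPU reporting ECC Enabled.
# The function name is hypothetical.
ecc_disable_commands() {
    while IFS=, read -r idx mode; do
        idx=$(echo "$idx" | tr -d ' ')
        mode=$(echo "$mode" | tr -d ' ')
        if [ "$mode" = "Enabled" ]; then
            echo "nvidia-smi -i $idx -e 0"
        fi
    done
}

# Assumed usage on the host (the query option is an assumption, see above):
#   nvidia-smi --query-gpu=index,ecc.mode.current --format=csv,noheader | ecc_disable_commands
```

GPUs that report any value other than Enabled are skipped.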
A single vGPU configured on a physical GPU produces lower benchmark scores than the physical GPU run in passthrough mode.
Aside from performance differences that may be attributed to a vGPU’s smaller framebuffer size, vGPU incorporates a performance balancing feature known as Frame Rate Limiter (FRL), which is enabled on all vGPUs. FRL is used to ensure balanced performance across multiple vGPUs that are resident on the same physical GPU. The FRL setting is designed to give good interactive remote graphics experience but may reduce scores in benchmarks that depend on measuring frame rendering rates, as compared to the same benchmarks running on a passthrough GPU.
FRL is controlled by an internal vGPU setting. NVIDIA does not validate vGPU with FRL disabled, but for validation of benchmark performance, FRL can be temporarily disabled by specifying frame_rate_limiter=0 in the VM’s platform:vgpu_extra_args parameter:
[root@xenserver ~]# xe vm-param-set uuid=e71afda4-53f4-3a1b-6c92-a364a7f619c2 platform:vgpu_extra_args="frame_rate_limiter=0"
The setting takes effect the next time the VM is started or rebooted.
With this setting in place, the VM’s vGPU runs without any frame rate limit. To restore the default FRL behavior, remove the vgpu_extra_args key from the platform parameter, remove frame_rate_limiter=0 from the vgpu_extra_args key, or set frame_rate_limiter=1. For example:
[root@xenserver ~]# xe vm-param-set uuid=e71afda4-53f4-3a1b-6c92-a364a7f619c2 platform:vgpu_extra_args="frame_rate_limiter=1"
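The first option, removing the vgpu_extra_args key altogether, can be sketched with the xe vm-param-remove command. The helper below is hypothetical and only prints the command for a given VM UUID rather than running it; note that removing the key discards every setting stored in vgpu_extra_args, not only frame_rate_limiter.

```shell
#!/bin/sh
# Sketch: print the xe command that removes the vgpu_extra_args key from a
# VM's platform parameter. The function name is hypothetical, and nothing
# is executed against the host.
frl_restore_command() {
    vm_uuid="$1"
    echo "xe vm-param-remove uuid=$vm_uuid param-name=platform param-key=vgpu_extra_args"
}

# Example with the UUID used elsewhere in this section:
frl_restore_command e71afda4-53f4-3a1b-6c92-a364a7f619c2
```

As with the other settings, the change would apply the next time the VM is started or rebooted.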
Fixed in XenServer 6.5
GRID vGPU on Citrix XenServer 6.2 does not support operation with GPUs mapped above the 4 gigabyte (4G) boundary in the system’s physical address space.
If GPUs are mapped above 4G, the GRID vGPU Manager rpm will warn at the time of installation:
Warning: vGPU does not support GPUs mapped in 64-bit address space. Please disable 64-bit MMIO from the system's BIOS. Refer to vGPU release notes for details.
Also, the NVIDIA kernel driver will fail to load in XenServer’s dom0, so the nvidia module won’t appear in the module listing produced by lsmod. Additionally, the following warning messages will be present in the output of dmesg:
NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 128M @ 0xf800000000000000 (PCI:03ff:00:07.0)
NVRM: This is a 64-bit BAR mapped above 4GB by the system
NVRM: BIOS or the Linux kernel. The NVIDIA Linux/x86
NVRM: graphics driver and other system software components
NVRM: do not support this configuration.
Ensure that GPUs are mapped below the 4G boundary by disabling your server’s SBIOS option that controls 64-bit memory-mapped I/O support. This option may be labeled Enable >4G Decode or Enable 64-bit MMIO.
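The dmesg symptom described above can be checked with a short filter. This is a minimal sketch that matches only the NVRM line quoted in this section; the function name bar_mapped_above_4g is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) if the kernel log text on stdin contains
# the NVRM warning about a 64-bit BAR mapped above 4GB.
# The function name is hypothetical.
bar_mapped_above_4g() {
    grep -q "64-bit BAR mapped above 4GB"
}

# Assumed usage in dom0:
#   dmesg | bar_mapped_above_4g && echo "Disable 64-bit MMIO in the SBIOS"
```

A match, together with the nvidia module being absent from lsmod output, is consistent with the condition described here.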
If all GPUs in the platform are assigned to VMs in passthrough mode, nvidia-smi will return an error:
[root@xenserver-vgx-test ~]# nvidia-smi
Failed to initialize NVML: Unknown Error
This is because GPUs operating in passthrough mode are not visible to nvidia-smi or to the NVIDIA kernel driver operating in XenServer’s dom0.
To confirm that all GPUs are operating in passthrough, use XenCenter’s GPU tab to review the current GPU assignment.
Windows Aero may be disabled when XenDesktop is connected to a VM with a vGPU or passthrough GPU, with 3 or 4 monitors at 2560×1600 resolution.
This is a limitation of Windows 7. See Microsoft’s knowledge base article at https://support.microsoft.com/en-us/kb/2724530.
When starting multiple VMs configured with large amounts of RAM (typically more than 32GB per VM), a VM may fail to initialize vGPU. In this scenario, the VM boots in standard VGA mode with reduced resolution and color depth. The NVIDIA GRID GPU is present in Windows Device Manager but displays a warning sign, and a device status of “Windows has stopped this device because it has reported problems. (Code 43)”.
XenServer’s /var/log/messages contains these error messages:
vmiop_log: error: NVOS status 0x29
vmiop_log: error: Assertion Failed at 0x7620fd4b:179
vmiop_log: error: 8 frames returned by backtrace
...
vmiop_log: error: VGPU message 12 failed, result code: 0x29
...
vmiop_log: error: NVOS status 0x8
vmiop_log: error: Assertion Failed at 0x7620c8df:280
vmiop_log: error: 8 frames returned by backtrace
...
vmiop_log: error: VGPU message 26 failed, result code: 0x8
vGPU reserves a portion of the VM’s framebuffer for use in GPU mapping of VM system memory. The reservation is sufficient to support up to 32GB of system memory, and may be increased to accommodate up to 64GB by specifying enable_large_sys_mem=1 in the VM’s platform:vgpu_extra_args parameter:
[root@xenserver ~]# xe vm-param-set uuid=e71afda4-53f4-3a1b-6c92-a364a7f619c2 platform:vgpu_extra_args="enable_large_sys_mem=1"
The setting takes effect the next time the VM is started or rebooted. With this setting in place, less GPU framebuffer is available to applications running in the VM. To accommodate system memory larger than 64GB, the reservation can be further increased by specifying extra_fb_reservation in the VM’s platform:vgpu_extra_args parameter and setting its value to the desired reservation size in megabytes. The default value of 64M is sufficient to support 64GB of RAM. We recommend adding 2M of reservation for each additional 1GB of system memory. For example, to support 96GB of RAM, set extra_fb_reservation to 128.
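The sizing rule above (64M for up to 64GB of RAM, plus 2M for each additional 1GB) can be worked through as a small calculation. The sketch below is illustrative only: the function names are hypothetical, and the printed xe command assumes that extra_fb_reservation is passed in the same way as the other vgpu_extra_args settings shown in this section.

```shell
#!/bin/sh
# Sketch: compute the recommended extra_fb_reservation value in megabytes
# for a VM with the given amount of system memory in gigabytes.
# Function names are hypothetical.
extra_fb_reservation_mb() {
    ram_gb="$1"
    if [ "$ram_gb" -le 64 ]; then
        echo 64                               # default covers up to 64GB
    else
        echo $(( 64 + 2 * (ram_gb - 64) ))    # 2M per additional 1GB
    fi
}

# Print (do not run) an xe command for the computed value. The argument
# format is an assumption based on the other commands in this section.
extra_fb_command() {
    vm_uuid="$1"
    mb=$(extra_fb_reservation_mb "$2")
    echo "xe vm-param-set uuid=$vm_uuid platform:vgpu_extra_args=\"extra_fb_reservation=$mb\""
}

extra_fb_reservation_mb 96    # prints 128
```

For 96GB, the calculation is 64 + 2 × (96 − 64) = 128, matching the value given above.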
To restore the default reservation, remove the vgpu_extra_args key from the platform parameter, remove enable_large_sys_mem from the vgpu_extra_args key, or set enable_large_sys_mem=0.
Upgrading the vGPU host driver RPM fails with the following message on the console:
[root@xenserver ~]# rpm -U NVIDIA-vGPU-xenserver-6.5-352.46.x86_64.rpm
error: Failed dependencies: NVIDIA-vgx-xenserver conflicts with NVIDIA-vGPU-xenserver-6.5-352.46.x86_64
Uninstall the older vGPU RPM before installing the latest driver.
Use the following command to uninstall the older vGPU RPM:
[root@xenserver ~]# rpm -e NVIDIA-vgx-xenserver