VMware vDGA / GPU Passthrough Requires That MSI is Disabled on VMs

Answer ID 4135
Published 05/20/2016 12:36 PM
Updated 12/07/2016 02:02 PM

Blue-screen crashes may occur on VMs with assigned GPUs if MSI is initially enabled for passthrough devices.


Symptoms or Errors

Blue-screen crashes or other errors can occur on VMs with assigned GPUs if message-signaled interrupts (MSI) are enabled on ESXi 5.1 or 5.5.

Errors may include the following:

· bugcheck 0x116 / BSOD in VM

· bugcheck 0x7E / BSOD in VM

· PSOD (purple screen of death) crashing entire ESXi host

Root Cause

NVIDIA GPUs support two modes of interrupt delivery: legacy interrupt delivery, also known as INTx, and message-signaled interrupt (MSI) delivery. NVIDIA GPUs default to INTx delivery, but the NVIDIA driver may enable MSI by requesting that the operating system enable it. Once in MSI mode, the NVIDIA driver acknowledges interrupts from the GPU hardware differently than it does in INTx mode.

When assigning GPUs as passthrough devices to guest VMs, VMware ESXi configures the GPU hardware for MSI delivery, but represents the GPU to the guest OS and driver as being in INTx mode. We refer to this scheme as "MSI translation", because MSIs from the physical hardware are received by the hypervisor and translated to virtual INTx interrupts before being delivered to the guest VM.

NVIDIA considers MSI translation mode to be invalid for NVIDIA GPUs, because the NVIDIA driver's operation must match the delivery mode the physical GPU is using. If the physical GPU is delivering MSI but the NVIDIA driver and guest OS believe it to be using INTx, interrupts from the GPU may be lost, leading to timeouts and bugchecks in guest VMs.

Solution

Disable MSI translation on VMware ESXi 5.1 or 5.5. The issue does not appear with ESXi 6.0 and later. 

Disabling MSI on VMware ESXi

MSI translation on VMware's ESXi/vSphere hypervisor can be disabled via vCenter, or manually by editing a VM's .vmx file to set the flag pciPassthru0.msiEnabled to false. VMware has acknowledged this known issue and provides advice on this process here: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2092964
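Before changing anything, it can help to check what the flag is currently set to. The following is a minimal sketch, assuming the .vmx file uses the usual key = "value" entry syntax; the function name msi_setting is hypothetical and is not part of any VMware tool.

```shell
#!/bin/sh
# Print the current pciPassthru0.msiEnabled value from a .vmx file.
# Assumption: entries have the form  key = "value"
# msi_setting is a hypothetical helper name, not a VMware command.
msi_setting() {
    vmx_file="$1"

    # Find the entry; letter case is ignored as a precaution.
    # If the key appears more than once, only the last line is used.
    line=$(grep -i '^[[:space:]]*pciPassthru0\.msiEnabled[[:space:]]*=' "$vmx_file" | tail -n 1)

    if [ -z "$line" ]; then
        # No entry in the file.
        echo "unset"
        return 0
    fi

    # Strip the key, the equals sign, and the surrounding quotes.
    echo "$line" | sed 's/^[^=]*=[[:space:]]*"\([^"]*\)".*$/\1/'
}

# Example (placeholder file name, as used in the steps in this article):
# msi_setting virtual_machine.vmx
```

If the function prints "unset", the entry is absent from the file and would need to be added rather than modified.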

Under some circumstances, the VM's .vmx file is rewritten and the msiEnabled setting is lost when the VM's settings are reconfigured from vCenter. For VMs managed by vCenter, there are two ways to modify the msiEnabled setting safely:

Method 1: Add/Modify the entry via vCenter, through the VM's advanced configuration options.

Method 2:

I. Remove the VM from the vCenter's inventory.

II. Modify the .vmx file as follows: 

Note: Make a copy of the file before editing it. Also, make sure the virtual machine is not running while trying to edit one of its components.

1. Open an SSH session to the ESXi host running the virtual machine. For more information, see Using ESXi Shell in ESXi 5.x (2004746).

2. Navigate to the location of the virtual machine:

cd /vmfs/volumes/virtual_machine_datastore/virtual_machine_folder/

3. Open the .vmx file of the virtual machine in a text editor.

vi virtual_machine.vmx

4. Find the pciPassthru0.msiEnabled entry and change its value to false. For more information, see Editing files on an ESX host using vi or nano (1020302).

5. Save the changes and exit the file.

6. Reload the .vmx file to apply the changes.

III. After following these instructions, re-add the VM to the vCenter inventory.
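The file edit in part II can also be scripted instead of done by hand in vi. The following is a minimal sketch, not an NVIDIA or VMware supplied procedure. It assumes the usual key = "value" .vmx entry syntax and writes the value as "FALSE" in the quoted uppercase style commonly seen in .vmx files; the function name disable_msi is hypothetical. Run it only while the VM is powered off.

```shell
#!/bin/sh
# Set pciPassthru0.msiEnabled to FALSE in a .vmx file, keeping a backup.
# disable_msi is a hypothetical helper name, not a VMware command.
disable_msi() {
    vmx_file="$1"
    key='pciPassthru0.msiEnabled'

    # Keep a copy of the original file before editing it.
    cp "$vmx_file" "$vmx_file.bak" || return 1

    # Rewrite the file from the backup without any existing entry for
    # the key (grep exits non-zero when it prints no lines, which is
    # not an error here), then append the new value.
    grep -iv '^[[:space:]]*pciPassthru0\.msiEnabled[[:space:]]*=' \
        "$vmx_file.bak" > "$vmx_file" || true
    printf '%s = "FALSE"\n' "$key" >> "$vmx_file"
}

# Example (placeholder path, as used in step 2):
# disable_msi /vmfs/volumes/virtual_machine_datastore/virtual_machine_folder/virtual_machine.vmx
#
# Step 6 (reload) is commonly done on the ESXi host with vim-cmd.
# Assumption: this applies only while the VM is registered on the host.
# <vmid> is the ID shown for the VM by the first command:
#   vim-cmd vmsvc/getallvms
#   vim-cmd vmsvc/reload <vmid>
```

The sketch removes and re-appends the entry rather than editing it in place, so it behaves the same whether or not the entry already exists. The original file is left as a .bak copy next to the edited one.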

Citrix XenServer

This article is not applicable to Citrix XenServer 6.0 and later because it does not use MSI translation.

Relevant Products

NVIDIA GRID GPUs used for vDGA/vGPU

Quadro GPUs used for vDGA

VMware vSphere/ESXi 5.1, 5.5
