Symptoms or Errors
Blue-screen crashes and other errors can occur on VMs with passthrough GPUs when message-signaled interrupts (MSI) are enabled for those GPUs on VMware ESXi 5.1 and 5.5.
Errors may include the following:
· bugcheck 0x116 / BSOD in VM
· bugcheck 0x7E / BSOD in VM
· PSOD (purple screen of death) crashing entire ESXi host
NVIDIA GPUs support two modes of interrupt delivery: legacy interrupt delivery, also known as INTx, and message-signaled interrupt (MSI) delivery. NVIDIA GPUs default to INTx delivery, but the NVIDIA driver may enable MSI by asking the operating system to turn it on. Once in MSI mode, the NVIDIA driver acknowledges interrupts from the GPU hardware differently.
When assigning GPUs as passthrough devices to guest VMs, VMware ESXi configures the GPU hardware for MSI delivery, but represents the GPU to the guest OS and driver as being in INTx mode. We refer to this scheme as "MSI translation", because MSIs from the physical hardware are received by the hypervisor and translated to virtual INTx interrupts before being delivered to the guest VM.
NVIDIA considers MSI translation mode to be invalid for NVIDIA GPUs, because the NVIDIA driver's operation must match the delivery mode the physical GPU is using. If the physical GPU is delivering MSI but the NVIDIA driver and guest OS believe it to be using INTx, interrupts from the GPU may be lost, leading to timeouts and bugchecks in guest VMs.
Disable MSI translation on VMware ESXi 5.1 or 5.5. The issue does not appear with ESXi 6.0 and later.
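To make the conditions above concrete, the sketch below flags a VM as exposed when the host runs ESXi 5.1 or 5.5 and its .vmx file declares a passthrough device (pciPassthruN.present = "TRUE") without a matching pciPassthruN.msiEnabled = "FALSE" entry. It assumes the usual `key = "value"` .vmx line layout; the function names are hypothetical and not part of any VMware tool.

```python
import re

# Matches .vmx lines such as: pciPassthru0.present = "TRUE"
# Assumption: the common key = "value" layout; keys compared case-insensitively.
_ENTRY = re.compile(r'^[ \t]*pciPassthru(\d+)\.(\w+)[ \t]*=[ \t]*"([^"]*)"[ \t]*$', re.IGNORECASE)


def passthru_settings(vmx_text):
    """Return {device_index: {option_lowercase: VALUE_UPPERCASE}} for pciPassthruN entries."""
    devices = {}
    for line in vmx_text.splitlines():
        m = _ENTRY.match(line)
        if m:
            devices.setdefault(int(m.group(1)), {})[m.group(2).lower()] = m.group(3).upper()
    return devices


def vm_is_affected(esxi_version, vmx_text):
    """True if the host is ESXi 5.1.x/5.5.x and a passthrough device still has MSI enabled."""
    major_minor = ".".join(esxi_version.split(".")[:2])
    if major_minor not in ("5.1", "5.5"):
        return False  # this article only covers ESXi 5.1 and 5.5
    for options in passthru_settings(vmx_text).values():
        if options.get("present") == "TRUE" and options.get("msienabled") != "FALSE":
            return True
    return False
```

For example, a 5.5 host with `pciPassthru0.present = "TRUE"` and no msiEnabled entry is reported as affected, while the same file on a 6.0 host is not.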
Disabling MSI on VMware ESXi
MSI translation on the VMware ESXi/vSphere hypervisor can be disabled through vCenter, or manually by editing the VM's .vmx file to set pciPassthru0.msiEnabled to false. VMware has acknowledged this known issue and describes the procedure in VMware KB 2092964: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2092964
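The .vmx change itself is a single `key = "value"` line. As a minimal sketch, assuming the file is handled as plain text, the function below overwrites any existing pciPassthruN.msiEnabled line for the given device index, or appends one if none exists. The function name and the `index` parameter are illustrative; index 0 corresponds to pciPassthru0, the first passthrough device.

```python
import re


def set_msi_disabled(vmx_text, index=0):
    """Return vmx_text with pciPassthru<index>.msiEnabled set to "FALSE"."""
    key = f"pciPassthru{index}.msiEnabled"
    # [ \t]* rather than \s* so the match never swallows preceding blank lines.
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=.*$", re.IGNORECASE | re.MULTILINE)
    new_line = f'{key} = "FALSE"'
    updated, count = pattern.subn(new_line, vmx_text)
    if count:
        return updated  # existing entry (for example "TRUE") overwritten
    # No entry yet: append one, keeping the file newline-terminated.
    if vmx_text and not vmx_text.endswith("\n"):
        vmx_text += "\n"
    return vmx_text + new_line + "\n"
```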
Under some circumstances, the VM's .vmx file will get rewritten and the msiEnabled setting lost if the VM's settings are reconfigured from vCenter. There are two ways to modify the msiEnabled setting safely to avoid issues with VMs associated with vCenter:
Method 1: Add or modify the pciPassthru0.msiEnabled entry in vCenter, through the VM's advanced configuration options.
Method 2: Edit the .vmx file directly while the VM is out of the vCenter inventory.
I. Remove the VM from the vCenter inventory.
II. Modify the .vmx file as follows:
Note: Make a copy of the file before editing it, and make sure the virtual machine is powered off before editing any of its files.
1. Open an SSH session to the ESXi host running the virtual machine. For more information, see Using ESXi Shell in ESXi 5.x (2004746).
2. Navigate to the location of the virtual machine: cd /vmfs/volumes/virtual_machine_datastore/virtual_machine_folder/
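3. Open the virtual machine's .vmx file in a text editor, for example: vi virtual_machine.vmx
4. Set the pciPassthru0.msiEnabled entry to false, adding the entry if it is not already present. If the VM has more than one passthrough GPU, set the matching entry (pciPassthru1.msiEnabled, and so on) for each one. For more information, see Editing files on an ESX host using vi or nano (1020302).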
5. Save the changes and exit the editor.
6. Reload the .vmx file so that the changes take effect.
III. After following these instructions, re-add the VM to the vCenter inventory.
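Steps II.1 to II.5 can also be scripted. The sketch below uses only the Python 3 standard library: it copies the .vmx file to a backup, as the note above recommends, and then sets msiEnabled to "FALSE" for every pciPassthruN device that the file marks as present. It assumes the VM is powered off and removed from the vCenter inventory, that a Python 3 interpreter is available wherever the file is edited, and that passthrough devices appear as pciPassthruN.present = "TRUE"; the script and function names are hypothetical. Reloading the .vmx file and re-adding the VM to the inventory (steps II.6 and III) remain manual.

```python
import re
import shutil
import sys

# pciPassthruN.<option> = "value" lines (assumed .vmx layout).
ENTRY = re.compile(r'^[ \t]*pciPassthru(\d+)\.(\w+)[ \t]*=[ \t]*"([^"]*)"[ \t]*$', re.IGNORECASE)


def patch_vmx_file(path):
    """Back up path to path + '.bak', then disable MSI for every present passthrough device.

    Returns the sorted list of device indices that were patched.
    """
    shutil.copy2(path, path + ".bak")  # keep an untouched copy before editing
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()

    matches = [(line, ENTRY.match(line)) for line in lines]
    present = {
        int(m.group(1))
        for _, m in matches
        if m and m.group(2).lower() == "present" and m.group(3).upper() == "TRUE"
    }
    # Drop existing msiEnabled lines for those devices; they are rewritten below.
    kept = [
        line
        for line, m in matches
        if not (m and m.group(2).lower() == "msienabled" and int(m.group(1)) in present)
    ]
    for index in sorted(present):
        kept.append(f'pciPassthru{index}.msiEnabled = "FALSE"')

    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(kept) + "\n")
    return sorted(present)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 disable_msi.py virtual_machine.vmx
    print("patched devices:", patch_vmx_file(sys.argv[1]))
```

Rewriting every msiEnabled line for present devices, rather than editing in place, means a VM with several passthrough GPUs ends up with exactly one "FALSE" entry per device.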
This article does not apply to Citrix XenServer 6.0 and later, because XenServer does not use MSI translation.
Applies to:
· NVIDIA GRID GPUs used for vDGA/vGPU
· Quadro GPUs used for vDGA
· VMware vSphere/ESXi 5.1, 5.5