VMware vDGA / GPU passthrough requires MSI be disabled on VMs
Errors may include:
· bugcheck 0x116 / BSOD in VM
· bugcheck 0x7E / BSOD in VM
· PSOD (purple screen of death) crashing entire ESXi host
NVIDIA GPUs support two modes of interrupt delivery: legacy interrupt delivery, also known as INTx, and message-signaled interrupt (MSI) delivery. NVIDIA GPUs default to using INTx delivery but the NVIDIA driver may enable MSI by requesting the Operating System enable it. Once in MSI mode, the NVIDIA driver operates differently in terms of how it acknowledges interrupts from the GPU hardware.
When assigning GPUs as passthrough devices to guest VMs, VMware ESXi configures the GPU hardware for MSI delivery, but represents the GPU to the guest OS and driver as being in INTx mode. We refer to this scheme as "MSI translation", because MSIs from the physical hardware are received by the hypervisor and translated to virtual INTx interrupts before being delivered to the guest VM.
NVIDIA consider MSI translation mode to be invalid for NVIDIA GPUs, because the NVIDIA driver's operation must match the delivery mode the physical GPU is using. If the physical GPU is delivering MSI but the NVIDIA driver and guest OS believe it to be using INTx, interrupts from the GPU may be lost, leading to timeouts and bugchecks in guest VMs.
Disable MSI translation.
MSI translation on VMware's ESXi/vSphere hypervisor can be disabled via vCenter, or manually by changing a VM's .vmx file to set the flag pciPassthru0.msiEnabled to false. VMware have acknowledged this known issue and provide advice on this process here: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2092964
VMware's advice at the time of writing is summarised below; users should refer to the VMware article for the version current at the time of reading. Under some circumstances, reconfiguring the VM's settings from vCenter causes the VM's .vmx file to be rewritten and the msiEnabled setting to be lost. There are two ways to modify the msiEnabled setting safely and avoid this problem for VMs associated with vCenter:
1. Add/Modify the entry via vCenter, via the VM's advanced configuration options.
2. Remove the VM from vCenter's inventory, modify the .vmx file and then add the VM back to vCenter inventory.
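Because the setting can be lost when the .vmx file is rewritten, it may be useful to check it after any reconfiguration. The following is a minimal sketch in Python, not a VMware or NVIDIA tool. It assumes the usual .vmx layout of one key = "value" pair per line; the helper names msi_settings and msi_disabled are hypothetical and introduced only for illustration.

```python
import re

# Assumed .vmx line layout: pciPassthru0.msiEnabled = "FALSE"
_MSI_LINE = re.compile(r'^\s*(pciPassthru\d+)\.msiEnabled\s*=\s*"([^"]*)"\s*$')


def msi_settings(vmx_text):
    """Return {device: bool} for every pciPassthruN.msiEnabled entry found."""
    settings = {}
    for line in vmx_text.splitlines():
        match = _MSI_LINE.match(line)
        if match:
            device, value = match.groups()
            # Compare case-insensitively so "FALSE" and "false" are treated alike.
            settings[device] = value.strip().lower() == "true"
    return settings


def msi_disabled(vmx_text, device="pciPassthru0"):
    """True only if the entry is present and explicitly set to false."""
    return msi_settings(vmx_text).get(device) is False
```

A missing entry is reported as not disabled, so a .vmx file that has lost the setting is flagged in the same way as one where it is set to true.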
If modifying the .vmx file, users should first remove the VM from vCenter's inventory before following the advice in https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2092964 as follows:
To change the virtual machine's vmx file setting pciPassthru0.msiEnabled from true to false:
Note: Make a copy of the file before editing it. The virtual machine should not be running while any of its components are being edited.
1. Open an SSH session to the ESXi host running the virtual machine. For more information, see Using ESXi Shell in ESXi 5.x (2004746).
2. Navigate to the location of the virtual machine:
3. Open the .vmx file of the virtual machine in a text editor.
4. Edit the pciPassthru0.msiEnabled and change the option to false. For more information, see Editing files on an ESX host using vi or nano (1020302).
5. Save the changes and exit the file.
6. Reload the vmx file to apply the changes.
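The backup and edit steps above can be sketched as a small Python script for cases where the change has to be repeated across several VMs. This is an illustrative sketch only, not part of VMware's procedure. The function name disable_msi, the .bak backup suffix and the uppercase quoted value "FALSE" are assumptions. Where the script runs and how the .vmx file is reached are left to the reader, and the VM must still be powered off and the file reloaded afterwards.

```python
import re
import shutil
from pathlib import Path


def disable_msi(vmx_path, device="pciPassthru0"):
    """Back up the .vmx file, then set <device>.msiEnabled to "FALSE".

    Returns the path of the backup copy. The VM must be powered off.
    """
    vmx_path = Path(vmx_path)
    backup_path = vmx_path.with_name(vmx_path.name + ".bak")  # assumed suffix
    shutil.copy2(vmx_path, backup_path)  # copy the file before editing it

    key = f"{device}.msiEnabled"
    pattern = re.compile(r"^\s*" + re.escape(key) + r"\s*=")
    new_line = f'{key} = "FALSE"'  # assumed value format

    lines = vmx_path.read_text().splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)  # entry absent, so add it

    vmx_path.write_text("\n".join(lines) + "\n")
    return backup_path
```

All other lines in the file are written back unchanged, and the original remains available in the backup copy if the edit needs to be reverted.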
Having followed VMware's advice, users should then re-add the VM to the vCenter inventory.
Citrix XenServer 6.0 and later does not use MSI translation, and therefore this article is not applicable to XenServer.
NVIDIA GRID GPUs used for vDGA/vGPU
Quadro GPUs used for vDGA