VMDirectPath is a great new feature with vSphere 4.0, but with new features come new challenges. This page will deal with some common configuration problems that you might encounter.
1) Disable VMDirectPath with a reboot
2) Disable VMDirectPath without a reboot
3) Remove devices from the VMDirectPath Configuration when you're unable to do so with the vSphere client
4) Dealing with the ESXi boot device that has been enabled for VMDirectPath
Disable VMDirectPath with a reboot
You may encounter a situation where you simply need to disable VMDirectPath. If you are able to reboot your host, you can set the VMkernel option Boot.noIOMMU to enabled and restart the server. This won't help if you've allocated your boot device to VMDirectPath; for that, see the section on dealing with the ESXi boot device below. After you do this and then go to Configuration \ Hardware \ Advanced Settings, the host will show a message that it does not support VMDirectPath. If you want to enable VMDirectPath again, you can uncheck the option and reboot.
Disable VMDirectPath without a reboot
Should you want to disable VMDirectPath without restarting the host, you can unload (or reload) the vtd module with the commands below.
/usr/lib/vmware/vmkmod # vmkload_mod -u vtd
Module vtd successfully unloaded
/usr/lib/vmware/vmkmod # vmkload_mod vtd
Module vtd loaded successfully
If you try to start a VM after you have unloaded the vtd module, you will get an error. You will still be able to configure VMDirectPath devices in Hardware \ Advanced Settings.
Remove devices from VMDirectPath Configuration when you're unable to do so with the vSphere client
A common issue, which will hopefully be resolved in an update, is the inability to remove a device from passthrough once it has been enabled. You might enable devices such as a graphics card and a NIC, reboot the host, and later find that you can't remove them from the passthru configuration. When you uncheck the devices and click OK, they still appear on the passthru configuration screen, and after a reboot they are still shown as enabled. While it is possible to reset the configuration of the host, that may not be the most desirable option.
The configuration for passthru is stored in /etc/vmware/esx.conf. It is possible to edit this file with vifs.pl (supported option) or at the console (unsupported). In the listing below, you'll note that the Pro/1000 MT is listed as vmnic2 and vmnic3 and that the video card is not listed. The item listed for passthru is 000:28.5 (00:28.05 in lspci), which lspci shows to be the device "Bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Port 6". In this case you can change the owner to vmkernel and reboot, and the devices should no longer be enabled for VMDirectPath. If you're accessing esx.conf at the console, you can also run backup.sh 0 /bootbank/ to ensure the change is written to the system backup.
When I tried to reproduce this problem on the host, the PCI bridge was not always shown in esx.conf, but manually adding the entry and setting the owner to vmkernel was sufficient to clear the devices from passthru mode.
/device/000:26.0/owner = "vmkernel"
/device/000:26.1/owner = "vmkernel"
/device/000:26.2/owner = "vmkernel"
/device/000:26.7/owner = "vmkernel"
/device/000:28.5/owner = "passthru"
/device/000:29.0/owner = "vmkernel"
/device/000:29.1/owner = "vmkernel"
/device/000:29.2/owner = "vmkernel"
/device/000:29.7/owner = "vmkernel"
/device/000:31.2/owner = "vmkernel"
/device/000:31.2/vmkname = "vmhba0"
/device/000:31.5/owner = "vmkernel"
/device/000:31.5/vmkname = "vmhba1"
/device/006:00.0/vmkname = "vmnic0"
/device/007:00.0/vmkname = "vmnic1"
/device/008:03.0/vmkname = "vmnic2"
/device/008:03.1/vmkname = "vmnic3"
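The ownership change described in this section can also be scripted rather than made by hand. The following is a minimal sketch, assuming the /device/<address>/owner = "passthru" line format seen in the esx.conf listing; list_passthru and set_owner_vmkernel are hypothetical helper names, not VMware tools, and the script is meant to be run against a copy of esx.conf rather than the live file.

```shell
#!/bin/sh
# Sketch only: list_passthru and set_owner_vmkernel are illustrative helpers,
# not part of ESXi. Work on a copy of esx.conf, not the live file.

# Print the PCI address of every device whose owner is "passthru".
list_passthru() {
    sed -n 's|^/device/\([^/]*\)/owner = "passthru"$|\1|p' "$1"
}

# Rewrite every passthru owner line in $1 to vmkernel, writing the result to $2.
set_owner_vmkernel() {
    sed 's|^\(/device/[^/]*/owner = \)"passthru"$|\1"vmkernel"|' "$1" > "$2"
}

# Example run against a few lines taken from the esx.conf listing.
sample=$(mktemp)
fixed=$(mktemp)
cat > "$sample" <<'EOF'
/device/000:26.7/owner = "vmkernel"
/device/000:28.5/owner = "passthru"
/device/000:29.0/owner = "vmkernel"
/device/008:03.0/vmkname = "vmnic2"
EOF

echo "Passthru devices before: $(list_passthru "$sample")"
set_owner_vmkernel "$sample" "$fixed"
echo "Passthru devices after:  $(list_passthru "$fixed")"
```

Writing the result to a second file keeps the original untouched, so the two can be compared with diff before anything is copied back to the host.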
Dealing with the ESXi boot device that has been enabled for VMDirectPath
It is possible to inadvertently enable passthru on the USB device that ESXi boots from, for example by enabling all USB devices in the system for passthru. After a reboot the host will boot very slowly and ESXi will not be able to write to the partitions on the USB boot device, making the host largely unusable. If you attempt operations at the console, you'll find that write operations fail and the partitions are not mounted properly. If you experience this issue, you may also find that the console stays at the "dvsfilter loaded successfully" step for several minutes or longer during bootup.
Update: a method similar to the one below has been posted in the forum, but it uses a VM to access the boot partitions.
# backup.sh 0 /bootbank/
config explicitly loaded
Boot partition /bootbank/ cannot be found
~ # ls -l
l--------- 0 root root 1984 Jan 1 1970 altbootbank -> /vmfs/volumes/7cfc25ed-9e7abe3b-92b0-4b92d4cdeb1e
drwxr-xr-x 1 root root 512 Nov 14 18:26 bin
l--------- 0 root root 1984 Jan 1 1970 bootbank -> /vmfs/volumes/53cc92eb-5326110e-6194-a0c34fc4fe34
drwxr-xr-x 1 root root 512 Nov 14 18:48 dev
drwxr-xr-x 1 root root 512 Nov 14 18:42 etc
drwxr-xr-x 1 root root 512 Nov 14 18:26 lib
l--------- 0 root root 1984 Jan 1 1970 locker -> /tmp/scratch
drwxr-xr-x 1 root root 512 Sep 17 21:47 opt
drwxr-xr-x 1 root root 131072 Nov 14 18:48 proc
l--------- 0 root root 1984 Jan 1 1970 productLocker -> /locker/packages/4.0.0/
drwxr-xr-x 1 root root 512 Nov 14 18:26 sbin
l--------- 0 root root 1984 Jan 1 1970 scratch -> /tmp/scratch
l--------- 0 root root 1984 Jan 1 1970 store -> /vmfs/volumes/efd8efe3-03bc1cbf-15e0-080efd9e7379
drwxrwxrwt 1 root root 512 Nov 14 18:42 tmp
drwxr-xr-x 1 root root 512 Nov 14 18:26 usr
drwxr-xr-x 1 root root 512 Nov 14 18:26 var
drwxr-xr-x 1 root root 512 Nov 14 18:26 vmfs
l--------- 0 root root 1984 Jan 1 1970 vmupgrade -> /locker/vmupgrade/
To troubleshoot this problem I tried both disabling IOMMU with a vmkernel boot option and disabling Intel VT-d in the BIOS, but neither helped. While it may be possible to access the console or SSH, editing esx.conf there will not help: the change can't be written, manually or automatically, to the /bootbank/ partition, because that partition is no longer mounted properly once the USB boot device has been handed off to VMDirectPath. I figured the best approach would be to boot from a Linux live CD and edit the copy of esx.conf stored in state.tgz, setting the owner of the affected components to vmkernel instead of passthru.
This host did have SSH enabled, so I logged in and took a look at the esx.conf file. I found that two devices were enabled for passthru, as shown below in the output of esx.conf and lspci. I then booted with the Linux live CD, and when I first opened esx.conf I found that the file had no devices set to passthru. The problem was that I was looking at the original firmware partition and not the current one.
/advUserOptions/options/name = "CIMWatchdogInterval"
/advUserOptions/options/type = "int"
/device/000:26.0/owner = "vmkernel"
/device/000:26.1/owner = "vmkernel"
/device/000:26.2/owner = "vmkernel"
/device/000:26.7/owner = "vmkernel"
/device/000:27.0/owner = "passthru"
/device/000:29.0/owner = "vmkernel"
/device/000:29.1/owner = "vmkernel"
/device/000:29.2/owner = "vmkernel"
/device/000:29.7/owner = "vmkernel"
/device/000:30.0/owner = "passthru"
/device/000:31.2/owner = "vmkernel"
/device/000:31.2/vmkname = "vmhba0"
/device/000:31.5/owner = "vmkernel"
/device/000:31.5/vmkname = "vmhba1"
/device/006:00.0/vmkname = "vmnic0"
/device/007:00.0/vmkname = "vmnic1"
/device/008:03.0/vmkname = "vmnic2"
/device/008:03.1/vmkname = "vmnic3"
/net/pnic/child/mac = "00:30:48:db:68:88"
/net/pnic/child/name = "vmnic0"
LSPCI OUTPUT
00:27.00 Multimedia controller: Intel Corporation 82801JI (ICH10 Family) HD Audio Controller
00:30.00 Bridge: Intel Corporation 82801BA/CA/DB/EB PCI Bridge
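Matching the passthru entries in esx.conf to device names in the lspci output can be done by eye, or roughly scripted. The sketch below assumes that an esx.conf address such as 000:27.0 corresponds to the lspci address 00:27.00; that mapping is inferred only from the listings on this page. The helper names to_lspci_addr and describe_passthru are hypothetical, and the lspci output is read from a saved text file rather than from the command itself.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, and the address mapping
# (000:27.0 in esx.conf -> 00:27.00 in lspci) is inferred from this page.

# Convert an esx.conf device address to the form used in the lspci output.
to_lspci_addr() {
    echo "$1" | awk -F'[:.]' '{ printf "%02d:%02d.%02d\n", $1, $2, $3 }'
}

# For each passthru device in esx.conf ($1), print the matching line
# from a saved lspci listing ($2).
describe_passthru() {
    sed -n 's|^/device/\([^/]*\)/owner = "passthru"$|\1|p' "$1" |
    while read -r addr; do
        # Match only lines that begin with the converted address and a space.
        awk -v a="$(to_lspci_addr "$addr") " 'index($0, a) == 1' "$2"
    done
}

# Sample data taken from the esx.conf and lspci listings.
conf=$(mktemp)
pci=$(mktemp)
cat > "$conf" <<'EOF'
/device/000:26.7/owner = "vmkernel"
/device/000:27.0/owner = "passthru"
/device/000:30.0/owner = "passthru"
/device/000:31.2/vmkname = "vmhba0"
EOF
cat > "$pci" <<'EOF'
00:27.00 Multimedia controller: Intel Corporation 82801JI (ICH10 Family) HD Audio Controller
00:30.00 Bridge: Intel Corporation 82801BA/CA/DB/EB PCI Bridge
EOF

describe_passthru "$conf" "$pci"
```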
I then looked at the correct file and found the entries that were set to passthru, so I changed the owner of each to vmkernel. After saving the change to local.tgz (it would be state.tgz if you're using ESXi Installable), I rebooted and received an ESXi partition error. I edited the file again, this time removing those lines entirely, and ESXi booted up without errors. Reviewing my screenshots, I found that local.tgz was too large after my first attempt to edit esx.conf, so I suspect that caused the partition error and that setting the owner to vmkernel would have worked just as well as removing the entries from the file.
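The unpack, edit and repack cycle from the live CD can be sketched as follows. This is only an illustration under stated assumptions: that the archive holds esx.conf at the relative path etc/vmware/esx.conf and that everything in it lives under etc; fix_local_tgz is a hypothetical helper name. On ESXi Installable the same edit would presumably have to be applied to the local.tgz packed inside state.tgz, which this sketch does not handle. A small stand-in archive is built first so the steps can be tried away from the host, and the sizes are printed at the end because an oversized local.tgz was the suspected cause of the partition error.

```shell
#!/bin/sh
# Sketch only: fix_local_tgz is a hypothetical helper, and the layout of the
# archive (etc/vmware/esx.conf, everything under etc) is an assumption.
set -e

# Unpack archive $1, set passthru owners to vmkernel, repack as $2.
fix_local_tgz() {
    work=$(mktemp -d)
    tar -xzf "$1" -C "$work"
    conf="$work/etc/vmware/esx.conf"
    sed 's|^\(/device/[^/]*/owner = \)"passthru"$|\1"vmkernel"|' "$conf" > "$conf.new"
    mv "$conf.new" "$conf"
    tar -czf "$2" -C "$work" etc
    rm -rf "$work"
}

# Build a small stand-in archive from a few lines of the esx.conf listing.
demo=$(mktemp -d)
mkdir -p "$demo/src/etc/vmware"
cat > "$demo/src/etc/vmware/esx.conf" <<'EOF'
/device/000:26.7/owner = "vmkernel"
/device/000:27.0/owner = "passthru"
/device/000:30.0/owner = "passthru"
/device/000:31.2/vmkname = "vmhba0"
EOF
tar -czf "$demo/local.tgz" -C "$demo/src" etc

fix_local_tgz "$demo/local.tgz" "$demo/local-fixed.tgz"

# Compare archive sizes, then show the edited file from the new archive.
wc -c "$demo/local.tgz" "$demo/local-fixed.tgz"
tar -xzOf "$demo/local-fixed.tgz" etc/vmware/esx.conf
```

Keeping the original archive alongside the edited one makes it possible to put the old file back if the host does not boot cleanly.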
Copyright © 2011 - Dave Mishchenko