VirtualBacon

Avoid problems when removing datastores from ESXi

Removing a datastore from ESXi seems like a simple operation, but it is not as straightforward as it looks. To the virtualization administrator it may appear that the process is just a matter of right-clicking a datastore and unmounting it, but there is an often overlooked additional step: detaching the device from the host as well. This must be done on every individual host before unpresenting the LUN from the backend array. Not following the proper procedure can result in "bad things", such as causing an All Paths Down (APD) condition on the hosts where the storage was not unmounted AND detached before it was unpresented. My guess is that this affects admins who gained most of their experience managing ESXi 5.0 and above, as some of the checks that used to be manual have now been automated.

A number of steps must be completed on every single host that the datastore is being removed from. Skipping them can leave you unable to manage your vSphere hosts. The VMs on those hosts may or may not continue to run just fine (assuming, obviously, that they are not on the LUN that was unpresented), but you may not be able to manage them. The only recovery option at that point is to reboot the hosts, as hostd may be partially or completely unresponsive.

Let's take a look at what the relevant VMware Knowledge Base article (2004605) states should be checked prior to unmounting a LUN:

 

Before unmounting a LUN, ensure that:

 

Here are some examples of Datastore unmount check confirmation screens. Notice the first two show a failed check while the last one shows all checks passed.

 

[Screenshot: Datastore Unmount Confirmation, heartbeat check]

[Screenshot: Unmount Datastore Confirmation]

[Screenshot: Datastore Unmount Confirmation, all checks green]

Once you have unmounted the datastore that you want to remove, detach the device from the host (in the vSphere Client) under Configuration->Storage Adapters->[select storage adapter]:

 

[Screenshot: Detach Device]

Verify that the detach device confirmation checklist shows that everything is fine:

[Screenshot: Device Unmount Confirmation]
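If you prefer the command line, the same unmount-then-detach sequence can be sketched with esxcli. The snippet below is only an illustration that builds the command list for one host, in order; it does not run anything. The esxcli subcommands are the ones I understand to apply on ESXi 5.x, so verify them against the KB article for your version. The helper name, datastore label and naa ID are made-up placeholders.

```python
import shlex


def esxcli_detach_plan(datastore_label, device_id):
    """Return the esxcli commands, in order, for ONE host.

    Hypothetical helper: it only builds strings, it does not execute them.
    """
    label = shlex.quote(datastore_label)  # protect labels containing spaces
    device = shlex.quote(device_id)
    return [
        # 1. Confirm the datastore is mounted and note its label/UUID.
        "esxcli storage filesystem list",
        # 2. Unmount the VMFS datastore by its volume label.
        f"esxcli storage filesystem unmount -l {label}",
        # 3. Detach the backing device from the host.
        f"esxcli storage core device set --state=off -d {device}",
        # 4. Verify that the device now shows up as detached.
        "esxcli storage core device detached list",
    ]


# Placeholder values, not taken from a real environment.
for command in esxcli_detach_plan("old_datastore", "naa.600000000000000000000000000000a1"):
    print(command)
```

The point of returning an ordered list is that the unmount always comes before the detach, and the verification comes last. The same plan has to be run on every host that sees the LUN.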

 

Repeat on every host that the LUN/datastore is presented to. Once this has been done on all hosts, you can unpresent the LUN from the backend storage system.
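The "every host first, array last" rule can be expressed as a simple gate. This is a minimal sketch, assuming you have already collected the per-host state yourself (from the vSphere Client or a script); the class, function and host names are hypothetical and nothing here talks to vCenter or ESXi.

```python
from dataclasses import dataclass


@dataclass
class HostLunState:
    """State of one LUN on one host (hypothetical record, filled in by you)."""
    host: str
    datastore_unmounted: bool
    device_detached: bool


def hosts_blocking_unpresent(states):
    """Return the hosts where the LUN is not yet BOTH unmounted and detached."""
    return [
        s.host
        for s in states
        if not (s.datastore_unmounted and s.device_detached)
    ]


# Example data only.
states = [
    HostLunState("esx01", True, True),
    HostLunState("esx02", True, False),   # unmounted, but device still attached
    HostLunState("esx03", False, False),  # nothing done yet
]

blocking = hosts_blocking_unpresent(states)
if blocking:
    print("Do not unpresent yet; still pending on:", ", ".join(blocking))
else:
    print("Safe to unpresent the LUN from the array.")
```

With the example data this reports esx02 and esx03 as still pending. A host that has only been unmounted counts as not done, which is exactly the step that tends to get missed.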

 

As you can see, there are a lot of checks to do before you unmount a LUN or detach a storage device. As I mentioned above, some of these steps are now automated, though detaching the device after unmounting the datastore is still an additional step. You must also ensure that this is done on all hosts before removing the backing volume. Most of this is pretty much second nature to me at this point, though even I am guilty of having missed a step or two at times, and have simply been lucky that I did not get bitten. I have seen others who have not been so lucky, and recovery can be painful since it is not possible to vMotion the VMs off the host(s) prior to rebooting.

If you have a large environment, or need to unpresent (or present) a large number of LUNs, you will find several scripting examples at the bottom of the VMware KB article.

This information may be second nature to some, but having seen people get bitten by this more than once recently, and having heard from a former VMware tech support engineer that they used to get tickets on this issue pretty much every day, I figured that it was worth sharing.
