Avoid problems when removing datastores from ESXi
Removing a datastore from ESXi seems like such a simple operation, but it is not as straightforward as it might appear. To the virtualization administrator the whole process may look like right-clicking a datastore and unmounting it, but there is an often overlooked additional step: detaching the device from the host as well. This must be done on every individual host before unpresenting the LUN from the backend array. Not following the proper procedure can result in "bad things", such as an APD (All Paths Down) condition on any host where the storage was not unmounted AND detached before it was unpresented. My guess is that this mostly affects admins who gained most of their experience managing ESXi 5.0 and above, since some of the checks that used to be manual have now been automated.
A number of steps must be done on every single host you want to detach the datastore from; skipping them can leave you unable to manage your vSphere hosts. The VMs on those hosts may well keep running (assuming, obviously, that they do not live on the LUN that was unpresented), but you may not be able to manage them. At that point the only recovery option is to reboot the affected hosts, since hostd may be partially or completely unresponsive.
Let's take a look at what the relevant VMware Knowledge Base article (2004605) states should be checked prior to unmounting a LUN:
Before unmounting a LUN, ensure that:
If the LUN is being used as a VMFS datastore, all objects (such as virtual machines, snapshots, and templates) stored on the VMFS datastore are unregistered or moved to another datastore. Note: All CD/DVD images located on the VMFS datastore must also be unregistered from the virtual machines.
The datastore is not used for vSphere HA heartbeat.
The datastore is not part of a datastore cluster. For more information on datastore clusters, see the vSphere 5.1 Resource Management Guide or vSphere 5.0 Resource Management Guide.
The datastore is not managed by Storage DRS. For more information on Storage DRS, see the vSphere 5.1 Resource Management Guide or vSphere 5.0 Resource Management Guide.
The datastore is not configured as a diagnostic coredump partition. For more information, see Configuring a diagnostic coredump partition on an ESXi 5.x host (2004299).
Storage I/O Control is disabled for the datastore. For more information, see Managing Storage I/O Resources in the vSphere 5.1 Resource Management Guide or vSphere 5.0 Resource Management Guide.
No third-party scripts or utilities running on the ESXi host can access the LUN in question. If the LUN is being used as a datastore, unregister all objects (such as virtual machines and templates) stored on the datastore.
If the LUN is being used as an RDM, remove the RDM from the virtual machine. Click Edit Settings, highlight the RDM hard disk, and click Remove. Select Delete from disk if it is not already selected, and click OK. Note: This destroys the mapping file, but not the LUN content.
Check if the LUN/Datastore is used as the persistent scratch location for the host. For more information on persistent scratch, see Creating a persistent scratch location for ESXi 4.x and 5.x (1033696).
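The checklist above is essentially a set of yes/no preconditions that all have to pass before an unmount is safe. As a minimal sketch, assuming you have already gathered the relevant facts about a datastore (by hand or with your own tooling), the checks could be evaluated as below. The DatastoreState class, its field names, and unmount_blockers are hypothetical illustrations, not part of any VMware API or tool.

```python
from dataclasses import dataclass, field


@dataclass
class DatastoreState:
    """Hypothetical per-host snapshot of a datastore; not a VMware API type."""
    name: str
    registered_objects: list = field(default_factory=list)  # VMs, templates, snapshots, ISOs
    rdm_mappings: list = field(default_factory=list)        # RDMs pointing at the LUN
    ha_heartbeat: bool = False
    in_datastore_cluster: bool = False
    storage_drs_managed: bool = False
    coredump_partition: bool = False
    sioc_enabled: bool = False
    third_party_access: bool = False
    scratch_location: bool = False


# Yes/no checks from the KB list, paired with the message reported when one fails.
_FLAG_CHECKS = [
    ("ha_heartbeat", "used for vSphere HA heartbeat"),
    ("in_datastore_cluster", "member of a datastore cluster"),
    ("storage_drs_managed", "managed by Storage DRS"),
    ("coredump_partition", "configured as a diagnostic coredump partition"),
    ("sioc_enabled", "Storage I/O Control is enabled"),
    ("third_party_access", "accessed by third-party scripts or utilities"),
    ("scratch_location", "used as the persistent scratch location"),
]


def unmount_blockers(ds: DatastoreState) -> list:
    """Return one message per failed check; an empty list means the checks passed."""
    blockers = []
    if ds.registered_objects:
        blockers.append("objects still registered: " + ", ".join(ds.registered_objects))
    if ds.rdm_mappings:
        blockers.append("RDMs still mapped: " + ", ".join(ds.rdm_mappings))
    for attr, message in _FLAG_CHECKS:
        if getattr(ds, attr):
            blockers.append(message)
    return blockers


if __name__ == "__main__":
    # Hypothetical datastore with one VM still registered and SIOC left on.
    ds = DatastoreState("ds-old-01", registered_objects=["vm-web01"], sioc_enabled=True)
    for reason in unmount_blockers(ds):
        print("BLOCKED:", reason)
```

An empty result only means the facts you fed in look clean; the vSphere Client's own unmount check remains the authoritative confirmation.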
Here are some examples of datastore unmount confirmation screens. Notice that the first two show a failed check, while the last one shows all checks passed.
Once you have unmounted the datastore you want to remove, detach its device. In the vSphere Client, go to Configuration->Storage Adapters->[select storage adapter], then right-click the device and choose Detach:
Verify that the detach device confirmation checklist shows that everything is fine:
Repeat this on every host the LUN/datastore is presented to. Only when every host has both unmounted the datastore and detached the device should you unpresent the LUN from the backend storage system.
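The rule that matters here is a simple gate: the LUN may be unpresented only after every host has finished both steps. A minimal sketch of that gate, assuming you track progress yourself in a dictionary of host name to completed steps (the names REQUIRED_STEPS, pending_hosts, and safe_to_unpresent are hypothetical, not from any VMware tool):

```python
from typing import Dict, List, Set

# The two per-host steps described above; both must be done on every host.
REQUIRED_STEPS = {"unmounted", "detached"}


def pending_hosts(host_status: Dict[str, Set[str]]) -> List[str]:
    """Return hosts that have not yet both unmounted the datastore and detached the device."""
    return sorted(host for host, done in host_status.items() if not REQUIRED_STEPS <= done)


def safe_to_unpresent(host_status: Dict[str, Set[str]]) -> bool:
    """True only if at least one host is listed and every listed host finished both steps."""
    return bool(host_status) and not pending_hosts(host_status)


if __name__ == "__main__":
    status = {
        "esx01": {"unmounted", "detached"},
        "esx02": {"unmounted"},  # the detach was forgotten on this host
    }
    print(pending_hosts(status))      # ['esx02']
    print(safe_to_unpresent(status))  # False
```

The gate can only be as good as its input: if a host that sees the LUN is missing from the dictionary, the check will happily pass, so the host list has to come from wherever the LUN is actually presented.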
As you can see, there are a lot of checks to do before you unmount a LUN or detach a storage device. As I mentioned above, some of these steps are now automated, but detaching the device after unmounting the datastore is still a separate step, and you must make sure it is done on all hosts before removing the backing volume. Most of this is second nature to me at this point, yet even I am guilty of having missed a step or two at times and have simply been lucky not to get bitten. I have seen others who were not so lucky, and recovery can be painful, since it is not possible to vMotion the VMs off the affected host(s) before rebooting them.
If you have a large environment or need to unpresent (or present) a large number of LUNs, you will find several scripting examples at the bottom of the VMware KB article.
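For many LUNs, the same two steps can also be run from the ESXi shell: `esxcli storage filesystem unmount -l <label>` unmounts a VMFS datastore by its label, and `esxcli storage core device set --state=off -d <naa_id>` detaches a device on ESXi 5.x. As a minimal sketch (the function batch_plan and the example labels and NAA IDs are hypothetical placeholders; the KB's own scripts remain the reference), a per-host command list could be generated like this:

```python
import shlex
from typing import Dict, List, Tuple


def batch_plan(hosts: List[str], luns: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Build a per-host list of esxcli commands for removing several LUNs.

    luns is a list of (datastore_label, naa_id) pairs. On each host every
    datastore is unmounted before any device is detached, so no device is
    detached while its filesystem is still mounted.
    """
    plan = {}
    for host in hosts:
        # shlex.quote protects labels that contain spaces or shell metacharacters.
        unmounts = [f"esxcli storage filesystem unmount -l {shlex.quote(label)}"
                    for label, _ in luns]
        detaches = [f"esxcli storage core device set --state=off -d {shlex.quote(naa)}"
                    for _, naa in luns]
        plan[host] = unmounts + detaches
    return plan


if __name__ == "__main__":
    # Placeholder hosts, labels, and device IDs; substitute your own.
    plan = batch_plan(["esx01", "esx02"], [("ds-old-01", "naa.example01"),
                                           ("ds-old-02", "naa.example02")])
    for host, commands in plan.items():
        print(f"# {host}")
        for cmd in commands:
            print(cmd)
```

Unmounting everything first and then detaching is just one valid ordering; running unmount and detach per LUN works equally well, as long as each LUN is unmounted before its device is detached. The script only prints commands, so the pre-unmount checks and the "all hosts done before unpresenting" rule still apply.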
This information may be second nature to some, but having seen people get bitten by this more than once recently, and having heard from a former VMware tech support engineer that they used to get tickets on this issue pretty much every day, I figured it was worth sharing.