Sometimes you need a quick, temporary solution to a problem because of time constraints. Examples that come to mind are putting on a spare tire to get to your next destination or using duct tape to fix your glasses. Neither is a permanent solution, but both accomplish your goal of getting something done quickly.
In this case, I had a small maintenance window to increase the RAM and vCPU count on all of the VMs we use for VDI, after months of user complaints that their VMs were underpowered. With our workforce spread across multiple continents and many time zones, I had a 4-hour window to do the work. With a list just shy of 200 VMs, working through them manually one at a time was not practical, so I used the power of PowerShell to get it done. When I started, I expected the job to be simple: power off the VMs, run the script to increase both the RAM and vCPU count, then power on the VMs and be done. I estimated 20 minutes or less. Funny how things don't always turn out the way you expect, especially in IT.
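The resize script itself isn't shown in this post, but here's a minimal PowerCLI sketch of what such a bulk update might look like. The vCenter address, VM list file, and target sizes are all assumptions for illustration:

```powershell
# Hypothetical sketch using VMware PowerCLI; server name, file path,
# and target RAM/vCPU values are placeholders, not the original script.
Connect-VIServer -Server vcenter.example.com

$vmNames = Get-Content -Path .\vdi-vms.txt   # one VM name per line
foreach ($name in $vmNames) {
    $vm = Get-VM -Name $name
    Stop-VM -VM $vm -Confirm:$false           # VM must be powered off to resize
    Set-VM -VM $vm -MemoryGB 16 -NumCpu 4 -Confirm:$false
    Start-VM -VM $vm
}
```

`Set-VM` is where the locked-VMDK errors described below would surface, so wrapping it in a try/catch and logging the failures makes the retry passes easier.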
First, to get this out of the way: I did search for ways to find out what was causing my disk locks and found multiple articles from VMware, such as https://kb.vmware.com/s/article/10051, among others. The problem was that my maintenance window was quickly disappearing while I ran the commands and combed through logs.
Unfortunately, I don't know whether this workaround will help in every situation where something is locking your disks, but it's worth trying, since it only takes a few minutes to set up and test.
Here's the workaround I used to get my work done.
- I powered off the VMs
- I ran my PowerShell script, but some VMs would update and others would not. The error said something had the VMDK locked, with no clear indication of what process or third-party integration was causing it.
- From previous issues, I knew that restarting the management agents could get ESXi hosts back to running normally, and I thought I'd give it a shot on one ESXi host. It worked! The disks got unlocked.
- Next, I needed a way to quickly restart the management agents across the ESXi hosts in my cluster, and MobaXterm quickly came to mind with its "Multi-execution" mode. I had 7 ESXi hosts; I added them to the tool and enabled Multi-execution to run commands over SSH.
- I then ran the following commands via SSH across all my hosts with MobaXterm (Restarting the Management agents in ESXi):
- /etc/init.d/hostd restart
- /etc/init.d/vpxa restart
- I then reran my PowerShell script across another batch of VMs; most would go through, with a few failures.
- At this point, I just kept repeating the last two steps (restarting the management agents over SSH, then rerunning the script) until the script had succeeded across all the VMs.
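MobaXterm's Multi-execution mode did the fan-out for me, but if you don't have it handy, a plain SSH loop achieves the same thing. A minimal sketch, assuming SSH is enabled on the hosts; the host names are placeholders, and the `SSH_CMD` override is just a convenience for dry runs:

```shell
#!/bin/sh
# Restart the ESXi management agents (hostd and vpxa) on a list of hosts.
# Host names passed in are placeholders; SSH must be enabled on each host.
restart_agents() {
    ssh_cmd="${SSH_CMD:-ssh}"   # set SSH_CMD=echo to dry-run the loop
    for host in "$@"; do
        "$ssh_cmd" "root@$host" '/etc/init.d/hostd restart && /etc/init.d/vpxa restart'
    done
}

# Example: restart_agents esxi01 esxi02 esxi03
```

Running the two restarts in one SSH session per host keeps the loop quick, which matters when the maintenance window is shrinking.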
Once I have time, I'll circle back and try to find the root cause of the disk locks, but until then, this workaround got me through my scheduled maintenance within the allotted window.