Friday, 5 August 2016

Error: Acquire running lock failed: 512

Okay, so I've been in hiding for quite some time working on Oracle Management Cloud. I've just come back to start some of my VMs in Oracle VM Manager, because I wanted a Windoze box to do some testing, and guess what? It's broken again!

This time, when I tried to start my VM, I got this error:

$ xm create /OVS/Repositories/0004fb000003000074aaa55a9bad117a/VirtualMachines/0004fb000006000063e2a5ccff965b20/vm.cfg
Using config file "/OVS/Repositories/0004fb000003000074aaa55a9bad117a/VirtualMachines/0004fb000006000063e2a5ccff965b20/vm.cfg".
Error: Acquire running lock failed: 512

After the customary hacking I found the problem. I have a three-node OVS cluster, and because one of the nodes had crashed, lock files were left behind that prevented the VM from starting. The key locations to check are:

/var/log/xen/xend-debug.log
/var/run/ovs-agent/vm-*.lock
/dlm/ovm/*
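If you want a quick look at all three lock locations on a node in one go, something like the following may help. It is a minimal sketch, not an Oracle VM tool: the function name list_vm_lock_artifacts is hypothetical, and the optional root prefix argument exists only so it can be tried against a scratch directory instead of the live system.

```shell
#!/bin/sh
# Hypothetical helper: list what is currently sitting in the lock
# locations on this node (per-VM agent lock files and DLM entries).
#   $1  optional root prefix, empty by default (i.e. the live system)
list_vm_lock_artifacts() {
    root="${1:-}"
    for f in "$root"/var/run/ovs-agent/vm-*.lock "$root"/dlm/ovm/*; do
        # An unmatched glob stays as the literal pattern, so skip anything
        # that does not actually exist.
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
    return 0
}
```

Run list_vm_lock_artifacts on each server; anything naming a VM that is not actually running anywhere is a candidate for a leftover from a crashed node.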

Basically, it looks like the ovs-agents on each node communicate with one another when starting a VM. In my case, the server on which I tried to start the VM showed no obvious signs of trouble until I actually tried to start it, at which point /var/log/xen/xend-debug.log said:

[Errno 26] Text file busy: '/dlm/ovm/004fb000006000063e2a5ccff965b20'
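Since the useful clues here are the "[Errno N]" lines, a small filter over the xend debug log saves scrolling through it. This is a sketch under the assumption that the errors look like the ones quoted in this post; recent_xend_errors is a hypothetical name, and both the log path and the number of lines shown can be overridden.

```shell
#!/bin/sh
# Hypothetical helper: show the most recent errno-style errors from xend.
#   $1  log file, defaulting to the xend debug log mentioned above
#   $2  how many matching lines to show, defaulting to 5
recent_xend_errors() {
    log="${1:-/var/log/xen/xend-debug.log}"
    # Match lines such as "[Errno 26] Text file busy: ..." and keep the last few.
    grep -E '\[Errno [0-9]+]' "$log" | tail -n "${2:-5}"
}
```

Running recent_xend_errors straight after a failed xm create should surface errors like the one above.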

So I looked for that file and, unsurprisingly, it was not there. Using brute force, I ran the create command again, and this time it said:

[Errno 17] File exists: '/var/run/ovs-agent/vm-0004fb000006000063e2a5ccff965b20.lock'

WTF! I thought.

So now I had the lock file above, which I duly removed. I then went looking for '/dlm/ovm/004fb000006000063e2a5ccff965b20', which I eventually found on one of the other servers. After removing it, along with the copies of '/var/run/ovs-agent/vm-0004fb000006000063e2a5ccff965b20.lock' on each server, I was able to start the VM.
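Hunting for the files server by server is the tedious part, so here is a rough sketch of a sweep that only reports (it deletes nothing). Several things are assumptions: root ssh access to every node, the placeholder node names, the hypothetical function name sweep_vm_locks, and the naming rule for the DLM entry, which is inferred solely from the errors in this post, where the /dlm/ovm name is the VM UUID minus its first character.

```shell
#!/bin/sh
# Hypothetical cluster-wide sweep: report, without deleting, leftover
# lock artifacts for one VM on every node. Assumes root ssh to each node.
#   $1     VM UUID (the directory name under VirtualMachines/)
#   $2...  cluster node host names (placeholders)
sweep_vm_locks() {
    vm_id="$1"; shift
    # Assumption from the errors above: the /dlm/ovm entry name is the
    # UUID without its first character.
    dlm_name="${vm_id#?}"
    for node in "$@"; do
        # ls -d prints only the paths that exist; errors for missing
        # paths are discarded. Each hit is prefixed with the node name.
        ssh "root@$node" "ls -d /var/run/ovs-agent/vm-$vm_id.lock /dlm/ovm/$dlm_name 2>/dev/null" |
            sed "s|^|$node: |"
    done
    return 0
}
```

For example, sweep_vm_locks 0004fb000006000063e2a5ccff965b20 ovs1 ovs2 ovs3 (with your own host names). The lock presumably exists to stop the same VM running on two nodes at once, so it seems wise to confirm with xm list on every node that the VM really is not running before removing anything the sweep reports.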

Happy hacking.