I struggled with this for a few hours, hacking around blind, before I eventually found the answer to my problem. Don't ask me how or when, but basically I had lost the /etc/ovs-agent/cert directory and all of its contents.
So this is the output from the OVM job:
Job Construction Phase
----------------------
Job ID: 1409045789587
begin()
Appended operation 'Discover Manager Server Discover' to object 'OVM Foundry : Discover Manager'.
commit()
Completed Step: COMMIT
Objects and Operations
----------------------
Object (IN_USE): [DiscoverManager] OVM Foundry : Discover Manager
Operation: Discover Manager Server Discover
Job Running Phase at 2014-08-26 10:36:29,587
----------------------------------------------
Job Participants: []
Actioner
--------
10:36:30,478: Starting operation 'Discover Manager Server Discover' on object 'OVM Foundry : Discover Manager'
Setting Context to model only in job with id=1409045789587
Job Internal Error (Operation)com.oracle.ovm.mgr.api.exception.FailedOperationException: OVMAPI_4010E Attempt to send command: get_api_version to server: 10.167.242.247 failed. OVMAPI_4004E Server Failed Command: get_api_version , Status: org.apache.xmlrpc.XmlRpcException: I/O error while communicating with HTTP server: Connection refused [Tue Aug 26 10:36:30 BST 2014] [Tue Aug 26 10:36:30 BST 2014]
at com.oracle.ovm.mgr.action.ActionEngine.sendCommandToServer(ActionEngine.java:513)
at com.oracle.ovm.mgr.action.ActionEngine.sendUndispatchedServerCommand(ActionEngine.java:400)
at com.oracle.ovm.mgr.action.ServerAction.getSupportedApiVersions(ServerAction.java:314)
at com.oracle.ovm.mgr.discover.DiscoverEngine.getServerApiVersions(DiscoverEngine.java:446)
at com.oracle.ovm.mgr.discover.DiscoverEngine.discoverNewServer(DiscoverEngine.java:286)
at com.oracle.ovm.mgr.discover.DiscoverEngine.discoverServer(DiscoverEngine.java:203)
at com.oracle.ovm.mgr.op.manager.DiscoverManagerServerDiscover.action(DiscoverManagerServerDiscover.java:48)
at com.oracle.ovm.mgr.api.collectable.ManagedObjectDbImpl.executeCurrentJobOperationAction(ManagedObjectDbImpl.java:1156)
at com.oracle.odof.core.AbstractVessel.invokeMethod(AbstractVessel.java:356)
at com.oracle.odof.core.AbstractVessel.invokeMethod(AbstractVessel.java:333)
at com.oracle.odof.core.storage.Transaction.invokeMethod(Transaction.java:869)
at com.oracle.odof.core.Exchange.invokeMethod(Exchange.java:244)
at com.oracle.ovm.mgr.api.manager.DiscoverManagerProxy.executeCurrentJobOperationAction(Unknown Source)
at com.oracle.ovm.mgr.api.job.JobEngine.operationActioner(JobEngine.java:230)
at com.oracle.ovm.mgr.api.job.JobEngine.objectActioner(JobEngine.java:322)
at com.oracle.ovm.mgr.api.job.InternalJobDbImpl.objectCommitter(InternalJobDbImpl.java:1383)
at com.oracle.odof.core.AbstractVessel.invokeMethod(AbstractVessel.java:356)
at com.oracle.odof.core.AbstractVessel.invokeMethod(AbstractVessel.java:333)
at com.oracle.odof.core.BasicWork.invokeMethod(BasicWork.java:106)
at com.oracle.odof.command.InvokeMethodCommand.process(InvokeMethodCommand.java:92)
at com.oracle.odof.core.BasicWork.processCommand(BasicWork.java:81)
at com.oracle.odof.core.TransactionManager.processCommand(TransactionManager.java:752)
at com.oracle.odof.core.WorkflowManager.processCommand(WorkflowManager.java:467)
at com.oracle.odof.core.WorkflowManager.processWork(WorkflowManager.java:525)
at com.oracle.odof.io.AbstractClient.run(AbstractClient.java:42)
at java.lang.Thread.run(Thread.java:662)
Caused by: com.oracle.ovm.mgr.api.exception.IllegalOperationException: OVMAPI_4004E Server Failed Command: get_api_version , Status: org.apache.xmlrpc.XmlRpcException: I/O error while communicating with HTTP server: Connection refused [Tue Aug 26 10:36:30 BST 2014]
at com.oracle.ovm.mgr.action.ActionEngine.sendAction(ActionEngine.java:909)
at com.oracle.ovm.mgr.action.ActionEngine.sendCommandToServer(ActionEngine.java:509)
... 31 more
FailedOperationCleanup
----------
Starting failed operation 'Discover Manager Server Discover' cleanup on object 'OVM Foundry : Discover Manager'
Complete rollback operation 'Discover Manager Server Discover' cleanup on object 'OVM Foundry : Discover Manager'
Rollbacker
----------
10:36:30,778: Starting rollbacker...
Executing rollback operation 'Discover Manager Server Discover' on object 'OVM Foundry : Discover Manager'
Complete rollback operation 'Discover Manager Server Discover' completed with direction=DONE
10:36:30,786: Rollbacker completed...
Objects To Be Rolled Back
-------------------------
Object (IN_USE): [DiscoverManager] OVM Foundry : Discover Manager
Write Methods Invoked
-------------------
10:36:30,314 Class=InternalJobDbImpl vessel_id=7887 method=addTransactionIdentifier accessLevel=6 owningTx=1409045790314
10:36:30,315 Class=DiscoverManagerDbImpl vessel_id=235 method=discoverServer accessLevel=6 owningTx=1409045790314
10:36:30,333 Class=InternalJobDbImpl vessel_id=7887 method=setCompletedStep accessLevel=6 owningTx=1409045790314
10:36:30,333 Class=InternalJobDbImpl vessel_id=7887 method=setAssociatedHandles accessLevel=6 owningTx=1409045790314
10:36:30,590 Class=DiscoverManagerDbImpl vessel_id=235 method=nextJobOperation accessLevel=6 owningTx=1409045790314
10:36:30,590 Class=InternalJobDbImpl vessel_id=7887 method=setFailedOperation accessLevel=6 owningTx=1409045790314
10:36:30,778 Class=DiscoverManagerDbImpl vessel_id=235 method=nextJobOperation accessLevel=6 owningTx=1409045790314
10:36:30,786 Class=DiscoverManagerDbImpl vessel_id=235 method=nextJobOperation accessLevel=6 owningTx=1409045790314
Fat lot of good that did me in finding the problem. So I went to the Oracle VM Server, checked the status of the agent, and looked at ovs-agent.log. This is what I found:
[root@someserver ~]# service ovs-agent status
log server (pid 5489) is running...
notification server (pid 5493) is running...
remaster server (pid 5496) is running...
monitor server (pid 5498) is running...
ha server (pid 5499) is running...
stats server (pid 5502) is running...
xmlrpc server dead but pid file exists
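With seven services in that list it is easy to miss the one that is down. A minimal sketch of a filter that keeps only the lines not reporting a running process. `dead_services` is a hypothetical helper name, not part of the agent, and it assumes the status format shown above.

```shell
#!/bin/sh
# dead_services: read `service <name> status` output on stdin and print
# only the lines that do NOT end in "is running...".
# Returns 1 if at least one such line was printed, 0 otherwise.
# (Hypothetical helper; assumes the status format shown in this post.)
dead_services() {
    # grep -v exits 0 when it selected (printed) at least one line
    if grep -v 'is running\.\.\.$'; then
        return 1
    fi
    return 0
}
```

Something like `service ovs-agent status | dead_services` would then print just the xmlrpc line from the output above.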
ovs-agent.log
[2014-08-26 11:19:10 5493] INFO (notificationserver:213) NOTIFICATION SERVER STARTED
[2014-08-26 11:19:10 5496] INFO (remaster:140) REMASTER SERVER STARTED
[2014-08-26 11:19:10 5498] INFO (monitor:23) MONITOR SERVER STARTED
[2014-08-26 11:19:10 5499] INFO (ha:89) HA SERVER STARTED
[2014-08-26 11:19:10 5502] INFO (stats:26) STAT SERVER STARTED
[2014-08-26 11:19:10 5504] ERROR (daemon:131) Error starting xmlrpc server: [('system library', 'fopen', 'No such file or directory'), ('BIO routines', 'FILE_CTRL', 'system lib'), ('SSL routines', 'SSL_CTX_use_PrivateKey_file', 'system lib')]
Traceback (most recent call last):
File "/usr/lib64/python2.4/site-packages/agent/lib/daemon.py", line 129, in run_service
func()
File "/usr/lib64/python2.4/site-packages/agent/daemon/xmlrpc.py", line 305, in serve_forever
AgentXMLRPCRequestHandler, logRequests=False)
File "/usr/lib64/python2.4/site-packages/agent/daemon/xmlrpc.py", line 201, in __init__
ctx.use_privatekey_file(KEYFILE)
Error: [('system library', 'fopen', 'No such file or directory'), ('BIO routines', 'FILE_CTRL', 'system lib'), ('SSL routines', 'SSL_CTX_use_PrivateKey_file', 'system lib')]
[2014-08-26 11:19:10 5502] DEBUG (linuxstats:44) Error getting VM stats: Command: ['xm', 'list', '--long'] failed (1): stderr: Error: Unable to connect to xend: No such file or directory. Is xend running?
stdout:
That still didn't help. Which flipping file are we talking about? Arrrggghhhh. Right, let's get down to Xen and see what it can tell me.
[root@someserver xen]# pwd
/var/log/xen
[root@someserver xen]# tail -10 xend-debug.log
Xend started at Mon Aug 4 17:12:33 2014.
Exception starting xend: invalid xend config xend-relocation-server-ssl-key-file: directory '/etc/ovs-agent/cert/key.pem' does not exist
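The xend message finally names the file: key.pem under /etc/ovs-agent/cert. A quick check of that directory can be sketched as below. `check_agent_certs` is a hypothetical helper, and the three file names are an assumption taken from the directory listing at the end of this post; the logs only confirm that key.pem is required.

```shell
#!/bin/sh
# check_agent_certs [DIR]: report which of the expected files exist and
# are non-empty. DIR defaults to /etc/ovs-agent/cert.
# Returns 0 if all are present, 1 if any is missing or empty.
# (Hypothetical helper; the file list is an assumption, see above.)
check_agent_certs() {
    dir="${1:-/etc/ovs-agent/cert}"
    missing=0
    for f in key.pem certificate.pem request.pem; do
        if [ -s "$dir/$f" ]; then
            echo "ok:      $dir/$f"
        else
            echo "MISSING: $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_agent_certs` with no argument checks the default directory; passing a path checks a copy or a backup instead.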
[root@someserver cert]# pwd
/etc/ovs-agent/cert
[root@someserver cert]# ls
Generating RSA private key, 1024 bit long modulus
............++++++
..++++++
e is 65537 (0x10001)
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [GB]:State or Province Name (full name) [Berkshire]:Locality Name (eg, city) [Newbury]:Organization Name (eg, company) [My Company Ltd]:Organizational Unit Name (eg, section) []:Common Name (eg, your name or your server's hostname) []:Email Address []:
Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:An optional company name []:Signature ok
subject=/CN=someserver.com
Getting Private key
[root@someserver cert]# ls
certificate.pem key.pem request.pem
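The command that produced the output above is not shown. For reference, a generic openssl sequence that yields the same three file names (a private key, a signing request and a self-signed certificate) might look like the sketch below. This is an assumption, not necessarily what the Oracle tooling runs: `regen_agent_certs` is a hypothetical helper, the 2048-bit key size and 365-day validity are arbitrary choices (the output above shows a 1024-bit key), and `-subj` is used to skip the interactive prompts.

```shell
#!/bin/sh
# regen_agent_certs DIR CN: create key.pem, request.pem and a self-signed
# certificate.pem in DIR, with CN as the certificate's common name.
# (Hypothetical helper; key size and validity are assumptions.)
regen_agent_certs() {
    dir="$1"
    cn="$2"
    mkdir -p "$dir" || return 1
    (
        # Keep the private key readable by its owner only
        umask 077
        openssl genrsa -out "$dir/key.pem" 2048
    ) || return 1
    # Certificate signing request; -subj avoids the interactive questions
    openssl req -new -key "$dir/key.pem" -subj "/CN=$cn" \
        -out "$dir/request.pem" || return 1
    # Self-sign the request with the same key
    openssl x509 -req -days 365 -in "$dir/request.pem" \
        -signkey "$dir/key.pem" -out "$dir/certificate.pem" || return 1
}
```

It would be called as `regen_agent_certs /etc/ovs-agent/cert someserver.com`, followed by restarting the agent (presumably `service ovs-agent restart`) and running `service ovs-agent status` again to confirm the xmlrpc server comes up.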