cloudstack-dev mailing list archives

From David Mabry <>
Subject CS 4.8 KVM VMs will not live migrate
Date Mon, 29 Jan 2018 18:48:52 GMT
Good day CloudStack Devs,

I've run across a real head scratcher.  I have two VMs (initially three, but more on that
later) on a single host that I cannot live migrate to any other host in the same cluster.
We discovered this while rolling out patches to go from CentOS 7.2 to CentOS 7.4.
Initially, we thought the new version of libvirtd or qemu-kvm on the other hosts in the
cluster was preventing these VMs from migrating, but we are able to live migrate other VMs
to and from this host without issue.  We can even create new VMs on this specific host and
live migrate them after creation with no issue.  We've put the migration source agent, the
migration destination agent and the management server in debug, and the only thing of note
we get is "Unsupported command".  Luckily, one VM was shut down and restarted; this is the
third VM mentioned above.  Since that restart, it has had no issues live migrating to any
other host in the cluster.

I'm at a loss as to what to try next, and I'm hoping that someone out there has had a
similar issue and can shed some light on what to do.  Obviously, I can contact the customer
and have them shut down their VMs, but that may just postpone the problem to another day.
Even if shutting down the VMs is ultimately the solution, I'd still like to understand what
caused this issue in the first place, in the hope of preventing it in the future.

Here's some information about my setup:
CloudStack 4.8 Advanced Networking
CentOS 7.2 and 7.4 Hosts
Ceph RBD Primary Storage
NFS Secondary Storage
Instance in Question for Debug: i-532-1392-NSVLTN

I have attached relevant debug logs to this email if anyone wishes to take a look.  I think
the most interesting error message that I have received is the following:

468390:2018-01-27 08:59:35,172 DEBUG [c.c.a.t.Request] (Work-Job-Executor-6:ctx-188ea30f job-181792/job-181802
ctx-8e7f45ad) (logid:f0888362) Seq 22-942378222027276319: Received:  { Ans: , MgmtId: 14038012703634,
via: 22(, Ver: v1, Flags: 110, { UnsupportedAnswer } }
468391:2018-01-27 08:59:35,172 WARN  [c.c.a.m.AgentManagerImpl] (Work-Job-Executor-6:ctx-188ea30f
job-181792/job-181802 ctx-8e7f45ad) (logid:f0888362) Unsupported Command: Unsupported command
issued:  Are you sure you got the right type
of server?
468392:2018-01-27 08:59:35,179 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-6:ctx-188ea30f
job-181792/job-181802 ctx-8e7f45ad) (logid:f0888362) Invocation exception, caused by:
Resource [Host:22] is unreachable: Host 22: Unable to prepare for migration due to Unsupported
command issued:  Are you sure you got the
right type of server?
468393:2018-01-27 08:59:35,179 INFO  [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-6:ctx-188ea30f
job-181792/job-181802 ctx-8e7f45ad) (logid:f0888362) Rethrow exception
Resource [Host:22] is unreachable: Host 22: Unable to prepare for migration due to Unsupported
command issued:  Are you sure you got the
right type of server?

I've tracked this "Unsupported command" down in the CS 4.8 code to cloudstack/api/src/com/cloud/agent/api/
which is the generic answer class.  I believe the error actually originates in
cloudstack/engine/orchestration/src/com/cloud/vm/  Specifically:
        Answer pfma = null;
        try {
            pfma = _agentMgr.send(dstHostId, pfmc);
            if (pfma == null || !pfma.getResult()) {
                final String details = pfma != null ? pfma.getDetails() : "null answer returned";
                final String msg = "Unable to prepare for migration due to " + details;
                pfma = null;
                throw new AgentUnavailableException(msg, dstHostId);

The pfma must either come back with a failed result or never be returned and therefore still
be null.  That answer should be coming from the destination agent, but for the life of me I
can't figure out what the root cause of this error is beyond "Unsupported command issued".
What command is unsupported?  My guess is that something is wrong with the dxml that is
generated and passed to the destination host, but I have not yet been able to catch that
dxml in debug.
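
One thing the quoted check does settle: the logged text ends in "Unsupported command
issued" rather than "null answer returned", so a non-null answer with a false result came
back.  Below is a minimal, self-contained Java sketch of that branch.  StubAnswer and
failureMessage are hypothetical stand-ins written for illustration, not CloudStack classes;
only the condition and the two message strings are taken from the excerpt above.

```java
public class PrepareAnswerSketch {

    // Hypothetical stand-in for the agent's answer object (assumption, not CloudStack code).
    static class StubAnswer {
        private final boolean result;
        private final String details;

        StubAnswer(boolean result, String details) {
            this.result = result;
            this.details = details;
        }

        boolean getResult() { return result; }
        String getDetails() { return details; }
    }

    // Same condition as the quoted excerpt: returns the failure message,
    // or null when the answer reports success.
    static String failureMessage(StubAnswer pfma) {
        if (pfma == null || !pfma.getResult()) {
            final String details = pfma != null ? pfma.getDetails() : "null answer returned";
            return "Unable to prepare for migration due to " + details;
        }
        return null;
    }

    public static void main(String[] args) {
        // Case 1: no answer came back at all.
        System.out.println(failureMessage(null));
        // Case 2: an answer came back but reports failure (the case seen in the logs).
        System.out.println(failureMessage(new StubAnswer(false, "Unsupported command issued")));
        // Case 3: a successful answer produces no failure message.
        System.out.println(failureMessage(new StubAnswer(true, null)));
    }
}
```

Under these assumptions, only case 2 produces a message of the shape seen in the management
server log, which suggests the destination side answered but rejected the command rather
than failing to reply.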

Any help or guidance is greatly appreciated.

David Mabry
