cloudstack-dev mailing list archives

From Wei ZHOU <ustcweiz...@gmail.com>
Subject Re: CS 4.8 KVM VMs will not live migrate
Date Tue, 30 Jan 2018 14:34:39 GMT
Hi David,

I ran into the UnsupportedAnswer once before, after I made some changes in
the KVM plugin.

Normally the agent.log should contain some network configuration entries,
but I do not see any.

-Wei


2018-01-30 15:00 GMT+01:00 David Mabry <dmabry@ena.com.invalid>:

> Hi Wei,
>
> I detached the iso and received the same error.  Just out of curiosity,
> what leads you to believe it is something in the vxlan code?  I guess at
> this point, attaching a remote debugger to the agent in question might be
> the best way to get to the bottom of what is going on.
>
> Thanks in advance for the help.  I really, really appreciate it.
>
> Thanks,
> David Mabry
>
> On 1/30/18, 3:30 AM, "Wei ZHOU" <ustcweizhou@gmail.com> wrote:
>
>     The answer is probably caused by an exception in the CloudStack agent.
>     I tried to migrate a VM in our testing environment, and it works.
>
>     There are some differences between our environment and yours:
>     (1) VLAN vs. VXLAN
>     (2) no ISO vs. an attached ISO
>     Both of us use Ceph and CentOS 7.
>
>     I suspect it is caused by the VXLAN code.
>     However, could you detach the ISO and try again?
>
>     -Wei
>
>
>
>     2018-01-29 19:48 GMT+01:00 David Mabry <dmabry@ena.com.invalid>:
>
>     > Good day Cloudstack Devs,
>     >
>     > I've run across a real head scratcher.  I have two VMs, (initially 3
>     > VMs, but more on that later) on a single host, that I cannot live
>     > migrate to any other host in the same cluster.  We discovered this
>     > after attempting to roll out patches going from CentOS 7.2 to CentOS
>     > 7.4.  Initially, we thought it had something to do with the new
>     > version of libvirtd or qemu-kvm on the other hosts in the cluster
>     > preventing these VMs from migrating, but we are able to live migrate
>     > other VMs to and from this host without issue.  We can even create
>     > new VMs on this specific host and live migrate them after creation
>     > with no issue.  We've put the migration source agent, migration
>     > destination agent and the management server in debug and don't seem
>     > to get anything useful other than "Unsupported command".  Luckily, we
>     > did have one VM that was shutdown and restarted, this is the 3rd VM
>     > mentioned above.  Since that VM has been restarted, it has no issues
>     > live migrating to any other host in the cluster.
>     >
>     > I'm at a loss as to what to try next and I'm hoping that someone out
>     > there might have had a similar issue and could shed some light on
>     > what to do.  Obviously, I can contact the customer and have them
>     > shutdown their VMs, but that will potentially just delay this problem
>     > to be solved another day.  Even if shutting down the VMs is
>     > ultimately the solution, I'd still like to understand what happened
>     > to cause this issue in the first place with the hopes of preventing
>     > it in the future.
>     >
>     > Here's some information about my setup:
>     > Cloudstack 4.8 Advanced Networking
>     > CentOS 7.2 and 7.4 Hosts
>     > Ceph RBD Primary Storage
>     > NFS Secondary Storage
>     > Instance in Question for Debug: i-532-1392-NSVLTN
>     >
>     > I have attached relevant debug logs to this email if anyone wishes to
>     > take a look.  I think the most interesting error message that I have
>     > received is the following:
>     >
>     > 468390:2018-01-27 08:59:35,172 DEBUG [c.c.a.t.Request] (Work-Job-Executor-6:ctx-188ea30f job-181792/job-181802 ctx-8e7f45ad) (logid:f0888362) Seq 22-942378222027276319: Received:  { Ans: , MgmtId: 14038012703634, via: 22(csh02c01z01.nsvltn.ena.net), Ver: v1, Flags: 110, { UnsupportedAnswer } }
>     > 468391:2018-01-27 08:59:35,172 WARN  [c.c.a.m.AgentManagerImpl] (Work-Job-Executor-6:ctx-188ea30f job-181792/job-181802 ctx-8e7f45ad) (logid:f0888362) Unsupported Command: Unsupported command issued: com.cloud.agent.api.PrepareForMigrationCommand.  Are you sure you got the right type of server?
>     > 468392:2018-01-27 08:59:35,179 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-6:ctx-188ea30f job-181792/job-181802 ctx-8e7f45ad) (logid:f0888362) Invocation exception, caused by: com.cloud.exception.AgentUnavailableException: Resource [Host:22] is unreachable: Host 22: Unable to prepare for migration due to Unsupported command issued: com.cloud.agent.api.PrepareForMigrationCommand.  Are you sure you got the right type of server?
>     > 468393:2018-01-27 08:59:35,179 INFO  [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-6:ctx-188ea30f job-181792/job-181802 ctx-8e7f45ad) (logid:f0888362) Rethrow exception com.cloud.exception.AgentUnavailableException: Resource [Host:22] is unreachable: Host 22: Unable to prepare for migration due to Unsupported command issued: com.cloud.agent.api.PrepareForMigrationCommand.  Are you sure you got the right type of server?
>     >
>     > I've tracked this "Unsupported command" down in the CS 4.8 code to
>     > cloudstack/api/src/com/cloud/agent/api/Answer.java which is the
>     > generic answer class.  I believe where the error is really being
>     > spawned from is
>     > cloudstack/engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java.
>     > Specifically:
>     >         Answer pfma = null;
>     >         try {
>     >             pfma = _agentMgr.send(dstHostId, pfmc);
>     >             if (pfma == null || !pfma.getResult()) {
>     >                 final String details = pfma != null ? pfma.getDetails() : "null answer returned";
>     >                 final String msg = "Unable to prepare for migration due to " + details;
>     >                 pfma = null;
>     >                 throw new AgentUnavailableException(msg, dstHostId);
>     >             }
>     >
>     > The pfma returned must be in error or is never returned and therefore
>     > still null.  That answer appears that it should be coming from the
>     > destination agent, but for the life of me I can't figure out what
> the root
>     > cause of this error is beyond, "Unsupported command issued".  What
> command
>     > is unsupported?  My guess is that it could be something wrong with
> the dxml
>     > that is generated and passed to the destination host, but I have as
> yet
>     > been unable to catch that dxml in debug.
>     >
>     > Any help or guidance is greatly appreciated.
>     >
>     > Thanks,
>     > David Mabry
>     >
>     >
>
>
>
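
A note on the diagnosis in this thread: Wei's remark that the answer is probably caused by an exception in the agent can be illustrated with a minimal Java sketch. It assumes (the thread does not confirm this) that the agent's command dispatcher catches any exception and replies with a generic "unsupported command" answer, so the root cause never reaches the management server. All names below (UnsupportedAnswerSketch, executeRequest, checkPrepareAnswer) are hypothetical stand-ins, not CloudStack classes; only the null/getResult() check mirrors the quoted VirtualMachineManagerImpl excerpt.

```java
import java.util.function.Supplier;

public class UnsupportedAnswerSketch {

    /** Simplified, hypothetical stand-in for a generic agent answer. */
    public static class Answer {
        private final boolean result;
        private final String details;

        public Answer(boolean result, String details) {
            this.result = result;
            this.details = details;
        }

        public boolean getResult() { return result; }
        public String getDetails() { return details; }
    }

    /**
     * Agent side (assumed behaviour): run the handler for a command and
     * convert ANY runtime failure into a generic "unsupported" answer.
     * The original exception is dropped here, which is why the
     * management server would only ever see "Unsupported command".
     */
    public static Answer executeRequest(String commandName, Supplier<Answer> handler) {
        try {
            return handler.get();
        } catch (RuntimeException e) {
            // Root cause is lost unless it is logged at this point.
            return new Answer(false, "Unsupported command issued: " + commandName
                    + ".  Are you sure you got the right type of server?");
        }
    }

    /**
     * Management side: same shape as the check quoted in the thread.
     * IllegalStateException stands in for AgentUnavailableException.
     */
    public static void checkPrepareAnswer(Answer pfma) {
        if (pfma == null || !pfma.getResult()) {
            final String details = pfma != null ? pfma.getDetails() : "null answer returned";
            throw new IllegalStateException("Unable to prepare for migration due to " + details);
        }
    }

    public static void main(String[] args) {
        String cmd = "com.cloud.agent.api.PrepareForMigrationCommand";

        // Hypothetical handler that fails for a reason unrelated to
        // whether the command is supported.
        Answer pfma = executeRequest(cmd, () -> {
            throw new IllegalStateException("hypothetical failure while preparing the VM's network");
        });

        try {
            checkPrepareAnswer(pfma);
        } catch (IllegalStateException e) {
            // The message mentions only the unsupported command, not the real cause.
            System.out.println(e.getMessage());
        }
    }
}
```

If the agent behaves like this sketch, the useful information would be in the exception on the destination agent rather than in the management server log, which fits the suggestion to attach a remote debugger to that agent.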
