cloudstack-dev mailing list archives

From Marcus <shadow...@gmail.com>
Subject Re: ACS 4.5.1 KVM live migration problem
Date Fri, 15 May 2015 15:00:24 GMT
Hmmm, this seems like an unrelated issue, though the culprits are the same
fields.  It has me wondering if there's a bug in the vm sync or network
persistence. It would be interesting to know if:

1) The null values are somehow reproducible

2) Stopping a VM with null values is possible

3) Starting a VM with null values fixes them

Are the networks these belong to marked as persistent? Network ids can be
dynamic in certain situations: if a network is not used, it gives back its
vlan id, then gets a new one when you spin up VMs again. This means these
fields on the nic also need to be updated to reflect that, and I'm
wondering if there's some issue there.
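
To make the checks discussed in this thread easier to repeat, here is a
minimal Python sketch that sorts nic URI values into NULL, bare number, or
proper URI. It is only an illustration: the helper names classify_nic_uri
and suspicious_nics are hypothetical, and the sample rows are copied from the
cloud.nics query output quoted further down. It does not touch the database
and is not CloudStack code.

```python
from urllib.parse import urlparse

def classify_nic_uri(value):
    """Classify one isolation_uri / broadcast_uri value (hypothetical helper)."""
    if value is None:
        return "null"          # SQL NULL, as seen for instance 664
    text = value.strip()
    if text.isdigit():
        return "bare-id"       # just a vlan number, not URI format
    parsed = urlparse(text)
    if parsed.scheme and parsed.netloc:
        return "uri"           # e.g. vlan://96 or vlan://untagged
    return "other"

def suspicious_nics(rows):
    """Return instance ids whose isolation or broadcast value is not a URI."""
    flagged = []
    for instance_id, isolation_uri, broadcast_uri in rows:
        kinds = {classify_nic_uri(isolation_uri), classify_nic_uri(broadcast_uri)}
        if kinds != {"uri"}:
            flagged.append(instance_id)
    return flagged

# Sample rows taken from the query output quoted in this thread.
rows = [
    (564, "vlan://96", "vlan://96"),
    (664, None, None),
    (1111, "vlan://1127", "vlan://1127"),
]
flagged = suspicious_nics(rows)
print(flagged)
```

A flagged nic is only a candidate for review: as noted further down in the
thread, NULL can be expected on some nics (link-local, for example), so this
narrows the list rather than proving a fault.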

On Fri, May 15, 2015 at 6:01 AM, Andrija Panic <andrija.panic@gmail.com>
wrote:

> OK, but since they are guest NICs, this confuses me. This is an advanced zone
> with VLANs, right? Then, as I understand it, all NICs of a user VM need to
> have some isolation method...
>
> Anyway, I'm running an advanced zone + VLANs, and all VMs (VMs behind a VPC
> and VMs on the internet/public network, which is still a Guest network)
> have some vlan://xxxxx value.
>
> For VR, SSVM, CPVM there are NICs on the "ACS public" network that doesn't
> use a VLAN: they have "vlan://untagged". "NULL" is only used for LinkLocal
> (169.x) NICs, and for the mgmt/sec-storage NIC for SSVM/CPVM in my case.
>
>
>
> On 15 May 2015 at 13:47, Andrei Mikhailovsky <andrei@arhont.com> wrote:
>
> > Andrija,
> >
> > I ran the command and it showed me a bunch of running VMs with NULLs. I
> > would say roughly 20% of my running VMs have NULL under the isolation
> > and broadcast URIs.
> >
> > All of these vms are working perfectly well (in terms of network
> > connectivity) and there is nothing special about them. They all have at
> > least one guest NIC.
> >
> > Andrei
> > ----- Original Message -----
> >
> > From: "Andrija Panic" <andrija.panic@gmail.com>
> > To: dev@cloudstack.apache.org
> > Cc: users@cloudstack.apache.org
> > Sent: Friday, 15 May, 2015 12:34:24 PM
> > Subject: Re: ACS 4.5.1 KVM live migration problem
> >
> > Andrei,
> >
> > select instance_id,isolation_uri,broadcast_uri from nics where instance_id
> > in (select id from vm_instance where state='Running' and name not like
> > 'r-%' and name not like 'v-%' and name not like 's-%') order by
> > instance_id;
> >
> > This gives me every NIC that does not belong to a router, SSVM or CPVM. I
> > always have vlan values: since these are all Guest NICs, they must have
> > a vlan ID.
> > NULL values are only present when a VM is deleted/stopped in my case.
> >
> > Can you check your VM 664? What is so specific about it?
> > Shouldn't all NICs (in my understanding, if this is an advanced zone) have
> > some vlan, rather than NULL or untagged?
> >
> > On 15 May 2015 at 12:58, Andrei Mikhailovsky <andrei@arhont.com> wrote:
> >
> > >
> > >
> > > Hi Andrija, Marcus,
> > >
> > > Thanks for your comments and suggestions. I've checked the cloud.nics table:
> > >
> > > mysql> select instance_id,isolation_uri,broadcast_uri from nics where
> > > instance_id=564 or instance_id=664 or instance_id=1111;
> > > +-------------+---------------+---------------+
> > > | instance_id | isolation_uri | broadcast_uri |
> > > +-------------+---------------+---------------+
> > > |         564 | vlan://96     | vlan://96     |
> > > |         664 | NULL          | NULL          |
> > > |        1111 | vlan://1127   | vlan://1127   |
> > > +-------------+---------------+---------------+
> > >
> > >
> > > From my tests, instance_ids 564 and 1111 are migrating correctly, but
> > > instance 664 is not, and it shows an NPE similar to the one I've given.
> > >
> > >
> > > Is this what is causing the migration issues? If so, should I change all
> > > isolation_uri and broadcast_uri to the corresponding network vlan ids?
> > >
> > > Thanks
> > >
> > > Andrei
> > >
> > > ----- Original Message -----
> > >
> > > From: "Andrija Panic" <andrija.panic@gmail.com>
> > > To: dev@cloudstack.apache.org
> > > Sent: Thursday, 14 May, 2015 4:00:07 PM
> > > Subject: Re: Fwd: ACS 4.5.1 KVM live migration problem
> > >
> > > That would probably be a bug that I had... but we updated the main VLAN
> > > table with a changed URI or something... Marcus saved me that time :)
> > > Andrei, please provide more info, including the info Marcus asked for. I
> > > will try to compare my values with yours, if that is of any help.
> > >
> > > On 14 May 2015 at 16:56, Marcus <shadowsor@gmail.com> wrote:
> > >
> > > > So, I vaguely remember an issue introduced a little over a year ago where
> > > > the broadcast domain value of the nic was changed from a URI to just a vlan
> > > > ID, which worked for vlans but broke vxlan and some other things. If I
> > > > remember correctly, there would be a small set of installs during this
> > > > period that wouldn't have created their nics with the correct broadcast
> > > > domain value. I don't remember which versions were doing this but I do know
> > > > there's a JIRA ticket and a paper trail on how people were fixing it. The
> > > > code that broke the URI was backed out. VMs created with the bad code would
> > > > not be compatible with the new or the old versions of code.
> > > >
> > > > I was under the impression at the time that there was some SQL provided to
> > > > update the values during an upgrade; perhaps that never made it in, or
> > > > somehow got skipped during your upgrade process. At any rate, since there
> > > > is a null pointer on broadcast domain type, you may check your
> > > > nics/networks in the MySQL db and verify that the broadcast/isolation types
> > > > are URI format and not just a number. Or try to find the bug I'm referring
> > > > to from around April last year.
> > > > On May 14, 2015 5:04 AM, "Andrei Mikhailovsky" <andrei@arhont.com> wrote:
> > > >
> > > > > > Hi guys,
> > > > > >
> > > > > > Forwarding the message to the dev list as I've not had much reply in the
> > > > > > users list.
> > > > > >
> > > > > > In summary, after upgrading from ACS 4.4.2 to 4.5.1 I started having
> > > > > > migration issues with a lot of VMs. Some VMs migrate successfully and
> > > > > > others do not.
> > > > > >
> > > > > > The logs are shown below.
> > > > > >
> > > > > > Could someone help me get to the bottom of this problem?
> > > > >
> > > > > Thanks
> > > > >
> > > > > Andrei
> > > > >
> > > > >
> > > > >
> > > > > ----- Forwarded Message -----
> > > > > From: "Andrei Mikhailovsky" <andrei@arhont.com>
> > > > > To: users@cloudstack.apache.org
> > > > > Sent: Wednesday, 13 May, 2015 10:44:29 AM
> > > > > Subject: Re: ACS 4.5.1 KVM live migration problem
> > > > >
> > > > > Hi Rohit,
> > > > >
> > > > > > Forgot to answer you on the cloud.vlan table.
> > > > > >
> > > > > > That particular VM has a network with vlan id 1151, as shown when I look at
> > > > > > the network details in the ACS GUI. However, this vlan is not shown in the
> > > > > > cloud.vlan table. From what I can see, the cloud.vlan table shows only the
> > > > > > public and management network vlan interfaces and does not show the guest
> > > > > > network vlans.
> > > > > >
> > > > > > The public network vlan used for routing traffic to the internet from
> > > > > > this particular VM is:
> > > > >
> > > > >
> > > > > mysql> select * from vlan where id=12;
> > > > >
> > > > >
> > > >
> > >
> >
> +----+--------------------------------------+-------------+---------------+-----------------+-------------------------------+----------------+----------------+------------+---------------------+-------------+----------+-----------+---------+---------+
> > > > > | id | uuid | vlan_id | vlan_gateway | vlan_netmask | description
|
> > > > > vlan_type | data_center_id | network_id | physical_network_id |
> > > > ip6_gateway
> > > > > | ip6_cidr | ip6_range | removed | created |
> > > > >
> > > > >
> > > >
> > >
> >
> +----+--------------------------------------+-------------+---------------+-----------------+-------------------------------+----------------+----------------+------------+---------------------+-------------+----------+-----------+---------+---------+
> > > > > | 12 | d13ea4b3-2087-4376-9d0a-f54efe2a55af | vlan://2030 |
> > > 178.XXX.XXX.1
> > > > > | 255.255.255.128 | 178.XXX.XXX.2-178.XXX.XXX.119 | VirtualNetwork
> |
> > 1
> > > |
> > > > > 200 | 200 | NULL | NULL | NULL | NULL | NULL |
> > > > >
> > > > >
> > > >
> > >
> >
> +----+--------------------------------------+-------------+---------------+-----------------+-------------------------------+----------------+----------------+------------+---------------------+-------------+----------+-----------+---------+---------+
> > > > > 1 row in set (0.00 sec)
> > > > >
> > > > >
> > > > > Hope that helps
> > > > >
> > > > > Andrei
> > > > > ----- Original Message -----
> > > > >
> > > > > From: "Rohit Yadav" <rohit.yadav@shapeblue.com>
> > > > > To: users@cloudstack.apache.org
> > > > > Sent: Wednesday, 13 May, 2015 8:55:55 AM
> > > > > Subject: Re: ACS 4.5.1 KVM live migration problem
> > > > >
> > > > > Hi Andrei,
> > > > >
> > > > > This looks like an issue similar to
> > > > > https://issues.apache.org/jira/browse/CLOUDSTACK-6893
> > > > > > Can you share the row from your cloud.vlan table and the value of "select
> > > > > > cache_mode from volume_view where vm_id=<put the vm id here>\G;" for the
> > > > > > VM causing the NPE?
> > > > >
> > > > > > > On 12-May-2015, at 10:51 pm, Andrei Mikhailovsky <andrei@arhont.com> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > > It seems that the problem is worse than I initially thought. In fact,
> > > > > > > I can't migrate most of my VMs apart from a handful, and I can't determine
> > > > > > > a correlation between the migratable VMs and the ones that produce the
> > > > > > > exception.
> > > > > >
> > > > > > Thanks for any help.
> > > > > >
> > > > > > Andrei
> > > > > >
> > > > > > ----- Original Message -----
> > > > > >
> > > > > > From: "Andrei Mikhailovsky" <andrei@arhont.com>
> > > > > > To: users@cloudstack.apache.org
> > > > > > Sent: Tuesday, 12 May, 2015 8:53:16 PM
> > > > > > Subject: ACS 4.5.1 KVM live migration problem
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > > I am having an issue migrating some of my VMs after recently upgrading to
> > > > > > > ACS 4.5.1. I am running Ubuntu 14.04 on both host and management servers.
> > > > > > > Here is the output from the log file on a client agent:
> > > > > >
> > > > > >
> > > > > > > 2015-05-12 20:42:34,154 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-1:null) Preparing host for migrating com.cloud.agent.api.to.VirtualMachineTO@21a038ac
> > > > > > > 2015-05-12 20:42:34,157 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-1:null) can't find connection: KVM, for vm: i-9-1162-VM, continue
> > > > > > > 2015-05-12 20:42:34,159 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-1:null) can't find connection: LXC, for vm: i-9-1162-VM, continue
> > > > > > > 2015-05-12 20:42:34,159 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-1:null) can't find which hypervisor the vm used , then use the default hypervisor
> > > > > > > 2015-05-12 20:42:34,160 DEBUG [kvm.resource.BridgeVifDriver] (agentRequest-Handler-1:null) nic=[Nic:Guest-178.248.108.205-vlan://2014]
> > > > > > > 2015-05-12 20:42:34,160 DEBUG [kvm.resource.BridgeVifDriver] (agentRequest-Handler-1:null) creating a vNet dev and bridge for guest traffic per traffic label cloudstackbr0
> > > > > > > 2015-05-12 20:42:34,160 DEBUG [kvm.resource.BridgeVifDriver] (agentRequest-Handler-1:null) Executing: /usr/share/cloudstack-common/scripts/vm/network/vnet/modifyvlan.sh -v 2014 -p bond0 -b brbond0-2014 -o add
> > > > > > > 2015-05-12 20:42:34,211 DEBUG [kvm.resource.BridgeVifDriver] (agentRequest-Handler-1:null) Execution is successful.
> > > > > > > 2015-05-12 20:42:34,211 DEBUG [kvm.resource.BridgeVifDriver] (agentRequest-Handler-1:null) nic=[Nic:Guest-10.1.1.66-null]
> > > > > > > 2015-05-12 20:42:34,212 DEBUG [kvm.storage.KVMStoragePoolManager] (agentRequest-Handler-1:null) Disconnecting disk 23add201-e4ee-447b-a448-ecd152aea4ad
> > > > > > > 2015-05-12 20:42:34,212 DEBUG [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:null) Trying to fetch storage pool cf771bc7-8998-354d-8e10-5564585a3c20 from libvirt
> > > > > > > 2015-05-12 20:42:34,223 DEBUG [kvm.storage.KVMStoragePoolManager] (agentRequest-Handler-1:null) Disconnecting disk 55100d25-410e-4fa3-a38b-7717f74d2afe
> > > > > > > 2015-05-12 20:42:34,223 DEBUG [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:null) Trying to fetch storage pool cf771bc7-8998-354d-8e10-5564585a3c20 from libvirt
> > > > > > > 2015-05-12 20:42:34,232 DEBUG [kvm.storage.KVMStoragePoolManager] (agentRequest-Handler-1:null) Disconnecting disk 2db59d16-d17f-49a1-b913-7fbe4025a549
> > > > > > > 2015-05-12 20:42:34,233 DEBUG [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:null) Trying to fetch storage pool cf771bc7-8998-354d-8e10-5564585a3c20 from libvirt
> > > > > > > 2015-05-12 20:42:34,243 DEBUG [kvm.storage.KVMStoragePoolManager] (agentRequest-Handler-1:null) Disconnecting disk 17afbf31-ac89-46f7-a2c8-f8aed796e4c6
> > > > > > > 2015-05-12 20:42:34,243 DEBUG [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:null) Trying to fetch storage pool d8d5ec36-3cb0-39af-8fc6-084a4abd5d28 from libvirt
> > > > > > > 2015-05-12 20:42:34,254 WARN [cloud.agent.Agent] (agentRequest-Handler-1:null) Caught:
> > > > > > > java.lang.NullPointerException
> > > > > > > at com.cloud.network.Networks$BroadcastDomainType.getSchemeValue(Networks.java:172)
> > > > > > > at com.cloud.network.Networks$BroadcastDomainType.getValue(Networks.java:226)
> > > > > > > at com.cloud.hypervisor.kvm.resource.BridgeVifDriver.plug(BridgeVifDriver.java:105)
> > > > > > > at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.execute(LibvirtComputingResource.java:3230)
> > > > > > > at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1307)
> > > > > > > at com.cloud.agent.Agent.processRequest(Agent.java:503)
> > > > > > > at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:808)
> > > > > > > at com.cloud.utils.nio.Task.run(Task.java:84)
> > > > > > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > > > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > > > > > at java.lang.Thread.run(Thread.java:745)
> > > > > > > 2015-05-12 20:42:34,256 DEBUG [cloud.agent.Agent] (agentRequest-Handler-1:null) Seq 7-7525233502359390941: { Ans: , MgmtId: 115129173025118, via: 7, Ver: v1, Flags: 110,
> > > > > > > [{"com.cloud.agent.api.Answer":{"result":false,"details":"java.lang.NullPointerException\n\tat
> > > > > > > com.cloud.network.Networks$BroadcastDomainType.getSchemeValue(Networks.java:172)\n\tat
> > > > > > > com.cloud.network.Networks$BroadcastDomainType.getValue(Networks.java:226)\n\tat
> > > > > > > com.cloud.hypervisor.kvm.resource.BridgeVifDriver.plug(BridgeVifDriver.java:105)\n\tat
> > > > > > > com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.execute(LibvirtComputingResource.java:3230)\n\tat
> > > > > > > com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1307)\n\tat
> > > > > > > com.cloud.agent.Agent.processRequest(Agent.java:503)\n\tat
> > > > > > > com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:808)\n\tat
> > > > > > > com.cloud.utils.nio.Task.run(Task.java:84)\n\tat
> > > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)\n\tat
> > > > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\n\tat
> > > > > > > java.lang.Thread.run(Thread.java:745)\n","wait":0}}] }
> > > > > >
> > > > > >
> > > > > >
> > > > > > > Any idea how to get this fixed? Not sure why all of a sudden the
> > > > > > > migration stopped working for a handful of VMs. I can successfully migrate
> > > > > > > some VMs, but not others.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > Andrei
> > > > > >
> > > > > >
> > > > >
> > > > > Regards,
> > > > > Rohit Yadav
> > > > > Software Architect, ShapeBlue
> > > > > M. +91 88 262 30892 | rohit.yadav@shapeblue.com
> > > > > Blog: bhaisaab.org | Twitter: @_bhaisaab
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Andrija Panić
> > >
> > >
> >
> >
> > --
> >
> > Andrija Panić
> >
> >
>
>
> --
>
> Andrija Panić
>
