cloudstack-dev mailing list archives

From Andrija Panic <andrija.pa...@gmail.com>
Subject Re: ACS 4.5.1 KVM live migration problem
Date Tue, 19 May 2015 09:00:51 GMT
" .I always have vlan values - since this is all Guest NICs - they must
have vlan ID...

NULL values are only present when a VM is deleted/stopped in my case...
"

Best
On May 19, 2015 10:49 AM, "Andrei Mikhailovsky" <andrei@arhont.com> wrote:

> Okay, I think I got to the bottom of the problem.
>
> For some reason, after I manually changed the isolation_uri/broadcast_uri
> values in the db, the virtual router responsible for the network attached
> to this particular vm stopped responding and wasn't giving out dhcp
> leases. After performing a network restart with the clean up option, I am
> able to start vms once again.
>
> After starting the vms, their isolation/broadcast_uri values are populated
> properly.
>
> Andrei
>
>
>
> ----- Original Message -----
>
> From: "Andrei Mikhailovsky" <andrei@arhont.com>
> To: dev@cloudstack.apache.org
> Cc: users@cloudstack.apache.org
> Sent: Tuesday, 19 May, 2015 9:36:05 AM
> Subject: Re: ACS 4.5.1 KVM live migration problem
>
>
>
> Hi guys,
>
> Coming back to the problem with live migration. I've done some more
> testing and I think there is an issue (probably introduced since 4.4.x).
>
> I have manually set vlan://<number> for the broadcast_uri and
> isolation_uri values in the database. This has indeed solved the migration
> problem: I am able to migrate the vm after making the change.
>
> However, a bigger problem has surfaced. After stopping the vm, I am no
> longer able to start it, even though I've not had any issues
> stopping/starting the vm prior to making db change. I've also noticed that
> after the vm is stopped, the value of both broadcast and isolation URIs is
> reset back to NULL. Not sure if this is the expected behaviour or not.
>
> Could someone help me with getting to the bottom of this issue?
>
> Thanks
>
> Andrei
>
> ----- Original Message -----
>
> From: "Andrija Panic" <andrija.panic@gmail.com>
> To: dev@cloudstack.apache.org
> Cc: users@cloudstack.apache.org
> Sent: Friday, 15 May, 2015 2:01:30 PM
> Subject: Re: ACS 4.5.1 KVM live migration problem
>
> Ok, but since they are guest NICs, it confuses me - this is an advanced
> zone with vlans, right? Then, in my understanding, all NICs (of a user VM)
> need to have some isolation method...
>
> Anyway - I'm running advanced zone + vlans, and all VMs (VMs behind a VPC
> and VMs on the internet/public network - but that's still a Guest network)
> have some vlan://xxxxx value.
>
> For VR, SSVM, CPVM - there are NICs on the "ACS public" network that don't
> use a vlan - they have "vlan://untagged", and "NULL" is only used for
> LinkLocal (169.x) NICs, and for the mgmt/sec-storage NIC for SSVM/CPVM in
> my case.
>
>
>
> On 15 May 2015 at 13:47, Andrei Mikhailovsky <andrei@arhont.com> wrote:
>
> > Andrija,
> >
> > I've run the command and it showed me a bunch of running vms with NULLs.
> > I would roughly say about 20% of my total running vms have NULL under
> > the isolation and broadcast URIs.
> >
> > All of these vms are working perfectly well (in terms of network
> > connectivity) and there is nothing special about them. They all have at
> > least one guest NIC.
> >
> > Andrei
> > ----- Original Message -----
> >
> > From: "Andrija Panic" <andrija.panic@gmail.com>
> > To: dev@cloudstack.apache.org
> > Cc: users@cloudstack.apache.org
> > Sent: Friday, 15 May, 2015 12:34:24 PM
> > Subject: Re: ACS 4.5.1 KVM live migration problem
> >
> > Andrei,
> >
> > select instance_id,isolation_uri,broadcast_uri from nics where
> > instance_id in (select id from vm_instance where state='Running' and
> > name not like 'r-%' and name not like 'v-%' and name not like 's-%')
> > order by instance_id;
> >
> > This gives me every NIC that does not belong to a router, SSVM or
> > CPVM... I always have vlan values - since these are all Guest NICs - they
> > must have a vlan ID...
> > NULL values are only present when a VM is deleted/stopped in my case...
> >
> > Can you check your VM 664 - what is so specific about it?
> > All NICs (in my understanding, if this is an advanced zone) must have
> > some vlan, and cannot be NULL or untagged?
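
The query above lists every NIC of a running user VM; it can be narrowed so that only the suspect rows come back. The following is a minimal sketch, not CloudStack code: it builds a hypothetical, heavily reduced copy of the vm_instance and nics tables in an in-memory SQLite database (the instance names and the router/stopped-VM rows are made up for illustration) and adds an IS NULL filter to the same query.

```python
import sqlite3

# Hypothetical, heavily reduced stand-ins for cloud.vm_instance and
# cloud.nics; the real CloudStack schema has many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vm_instance (id INTEGER PRIMARY KEY, name TEXT, state TEXT);
    CREATE TABLE nics (
        instance_id INTEGER,
        isolation_uri TEXT,
        broadcast_uri TEXT
    );
""")
conn.executemany(
    "INSERT INTO vm_instance VALUES (?, ?, ?)",
    [
        (564, "i-2-564-VM", "Running"),
        (664, "i-2-664-VM", "Running"),
        (1111, "i-2-1111-VM", "Running"),
        (2000, "r-2000-VM", "Running"),    # virtual router, excluded by name
        (2001, "i-2-2001-VM", "Stopped"),  # stopped VM, NULL is expected here
    ],
)
conn.executemany(
    "INSERT INTO nics VALUES (?, ?, ?)",
    [
        (564, "vlan://96", "vlan://96"),
        (664, None, None),
        (1111, "vlan://1127", "vlan://1127"),
        (2000, None, None),
        (2001, None, None),
    ],
)

# Same filter as the query in the thread, plus an IS NULL condition so that
# only running user-VM NICs without a URI are returned.
SUSPECT_NICS = """
    SELECT instance_id, isolation_uri, broadcast_uri
    FROM nics
    WHERE (isolation_uri IS NULL OR broadcast_uri IS NULL)
      AND instance_id IN (
          SELECT id FROM vm_instance
          WHERE state = 'Running'
            AND name NOT LIKE 'r-%'
            AND name NOT LIKE 'v-%'
            AND name NOT LIKE 's-%'
      )
    ORDER BY instance_id
"""
suspects = conn.execute(SUSPECT_NICS).fetchall()
print(suspects)
```

With the sample rows, only instance 664 is reported. The WHERE clause should carry over to MySQL unchanged, but that is an assumption to verify against a copy of the real database before relying on it.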
> >
> > On 15 May 2015 at 12:58, Andrei Mikhailovsky <andrei@arhont.com> wrote:
> >
> > >
> > >
> > > Hi Andrija, Marcus,
> > >
> > > > Thanks for your comments and suggestions. I've checked the cloud.nics
> > > > table:
> > > >
> > > > mysql> select instance_id,isolation_uri,broadcast_uri from nics where
> > > > instance_id=564 or instance_id=664 or instance_id=1111;
> > > > +-------------+---------------+---------------+
> > > > | instance_id | isolation_uri | broadcast_uri |
> > > > +-------------+---------------+---------------+
> > > > |         564 | vlan://96     | vlan://96     |
> > > > |         664 | NULL          | NULL          |
> > > > |        1111 | vlan://1127   | vlan://1127   |
> > > > +-------------+---------------+---------------+
> > >
> > >
> > > > From my tests, instance_ids 564 and 1111 are migrating correctly, but
> > > > instance 664 is not and is showing an NPE similar to the one I've given.
> > > >
> > > >
> > > > Is this what is causing the migration issues? If so, should I change all
> > > > isolation_uri and broadcast_uri values to the corresponding network vlan
> > > > ids?
> > >
> > > Thanks
> > >
> > > Andrei
> > >
> > > ----- Original Message -----
> > >
> > > From: "Andrija Panic" <andrija.panic@gmail.com>
> > > To: dev@cloudstack.apache.org
> > > Sent: Thursday, 14 May, 2015 4:00:07 PM
> > > Subject: Re: Fwd: ACS 4.5.1 KVM live migration problem
> > >
> > > > That would probably be a bug that I had... but we updated the main VLAN
> > > > table with a URI change or something... Marcus saved me that time :)
> > > > Andrei, please provide more info and the info Marcus asked for; I will
> > > > try to compare my values with yours, if that is of any help.
> > >
> > > On 14 May 2015 at 16:56, Marcus <shadowsor@gmail.com> wrote:
> > >
> > > > > So, I vaguely remember an issue introduced a little over a year ago
> > > > > where the broadcast domain value of the nic was changed from a URI to
> > > > > just a vlan ID, which worked for vlans but broke vxlan and some other
> > > > > things. If I remember correctly, there would be a small set of installs
> > > > > during this period that wouldn't have created their nics with the
> > > > > correct broadcast domain value. I don't remember which versions were
> > > > > doing this but I do know there's a JIRA ticket and a paper trail on how
> > > > > people were fixing it. The code that broke the URI was backed out. VMs
> > > > > created with the bad code would not be compatible with the new or the
> > > > > old versions of code.
> > > > >
> > > > > I was under the impression at the time that there was some SQL provided
> > > > > to update the values during an upgrade; perhaps that never made it in,
> > > > > or somehow got skipped during your upgrade process. At any rate, since
> > > > > there is a null pointer on broadcast domain type, you may check your
> > > > > nics/networks in the MySQL db and verify that the broadcast/isolation
> > > > > types are URI format and not just a number. Or try to find the bug I'm
> > > > > referring to from around April last year.
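
The check suggested above, that the stored values are in URI format and not just a number, can be sketched as follows. This is an illustration only, not CloudStack's own parser: classify_uri is a hypothetical helper, and the sample values are taken from or modelled on the ones quoted in this thread.

```python
from urllib.parse import urlparse


def classify_uri(value):
    """Return 'null', 'bare-id', 'uri' or 'other' for a nics URI column value."""
    if value is None:
        return "null"
    text = value.strip()
    if text.isdigit():
        # A plain vlan number such as "1127", with no scheme in front of it.
        return "bare-id"
    parsed = urlparse(text)
    if parsed.scheme and parsed.netloc:
        # Scheme plus value, such as "vlan://96" or "vlan://untagged".
        return "uri"
    return "other"


samples = ["vlan://96", "1127", None, "vlan://untagged"]
results = [classify_uri(v) for v in samples]
print(results)
```

Rows that come back as "null" or "bare-id" for a running guest NIC would be the ones to look at first; whether they should be corrected by hand is a separate question.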
> > > > > On May 14, 2015 5:04 AM, "Andrei Mikhailovsky" <andrei@arhont.com> wrote:
> > > >
> > > > > Hi guys,
> > > > >
> > > > > > Forwarding the message to the dev list as I've not had much reply in
> > > > > > the users list.
> > > > > >
> > > > > > In summary, after upgrading from ACS 4.4.2 to 4.5.1 I started having
> > > > > > migration issues with a lot of vms. Some vms are successfully
> > > > > > migrating and others are not.
> > > > >
> > > > > The logs are shown below
> > > > >
> > > > > could someone help me to get to the bottom of this problem?
> > > > >
> > > > > Thanks
> > > > >
> > > > > Andrei
> > > > >
> > > > >
> > > > >
> > > > > ----- Forwarded Message -----
> > > > > From: "Andrei Mikhailovsky" <andrei@arhont.com>
> > > > > To: users@cloudstack.apache.org
> > > > > Sent: Wednesday, 13 May, 2015 10:44:29 AM
> > > > > Subject: Re: ACS 4.5.1 KVM live migration problem
> > > > >
> > > > > Hi Rohit,
> > > > >
> > > > > forgot to answer you on the cloud.vlan table.
> > > > >
> > > > > > That particular vm has a network with vlan id 1151, as shown when I
> > > > > > look at the network details in the ACS GUI. However, this vlan is not
> > > > > > shown in the cloud.vlan table. From what I can see, the cloud.vlan
> > > > > > table shows only the public and management network vlan interfaces and
> > > > > > does not show the guest network vlans.
> > > > > >
> > > > > > In terms of the public network vlan which is used for routing traffic
> > > > > > to the internet from this particular vm, it is:
> > > > >
> > > > >
> > > > > > mysql> select * from vlan where id=12;
> > > > > > +----+--------------------------------------+-------------+---------------+-----------------+-------------------------------+----------------+----------------+------------+---------------------+-------------+----------+-----------+---------+---------+
> > > > > > | id | uuid                                 | vlan_id     | vlan_gateway  | vlan_netmask    | description                   | vlan_type      | data_center_id | network_id | physical_network_id | ip6_gateway | ip6_cidr | ip6_range | removed | created |
> > > > > > +----+--------------------------------------+-------------+---------------+-----------------+-------------------------------+----------------+----------------+------------+---------------------+-------------+----------+-----------+---------+---------+
> > > > > > | 12 | d13ea4b3-2087-4376-9d0a-f54efe2a55af | vlan://2030 | 178.XXX.XXX.1 | 255.255.255.128 | 178.XXX.XXX.2-178.XXX.XXX.119 | VirtualNetwork |              1 |        200 |                 200 | NULL        | NULL     | NULL      | NULL    | NULL    |
> > > > > > +----+--------------------------------------+-------------+---------------+-----------------+-------------------------------+----------------+----------------+------------+---------------------+-------------+----------+-----------+---------+---------+
> > > > > > 1 row in set (0.00 sec)
> > > > >
> > > > >
> > > > > Hope that helps
> > > > >
> > > > > Andrei
> > > > > ----- Original Message -----
> > > > >
> > > > > From: "Rohit Yadav" <rohit.yadav@shapeblue.com>
> > > > > To: users@cloudstack.apache.org
> > > > > Sent: Wednesday, 13 May, 2015 8:55:55 AM
> > > > > Subject: Re: ACS 4.5.1 KVM live migration problem
> > > > >
> > > > > Hi Andrei,
> > > > >
> > > > > > This looks like an issue similar to
> > > > > > https://issues.apache.org/jira/browse/CLOUDSTACK-6893
> > > > > > Can you share the row from your cloud.vlan table and the value of
> > > > > > "select cache_mode from volume_view where vm_id=<put the vm id here>\G;"
> > > > > > for the VM causing the NPE?
> > > > >
> > > > > > > On 12-May-2015, at 10:51 pm, Andrei Mikhailovsky <andrei@arhont.com> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > > It seems that the problem is worse than I initially thought. In
> > > > > > > fact, I can't migrate most of my vms apart from a handful, and I
> > > > > > > can't determine a correlation between the migratable vms and the
> > > > > > > ones that produce the exception.
> > > > > >
> > > > > > Thanks for any help.
> > > > > >
> > > > > > Andrei
> > > > > >
> > > > > > ----- Original Message -----
> > > > > >
> > > > > > From: "Andrei Mikhailovsky" <andrei@arhont.com>
> > > > > > To: users@cloudstack.apache.org
> > > > > > Sent: Tuesday, 12 May, 2015 8:53:16 PM
> > > > > > Subject: ACS 4.5.1 KVM live migration problem
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > > I am having an issue migrating some vms after recently upgrading to
> > > > > > > ACS 4.5.1. I am running Ubuntu 14.04 on both host and management
> > > > > > > servers. Here is the output from the log file on a client agent:
> > > > > >
> > > > > >
> > > > > > 2015-05-12 20:42:34,154 DEBUG
> > [kvm.resource.LibvirtComputingResource]
> > > > > (agentRequest-Handler-1:null) Preparing host for migrating
> > > > > com.cloud.agent.api.to.VirtualMachineTO@21a038ac
> > > > > > 2015-05-12 20:42:34,157 DEBUG [kvm.resource.LibvirtConnection]
> > > > > (agentRequest-Handler-1:null) can't find connection: KVM, for vm:
> > > > > i-9-1162-VM, continue
> > > > > > 2015-05-12 20:42:34,159 DEBUG [kvm.resource.LibvirtConnection]
> > > > > (agentRequest-Handler-1:null) can't find connection: LXC, for vm:
> > > > > i-9-1162-VM, continue
> > > > > > 2015-05-12 20:42:34,159 DEBUG [kvm.resource.LibvirtConnection]
> > > > > (agentRequest-Handler-1:null) can't find which hypervisor the vm
> > used ,
> > > > > then use the default hypervisor
> > > > > > 2015-05-12 20:42:34,160 DEBUG [kvm.resource.BridgeVifDriver]
> > > > > (agentRequest-Handler-1:null)
> > > nic=[Nic:Guest-178.248.108.205-vlan://2014]
> > > > > > 2015-05-12 20:42:34,160 DEBUG [kvm.resource.BridgeVifDriver]
> > > > > (agentRequest-Handler-1:null) creating a vNet dev and bridge for
> > guest
> > > > > traffic per traffic label cloudstackbr0
> > > > > > 2015-05-12 20:42:34,160 DEBUG [kvm.resource.BridgeVifDriver]
> > > > > (agentRequest-Handler-1:null) Executing:
> > > > > /usr/share/cloudstack-common/scripts/vm/network/vnet/modifyvlan.sh
> -v
> > > > 2014
> > > > > -p bond0 -b brbond0-2014 -o add
> > > > > > 2015-05-12 20:42:34,211 DEBUG [kvm.resource.BridgeVifDriver]
> > > > > (agentRequest-Handler-1:null) Execution is successful.
> > > > > > 2015-05-12 20:42:34,211 DEBUG [kvm.resource.BridgeVifDriver]
> > > > > (agentRequest-Handler-1:null) nic=[Nic:Guest-10.1.1.66-null]
> > > > > > 2015-05-12 20:42:34,212 DEBUG [kvm.storage.KVMStoragePoolManager]
> > > > > (agentRequest-Handler-1:null) Disconnecting disk
> > > > > 23add201-e4ee-447b-a448-ecd152aea4ad
> > > > > > 2015-05-12 20:42:34,212 DEBUG [kvm.storage.LibvirtStorageAdaptor]
> > > > > (agentRequest-Handler-1:null) Trying to fetch storage pool
> > > > > cf771bc7-8998-354d-8e10-5564585a3c20 from libvirt
> > > > > > 2015-05-12 20:42:34,223 DEBUG [kvm.storage.KVMStoragePoolManager]
> > > > > (agentRequest-Handler-1:null) Disconnecting disk
> > > > > 55100d25-410e-4fa3-a38b-7717f74d2afe
> > > > > > 2015-05-12 20:42:34,223 DEBUG [kvm.storage.LibvirtStorageAdaptor]
> > > > > (agentRequest-Handler-1:null) Trying to fetch storage pool
> > > > > cf771bc7-8998-354d-8e10-5564585a3c20 from libvirt
> > > > > > 2015-05-12 20:42:34,232 DEBUG [kvm.storage.KVMStoragePoolManager]
> > > > > (agentRequest-Handler-1:null) Disconnecting disk
> > > > > 2db59d16-d17f-49a1-b913-7fbe4025a549
> > > > > > 2015-05-12 20:42:34,233 DEBUG [kvm.storage.LibvirtStorageAdaptor]
> > > > > (agentRequest-Handler-1:null) Trying to fetch storage pool
> > > > > cf771bc7-8998-354d-8e10-5564585a3c20 from libvirt
> > > > > > 2015-05-12 20:42:34,243 DEBUG [kvm.storage.KVMStoragePoolManager]
> > > > > (agentRequest-Handler-1:null) Disconnecting disk
> > > > > 17afbf31-ac89-46f7-a2c8-f8aed796e4c6
> > > > > > 2015-05-12 20:42:34,243 DEBUG [kvm.storage.LibvirtStorageAdaptor]
> > > > > (agentRequest-Handler-1:null) Trying to fetch storage pool
> > > > > d8d5ec36-3cb0-39af-8fc6-084a4abd5d28 from libvirt
> > > > > > 2015-05-12 20:42:34,254 WARN [cloud.agent.Agent]
> > > > > (agentRequest-Handler-1:null) Caught:
> > > > > > java.lang.NullPointerException
> > > > > > at
> > > > >
> > > >
> > >
> >
> com.cloud.network.Networks$BroadcastDomainType.getSchemeValue(Networks.java:172)
> > > > > > at
> > > > >
> > > >
> > >
> >
> com.cloud.network.Networks$BroadcastDomainType.getValue(Networks.java:226)
> > > > > > at
> > > > >
> > > >
> > >
> >
> com.cloud.hypervisor.kvm.resource.BridgeVifDriver.plug(BridgeVifDriver.java:105)
> > > > > > at
> > > > >
> > > >
> > >
> >
> com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.execute(LibvirtComputingResource.java:3230)
> > > > > > at
> > > > >
> > > >
> > >
> >
> com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1307)
> > > > > > at com.cloud.agent.Agent.processRequest(Agent.java:503)
> > > > > > at
> com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:808)
> > > > > > at com.cloud.utils.nio.Task.run(Task.java:84)
> > > > > > at
> > > > >
> > > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > > > > at
> > > > >
> > > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > > > > at java.lang.Thread.run(Thread.java:745)
> > > > > > 2015-05-12 20:42:34,256 DEBUG [cloud.agent.Agent]
> > > > > (agentRequest-Handler-1:null) Seq 7-7525233502359390941: { Ans: ,
> > > MgmtId:
> > > > > 115129173025118, via: 7, Ver: v1, Flags: 110,
> > > > >
> > > >
> > >
> >
> [{"com.cloud.agent.api.Answer":{"result":false,"details":"java.lang.NullPointerException\n\tat
> > > > >
> > > >
> > >
> >
> com.cloud.network.Networks$BroadcastDomainType.getSchemeValue(Networks.java:172)\n\tat
> > > > >
> > > >
> > >
> >
> com.cloud.network.Networks$BroadcastDomainType.getValue(Networks.java:226)\n\tat
> > > > >
> > > >
> > >
> >
> com.cloud.hypervisor.kvm.resource.BridgeVifDriver.plug(BridgeVifDriver.java:105)\n\tat
> > > > >
> > > >
> > >
> >
> com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.execute(LibvirtComputingResource.java:3230)\n\tat
> > > > >
> > > >
> > >
> >
> com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1307)\n\tat
> > > > > com.cloud.agent.Agent.processRequest(Agent.java:503)\n\tat
> > > > >
> > com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:808)\n\tat
> > > > > com.cloud.utils.nio.Task.run(Task.java:84)\n\tat
> > > > >
> > > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)\n\tat
> > > > >
> > > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\n\tat
> > > > > java.lang.Thread.run(Thread.java:745)\n","wait":0}}] }
> > > > > >
> > > > > >
> > > > > >
> > > > > > > Any idea how to get this fixed? Not sure why all of a sudden the
> > > > > > > migration stopped working for a handful of vms. I can successfully
> > > > > > > migrate some vms, but not others.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > Andrei
> > > > > >
> > > > > >
> > > > >
> > > > > Regards,
> > > > > Rohit Yadav
> > > > > Software Architect, ShapeBlue
> > > > > M. +91 88 262 30892 | rohit.yadav@shapeblue.com
> > > > > Blog: bhaisaab.org | Twitter: @_bhaisaab
> > > > >
> > > > >
> > > > >
> > > > > Find out more about ShapeBlue and our range of CloudStack related
> > > > services
> > > > >
> > > > > IaaS Cloud Design & Build<
> > > > > http://shapeblue.com/iaas-cloud-design-and-build//>
> > > > > CSForge – rapid IaaS deployment framework<
> > > http://shapeblue.com/csforge/>
> > > > > CloudStack Consulting<http://shapeblue.com/cloudstack-consultancy/
> >
> > > > > CloudStack Software Engineering<
> > > > > http://shapeblue.com/cloudstack-software-engineering/>
> > > > > CloudStack Infrastructure Support<
> > > > > http://shapeblue.com/cloudstack-infrastructure-support/>
> > > > > CloudStack Bootcamp Training Courses<
> > > > > http://shapeblue.com/cloudstack-training/>
> > > > >
> > > > > This email and any attachments to it may be confidential and are
> > > intended
> > > > > solely for the use of the individual to whom it is addressed. Any
> > views
> > > > or
> > > > > opinions expressed are solely those of the author and do not
> > > necessarily
> > > > > represent those of Shape Blue Ltd or related companies. If you are
> > not
> > > > the
> > > > > intended recipient of this email, you must neither take any action
> > > based
> > > > > upon its contents, nor copy or show it to anyone. Please contact
> the
> > > > sender
> > > > > if you believe you have received this email in error. Shape Blue
> Ltd
> > > is a
> > > > > company incorporated in England & Wales. ShapeBlue Services India
> LLP
> > > is
> > > > a
> > > > > company incorporated in India and is operated under license from
> > Shape
> > > > Blue
> > > > > Ltd. Shape Blue Brasil Consultoria Ltda is a company incorporated
> in
> > > > Brasil
> > > > > and is operated under license from Shape Blue Ltd. ShapeBlue SA Pty
> > Ltd
> > > > is
> > > > > a company registered by The Republic of South Africa and is traded
> > > under
> > > > > license from Shape Blue Ltd. ShapeBlue is a registered trademark.
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Andrija Panić
> > >
> > >
> >
> >
> > --
> >
> > Andrija Panić
> >
> >
>
>
> --
>
> Andrija Panić
>
>
>
