cloudstack-dev mailing list archives

From "Musayev, Ilya" <imusa...@webmd.net>
Subject RE: Issues when vCenter becomes unavailable
Date Fri, 22 Feb 2013 23:51:19 GMT
A bit of an incomplete email, as I was on the train and mistakenly pressed send; correction below. Sorry
:)

-----Original Message-----
From: Musayev, Ilya [mailto:imusayev@webmd.net] 
Sent: Friday, February 22, 2013 6:49 PM
To: cloudstack-dev@incubator.apache.org; cloudstack-users@incubator.apache.org
Cc: Kelven Yang
Subject: RE: Issues when vCenter becomes unavailable

Summary:

I have 3 hypervisors.
Hypervisors 1 and 2 are down; hypervisor 3 is up. All VMs now live on hypervisor 3; however, the
host_id in the instance table for these VMs is not being updated to reflect the only hypervisor
still alive.

Details:

I physically powered off 2 hypervisors that had most of my VMs and left 1 online.

The VMs were brought back online by vCenter; however, from then on I experienced what Dave
and Andreas described.

That is, VMware VM instances are bound to a host ID (hypervisor) and not to vCenter, and operations
executed on the VMs require that hypervisor to stay up. If the hypervisor
goes offline, then even though the VMs come back up in VC, CS does not recognize that these VMs now live
on another hypervisor.

This is bad for production rollouts, because VMs are bound to a hypervisor ID and not to
vCenter, and it appears the ID is not getting updated - though I do see in the log that CS is trying
to find it.

I did a little more digging; it looks like the host_ids don't get updated in MySQL for VMs in the
instances table. I need to double-check this because I totally messed up 2 of my test CloudStack
clusters.

Can someone do the following test, if time allows? If not, I can try on Monday:

1) Pick a hypervisor for a test crash and note 1 VM on it (e.g. i-2-89)
2) Navigate to the "host" table in MySQL and note the host_id of the hypervisor that is about to
be powered off.
3) In MySQL, go to the instances table and note the last_host_id and host_id for a VM on the test-crash
hypervisor.
4) Power off the hypervisor and let vCenter bring the VM back online
5) Attempt to launch a console on the VM that was on the crashed hypervisor and was powered back on
by VC
6) If it fails, as it did in my case, change the value of host_id to that of the hypervisor the VM is
now living on (my test is not clean because I've ruined the cluster that hosts my console VM and
don't have time to work on it ATM)
7) Launch the console again to see if the issue is resolved

I suspect the host_id does not get updated, as I witnessed when examining the MySQL instance
table, but I need to fix my env issues to confirm.
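
The manual check in steps 2, 3 and 6 can be sketched roughly as below. This is only an illustration run against an in-memory SQLite database, not the real CloudStack MySQL schema: the table name `vm_instance`, the columns `instance_name` and `status`, the 'Up'/'Down' values, the host names, and all helper function names are assumptions made for the example. Only `host_id`, `last_host_id`, the "host" table and the VM name i-2-89 come from the steps above.

```python
import sqlite3


def setup_demo_db():
    """Build a tiny in-memory stand-in for the two tables (assumed schema)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE host (id INTEGER PRIMARY KEY, name TEXT, status TEXT);
        CREATE TABLE vm_instance (
            id INTEGER PRIMARY KEY,
            instance_name TEXT,
            host_id INTEGER,
            last_host_id INTEGER
        );
        -- Hypervisors 1 and 2 are down, hypervisor 3 is up (hypothetical names).
        INSERT INTO host VALUES (1, 'esx1', 'Down'), (2, 'esx2', 'Down'), (3, 'esx3', 'Up');
        -- The VM is still recorded on host 1 although it was restarted elsewhere.
        INSERT INTO vm_instance VALUES (89, 'i-2-89', 1, 1);
    """)
    return db


def stale_vms(db):
    """Steps 2-3: list (instance_name, host_id) for VMs whose recorded host is not 'Up'."""
    return db.execute("""
        SELECT v.instance_name, v.host_id
        FROM vm_instance v JOIN host h ON h.id = v.host_id
        WHERE h.status <> 'Up'
        ORDER BY v.instance_name
    """).fetchall()


def repoint_vm(db, instance_name, new_host_id):
    """Step 6: manually set host_id to the hypervisor the VM now lives on."""
    db.execute(
        "UPDATE vm_instance SET host_id = ? WHERE instance_name = ?",
        (new_host_id, instance_name),
    )
    db.commit()


if __name__ == "__main__":
    db = setup_demo_db()
    print("before:", stale_vms(db))
    repoint_vm(db, "i-2-89", 3)
    print("after: ", stale_vms(db))
```

Against a real deployment the same two queries would have to be adapted to the actual schema and run with care, since editing host_id by hand is a workaround for testing and not a confirmed fix.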

Regards
ilya


-----Original Message-----
From: Chiradeep Vittal [mailto:Chiradeep.Vittal@citrix.com]
Sent: Friday, February 22, 2013 3:41 PM
To: cloudstack-users@incubator.apache.org
Cc: Kelven Yang; CloudStack DeveloperList
Subject: Re: Issues when vCenter becomes unavailable

CC'ing Kelven to see if he has any ideas.

On 2/22/13 12:22 PM, "Dave Dunaway" <dave.dunaway@gmail.com> wrote:

>If I may suggest also testing a disconnect of a host (hypervisor) from 
>vcenter, so that vcenter and CS can still talk, but vcenter cannot talk 
>to the hosts (hypervisors). CS marks the host as down or failed or whatever.
>
>When the host comes back up, vCenter can see it just fine and all seems good.
>That, however, is not the case (I had this with CS 3.0.5 and VMware ESXi 5.0)
>when CS tries to talk to vCenter and the previously disconnected host
>(that is now recovered).
>
>What we experienced was that we had to migrate all guests off the 
>recovered host, and then destroy that host in CS, and re-create it.
>Then we could migrate back onto it the guests which had been previously 
>migrated.
>
>The curious thing is that while CS did not want to send commands to the 
>host (it kept on saying host id=X has timed out when whatever command
>was sent to it), CS WAS polling the host for resources and getting the 
>correct numbers.... so CS could in some ways talk to the host (ie: it 
>knew the capabilities, number of VMs on it, etc).
>
>Luckily for me this all happened in a test environment. In production, 
>this would have been a real nightmare!
>
>
>dave
>
>
>On Fri, Feb 22, 2013 at 2:48 PM, Musayev, Ilya <imusayev@webmd.net> wrote:
>
>> Andi
>>
>> I'm on CS 4.0. I simulated the VMware vCenter 5 failure by adding a
>> bogus IP entry in /etc/hosts for the vCenter host for 10 minutes.
>> That in turn made VC unreachable by CS.
>>
>> I then began executing commands and sure enough commands failed or 
>> backlogged. Once I restored VC connectivity, the backlogged commands 
>> executed and I did not experience any abnormalities.
>>
>> I will redo this test and leave VC off for an hour - maybe I need a
>> longer outage.
>>
>> Regards
>> ilya
>>
>>
>>
>> -----Original Message-----
>> From: Musayev, Ilya
>> Sent: Thursday, February 21, 2013 2:43 PM
>> To: cloudstack-users@incubator.apache.org
>> Subject: RE: Issues when vCenter becomes unavailable
>>
>> This is definitely not the behavior we want with vcenter.
>>
>> I will test this out on my lab setup shortly.
>>
>> Thanks
>> ilya
>>
>> -----Original Message-----
>> From: Chip Childers [mailto:chip.childers@sungard.com]
>> Sent: Thursday, February 21, 2013 9:40 AM
>> To: cloudstack-users@incubator.apache.org
>> Subject: Re: Issues when vCenter becomes unavailable
>>
>> On Thu, Feb 21, 2013 at 08:59:14AM -0500, Mathias Mullins wrote:
>> > Andreas,
>> >
>> > The open source community doesn't support the Citrix version 3.0.6.
>> > You need to report this via your Citrix Support contract. Sounds 
>> > like this could be a bug.
>> >
>> > Community - this could be a possible issue in 4.0.0 / 4.0.1. I 
>> > don't know if this test case has been explored.
>>
>> Thx - I forwarded to cs-dev@i.a.o to get the test engineers in the 
>> community to take a look.
>>
>> >
>> > Thanks,
>> > Matt Mullins
>> > CloudPlatform Implementation Engineer Worldwide Cloud Services 
>> > Citrix System, Inc.
>> > +1 (407) 920-1107  Office/Cell Phone
>> > matt.mullins@citrix.com
>> >
>> >
>> >
>> > On 2/21/13 5:35 AM, "Fuchs, Andreas (SwissTXT)"
>> > <Andreas.Fuchs@swisstxt.ch> wrote:
>> >
>> > >Hi CS Users
>> > >
>> > >We are running CS 3.0.6 on a vSphere platform and found a strange 
>> > >behavior.
>> > >
>> > >When the vCenter becomes unavailable due to a reboot or some other 
>> > >issue, it seems that CS is shutting down instances when vCenter 
>> > >becomes available again.
>> > >
>> > >What we think happens:
>> > >1. vCenter becomes unreachable
>> > >2. CS marks the ESX servers as "down"
>> > >3. We think this leads to: CS marks the instances as down as well
>> > >4. When vCenter becomes available again, CS stops the "marked as down"
>> > >instances
>> > >
>> > >This is very bad, as the instances were running all the time and
>> > >the shutdown issued by CS is forcing a service interruption.
>> > >
>> > >My problem is that I cannot really reproduce this, as a lot of testing
>> > >is ongoing on the platform at the moment, so my question:
>> > >
>> > >Does someone else see this issue as well and can maybe reproduce it?
>> > >Is there a workaround for it? Can I change some flag or something
>> > >which tells CS to never shut down an instance by itself?
>> > >Why are the ESX hosts getting marked as down and not unreachable
>> > >or something?
>> > >
>> > >Best regards
>> > >Andi
>> >
>> >
>>
>>
>>





