cloudstack-users mailing list archives

From Andrei Mikhailovsky <and...@arhont.com>
Subject Re: ALARM - ACS reboots host servers!!!
Date Mon, 03 Mar 2014 12:37:55 GMT

Koushik, I understand that, and I will put the storage into maintenance mode next time.
However, things happen and servers crash from time to time, which is no reason to reboot
all host servers, even those that have no running vms with volumes on the nfs storage.
The bloody agent just rebooted every single host server, regardless of whether it was running vms
with volumes on the rebooted nfs server. 95% of my vms run from ceph, and those should
never have been affected in the first place.
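
To make the expected behaviour concrete, here is a minimal sketch of the selection rule described above: only hosts with running vms whose volumes live on the failed storage would be candidates for fencing. This is not the CloudStack agent's actual logic; the `Host` class, the `hosts_to_fence` function, and the pool and host names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical host record; not a CloudStack data structure."""
    name: str
    # Storage pools backing volumes of vms currently running on this host.
    pools_in_use: set = field(default_factory=set)


def hosts_to_fence(hosts, failed_pool):
    """Return the names of hosts that run vms with volumes on failed_pool."""
    return [h.name for h in hosts if failed_pool in h.pools_in_use]


# Assumed example setup: one host uses the nfs pool, the others do not.
hosts = [
    Host("kvm1", {"nfs-primary", "ceph-rbd"}),
    Host("kvm2", {"ceph-rbd"}),
    Host("kvm3", set()),
]

print(hosts_to_fence(hosts, "nfs-primary"))
```

Under this rule, a host running only ceph-backed vms, or no vms at all, would be left alone when the nfs server goes away.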
----- Original Message -----

From: "Koushik Das" <koushik.das@citrix.com> 
To: "<users@cloudstack.apache.org>" <users@cloudstack.apache.org> 
Cc: dev@cloudstack.apache.org 
Sent: Monday, 3 March, 2014 5:55:34 AM 
Subject: Re: ALARM - ACS reboots host servers!!! 

The primary storage needs to be put in maintenance before doing any upgrade/reboot as mentioned
in the previous mails. 

-Koushik 

On 03-Mar-2014, at 6:07 AM, Marcus <shadowsor@gmail.com> wrote: 

> Also, please note that the bug you referenced isn't about the
> reboot being triggered, but about the fact that the reboot
> never completes due to a hanging NFS mount (the reboot itself
> occurs because primary storage is inaccessible).
> 
> On Sun, Mar 2, 2014 at 5:26 PM, Marcus <shadowsor@gmail.com> wrote: 
>> Or do you mean you have multiple primary storages and this one was not 
>> in use and put into maintenance? 
>> 
>> On Sun, Mar 2, 2014 at 5:25 PM, Marcus <shadowsor@gmail.com> wrote: 
>>> I'm not sure I understand. How do you expect to reboot your primary 
>>> storage while vms are running? It sounds like the host is being 
>>> fenced since it cannot contact the resources it depends on. 
>>> 
>>> On Sun, Mar 2, 2014 at 3:24 PM, Nux! <nux@li.nux.ro> wrote: 
>>>> On 02.03.2014 21:17, Andrei Mikhailovsky wrote: 
>>>>> 
>>>>> Hello guys, 
>>>>> 
>>>>> 
>>>>> I recently came across the bug CLOUDSTACK-5429, which has rebooted
>>>>> all of my host servers without properly shutting down the guest vms.
>>>>> I simply upgraded and rebooted one of the nfs primary storage
>>>>> servers and a few minutes later, to my horror, I found that all
>>>>> of my host servers had been rebooted. Is it just me, or
>>>>> should this bug be fixed ASAP and be a blocker for any new
>>>>> ACS release? I mean, not only does it cause downtime, but also possible
>>>>> data loss and server corruption.
>>>> 
>>>> 
>>>> Hi Andrei, 
>>>> 
>>>> Do you have HA enabled and did you put that primary storage in maintenance
>>>> mode before rebooting it?
>>>> It's my understanding that ACS relies on the shared storage to perform HA so
>>>> if the storage goes it's expected to go berserk. I've noticed similar
>>>> behaviour in Xenserver pools without ACS.
>>>> I'd imagine a "cure" for this would be to use network distributed
>>>> "filesystems" like GlusterFS or CEPH.
>>>> 
>>>> Lucian 
>>>> 
>>>> -- 
>>>> Sent from the Delta quadrant using Borg technology! 
>>>> 
>>>> Nux! 
>>>> www.nux.ro 


