cloudstack-users mailing list archives

From Koushik Das <koushik....@citrix.com>
Subject Re: ALARM - ACS reboots host servers!!!
Date Mon, 03 Mar 2014 05:55:34 GMT
The primary storage needs to be put in maintenance before doing any upgrade/reboot as mentioned
in the previous mails.

-Koushik

On 03-Mar-2014, at 6:07 AM, Marcus <shadowsor@gmail.com> wrote:

> Also, please note that the bug you referenced is not about the reboot
> being triggered, but about the reboot never completing because of a
> hanging NFS mount (the reboot itself occurs because primary storage is
> inaccessible).
> 
> On Sun, Mar 2, 2014 at 5:26 PM, Marcus <shadowsor@gmail.com> wrote:
>> Or do you mean you have multiple primary storages and this one was not
>> in use and put into maintenance?
>> 
>> On Sun, Mar 2, 2014 at 5:25 PM, Marcus <shadowsor@gmail.com> wrote:
>>> I'm not sure I understand. How do you expect to reboot your primary
>>> storage while vms are running?  It sounds like the host is being
>>> fenced since it cannot contact the resources it depends on.
>>> 
>>> On Sun, Mar 2, 2014 at 3:24 PM, Nux! <nux@li.nux.ro> wrote:
>>>> On 02.03.2014 21:17, Andrei Mikhailovsky wrote:
>>>>> 
>>>>> Hello guys,
>>>>> 
>>>>> 
>>>>> I recently came across the bug CLOUDSTACK-5429, which rebooted
>>>>> all of my host servers without properly shutting down the guest VMs.
>>>>> I simply upgraded and rebooted one of the NFS primary storage
>>>>> servers and a few minutes later, to my horror, I found that all
>>>>> of my host servers had been rebooted. Is it just me, or should
>>>>> this bug be fixed ASAP and be a blocker for any new ACS release?
>>>>> I mean, not only does it cause downtime, but also possible
>>>>> data loss and server corruption.
>>>> 
>>>> 
>>>> Hi Andrei,
>>>> 
>>>> Do you have HA enabled and did you put that primary storage in maintenance
>>>> mode before rebooting it?
>>>> It's my understanding that ACS relies on the shared storage to perform HA
>>>> so if the storage goes it's expected to go berserk. I've noticed similar
>>>> behaviour in XenServer pools without ACS.
>>>> I'd imagine a "cure" for this would be to use network distributed
>>>> "filesystems" like GlusterFS or Ceph.
>>>> 
>>>> Lucian
>>>> 
>>>> --
>>>> Sent from the Delta quadrant using Borg technology!
>>>> 
>>>> Nux!
>>>> www.nux.ro

