cloudstack-dev mailing list archives

From Alex Huang <Alex.Hu...@citrix.com>
Subject RE: ALARM - ACS reboots host servers!!!
Date Thu, 03 Apr 2014 16:47:22 GMT
This is a severe bug if that's the case.  It's supposed to stop the heartbeat script when a
primary storage is placed in maintenance.
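The behaviour Alex describes can be illustrated with a minimal sketch. It assumes a host-side monitor that periodically writes a timestamp to each primary storage pool and fences (reboots) the host after several consecutive failed writes. The names `PrimaryPool`, `HeartbeatMonitor`, `write_heartbeat` and `fence_host`, and the three-failure threshold, are all hypothetical. They are not taken from the CloudStack agent or its heartbeat script. The only point of the sketch is the expected rule: a pool in maintenance is skipped, so its outage can never trigger fencing.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical model of a host-side storage heartbeat; all names and the
# failure threshold are illustrative, not CloudStack's real agent code.


@dataclass
class PrimaryPool:
    name: str
    in_maintenance: bool = False
    failures: int = 0  # consecutive failed heartbeat writes


@dataclass
class HeartbeatMonitor:
    # Returns True if the timestamp write to the pool succeeded.
    write_heartbeat: Callable[[PrimaryPool], bool]
    # Called with the pool name when the host decides to fence (e.g. reboot) itself.
    fence_host: Callable[[str], None]
    max_failures: int = 3
    pools: List[PrimaryPool] = field(default_factory=list)

    def tick(self) -> None:
        """Run one heartbeat round over all pools."""
        for pool in self.pools:
            if pool.in_maintenance:
                # Behaviour expected in this thread: a pool in maintenance
                # gets no heartbeat, so its outage can never fence the host.
                pool.failures = 0
                continue
            if self.write_heartbeat(pool):
                pool.failures = 0
            else:
                pool.failures += 1
                if pool.failures >= self.max_failures:
                    self.fence_host(pool.name)
                    return  # a real host would be rebooting at this point


if __name__ == "__main__":
    fenced: List[str] = []
    storage_down = lambda pool: False  # simulate an unreachable NFS server
    monitor = HeartbeatMonitor(
        write_heartbeat=storage_down,
        fence_host=fenced.append,
        pools=[PrimaryPool("nfs1", in_maintenance=True)],
    )
    for _ in range(5):
        monitor.tick()
    print(fenced)  # [] because the only pool is in maintenance
```

With `in_maintenance=False`, the same loop would call `fence_host` on the third tick. That is the host reboot described in this thread.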

--Alex

> -----Original Message-----
> From: France [mailto:mailinglists@isg.si]
> Sent: Thursday, April 3, 2014 1:06 AM
> To: dev@cloudstack.apache.org
> Subject: Re: ALARM - ACS reboots host servers!!!
> 
> I'm also interested in this issue.
> Can anyone from the developers confirm whether this is expected behavior?
> 
> On 2/4/14 2:32 PM, Andrei Mikhailovsky wrote:
> > Coming back to this issue.
> >
> > This time, to perform maintenance on the nfs primary storage, I
> placed the storage in question in Maintenance mode. After about 20
> minutes ACS showed the nfs storage as being in Maintenance. However, none of
> the virtual machines with volumes on that storage were stopped. I
> manually stopped the virtual machines and went on to upgrade and restart the
> nfs server.
> >
> > A few minutes after the nfs server shut down, all of my host servers
> rebooted, killing all vms!
> >
> > Thus, it seems that putting the nfs server in Maintenance mode does not stop
> the ACS agent from restarting the host servers.
> >
> > Does anyone know a way to stop this behaviour?
> >
> > Thanks
> >
> > Andrei
> >
> >
> > ----- Original Message -----
> > From: "France" <mailinglists@isg.si>
> > To: users@cloudstack.apache.org
> > Cc: dev@cloudstack.apache.org
> > Sent: Monday, 3 March, 2014 9:49:28 AM
> > Subject: Re: ALARM - ACS reboots host servers!!!
> >
> > I believe this is a bug too, because even VMs not running on that
> > storage get destroyed.
> >
> > The issue has been around for a long time, like all the others I reported.
> > They do not get fixed:
> > https://issues.apache.org/jira/browse/CLOUDSTACK-3367
> >
> > We even lost the assignee today.
> >
> > Regards,
> > F.
> >
> > On 3/3/14 6:55 AM, Koushik Das wrote:
> >> The primary storage needs to be put in maintenance before doing any
> upgrade/reboot as mentioned in the previous mails.
> >>
> >> -Koushik
> >>
> >> On 03-Mar-2014, at 6:07 AM, Marcus <shadowsor@gmail.com> wrote:
> >>
> >>> Also, please note that the bug you referenced is not about the
> >>> reboot being triggered, but about the fact that the reboot never
> >>> completes due to a hanging NFS mount (the reboot itself occurs
> >>> because primary storage is inaccessible).
> >>>
> >>> On Sun, Mar 2, 2014 at 5:26 PM, Marcus <shadowsor@gmail.com> wrote:
> >>>> Or do you mean you have multiple primary storages and this one was
> >>>> not in use and put into maintenance?
> >>>>
> >>>> On Sun, Mar 2, 2014 at 5:25 PM, Marcus <shadowsor@gmail.com> wrote:
> >>>>> I'm not sure I understand. How do you expect to reboot your
> >>>>> primary storage while vms are running? It sounds like the host is
> >>>>> being fenced since it cannot contact the resources it depends on.
> >>>>>
> >>>>> On Sun, Mar 2, 2014 at 3:24 PM, Nux! <nux@li.nux.ro> wrote:
> >>>>>> On 02.03.2014 21:17, Andrei Mikhailovsky wrote:
> >>>>>>> Hello guys,
> >>>>>>>
> >>>>>>>
> >>>>>>> I recently came across bug CLOUDSTACK-5429, which has
> >>>>>>> rebooted all of my host servers without properly shutting down
> >>>>>>> the guest vms. I simply upgraded and rebooted one of the nfs
> >>>>>>> primary storage servers, and a few minutes later, to my horror,
> >>>>>>> I found that all of my host servers had been rebooted. Is it just
> >>>>>>> me, or should this bug be fixed ASAP and be a blocker for any new
> >>>>>>> ACS release? It not only causes downtime but can also cause
> >>>>>>> data loss and server corruption.
> >>>>>> Hi Andrei,
> >>>>>>
> >>>>>> Do you have HA enabled, and did you put that primary storage in
> >>>>>> maintenance mode before rebooting it?
> >>>>>> It's my understanding that ACS relies on the shared storage to
> >>>>>> perform HA, so if the storage goes away it's expected to go berserk.
> >>>>>> I've noticed similar behaviour in Xenserver pools without ACS.
> >>>>>> I'd imagine a "cure" for this would be to use network distributed
> >>>>>> "filesystems" like GlusterFS or CEPH.
> >>>>>>
> >>>>>> Lucian
> >>>>>>
> >>>>>> --
> >>>>>> Sent from the Delta quadrant using Borg technology!
> >>>>>>
> >>>>>> Nux!
> >>>>>> www.nux.ro
