cloudstack-dev mailing list archives

From Nux! <...@li.nux.ro>
Subject Re: slow nfs = reboot all hosts (((
Date Fri, 09 Oct 2015 11:58:19 GMT
Hello,

Instead of commenting out 'echo b > /proc/sysrq-trigger' and thereby disabling your HA at the
same time, perhaps there's a way to tweak the timeouts so they are more generous with lazy NFS servers.
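The "more generous timeouts" idea can be illustrated with a minimal Python sketch. It is NOT the logic of kvmheartbeat.sh: the function names (`write_heartbeat`, `attempt_with_timeout`, `storage_is_alive`), the heartbeat file path and the timeout/retry values are all hypothetical. A slow heartbeat write is retried several times, each attempt with its own deadline, and the storage is only treated as dead (the point at which a real HA script would fence the host) when every attempt fails or times out.

```python
import os
import tempfile
import threading
import time


def write_heartbeat(path):
    """Hypothetical heartbeat: write the current epoch time to a file on the mount."""
    with open(path, "w") as f:
        f.write(str(int(time.time())))
        f.flush()
        os.fsync(f.fileno())  # force the write out to the storage server


def attempt_with_timeout(fn, timeout_s):
    """Run fn in a daemon thread; True only if it finished without error in time."""
    result = {"ok": False}

    def runner():
        try:
            fn()
            result["ok"] = True
        except OSError:
            pass  # I/O error: count it as a failed attempt

    t = threading.Thread(target=runner, daemon=True)
    t.start()
    t.join(timeout_s)  # a hung NFS write leaves the thread alive past the deadline
    return not t.is_alive() and result["ok"]


def storage_is_alive(path, timeout_s=60.0, retries=5, pause_s=10.0,
                     writer=write_heartbeat):
    """Declare the storage dead only if *every* attempt fails or times out."""
    for attempt in range(retries):
        if attempt_with_timeout(lambda: writer(path), timeout_s):
            return True
        if attempt < retries - 1:
            time.sleep(pause_s)  # give a busy server time to catch up
    return False


if __name__ == "__main__":
    hb_file = os.path.join(tempfile.gettempdir(), "hb-demo")  # hypothetical path
    if storage_is_alive(hb_file, timeout_s=5, retries=2, pause_s=1):
        print("storage alive, nothing to do")
    else:
        print("storage unresponsive: this is where a host would be fenced")
```

Each attempt runs in its own daemon thread so that a write stuck on a hung NFS mount cannot block the checker forever. With the assumed defaults (5 attempts, 60 s each, 10 s pauses) a sluggish server gets up to roughly 340 seconds before being declared dead, instead of being fenced on the first slow response.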

Can you go through the logs and see what happens right before the reboot? I am not sure exactly
which timeout the script cares about; it's worth investigating.

Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

----- Original Message -----
> From: "Andrija Panic" <andrija.panic@gmail.com>
> To: dev@cloudstack.apache.org
> Sent: Friday, 9 October, 2015 10:25:05
> Subject: Re: slow nfs = reboot all hosts (((

> I dealt with this problem the following way:
> http://admintweets.com/cloudstack-disable-agent-rebooting-kvm-host/
> 
> Cheers
> On Oct 9, 2015 10:21 AM, "Andrei Mikhailovsky" <andrei@arhont.com> wrote:
> 
>> Hello
>>
>> My issue is that whenever my NFS server becomes slow to respond, ACS just
>> bloody reboots ALL host servers, not just the ones running VMs with
>> volumes attached to the slow NFS server. Recently, I decided to remove
>> some of the old snapshots to free up some disk space. I deleted about a
>> dozen snapshots and was monitoring the NFS server for progress. At no
>> point did the NFS server lose connectivity; it just became a bit slow
>> and under load. By slow I mean I was still able to list files on the NFS
>> mount point and the SSH session was still working okay. It was just taking
>> a few more seconds to respond to NFS file listings, creation,
>> deletion, etc. However, the ACS agent rebooted every single host
>> server, killing all running guests and system VMs. In my case, I only have
>> two guests with volumes on the NFS server. The rest of the VMs are running
>> off RBD storage. Yet all host servers were rebooted, even those that were
>> not running guests with NFS volumes.
>>
>> Ever since I started using ACS, it has always been pretty dumb at correctly
>> determining whether the NFS storage is still alive. I would say it has done
>> the maniac "reboot everything" routine at least 5 times in the past 3
>> years. So, in previous versions of ACS, I just modified
>> kvmheartbeat.sh and hashed out the line with "reboot", as these reboots were
>> just pissing everyone off.
>>
>> After upgrading to ACS 4.5.x, that script no longer has a reboot command, so
>> I was wondering: is it still possible to instruct the kvmheartbeat script not
>> to reboot the host servers?
>>
>> Thanks for your advice.
>>
>> Andrei
