cloudstack-dev mailing list archives

From Marcus <shadow...@gmail.com>
Subject Re: Automatic KVM host reboot on Primary Storage failure
Date Fri, 14 Nov 2014 17:18:37 GMT
It is there (I believe) because cloudstack is acting as a cluster manager
for KVM. It is using NFS to determine if it is 'alive' on the network, and
if it is not, it reboots itself to avoid having a split brain scenario
where VMs start coming up on other hosts when they are already running on
this host. It generally works if the problem is the host, but as you
point out, there's a situation where the problem can be the NFS server.
This is fairly rare for enterprise NFS with high availability, but a
fair number of people run NFS on servers with relatively low
availability (non-clustered, or prone to becoming overloaded and unresponsive).
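
To make the behaviour concrete, here is a rough sketch of the kind of
check described above. It is not the actual kvmheartbeat.sh logic; the
function names, the heartbeat file name, and the retry count are all
made up for illustration. It also shows why a single dead NFS server is
enough to fence a host that mounts several.

```python
import time
from pathlib import Path
from typing import Callable, Iterable


def write_heartbeat(mount_point: Path, host_id: str) -> bool:
    """Try to write a timestamp to a per-host file on the given mount.

    The file name "hb-<host_id>" is a hypothetical convention.
    Returns True if the write succeeded, False on any OS-level error.
    """
    try:
        (mount_point / f"hb-{host_id}").write_text(str(int(time.time())))
        return True
    except OSError:
        return False


def check_and_fence(
    mounts: Iterable[Path],
    host_id: str,
    fence: Callable[[Path], None],
    retries: int = 3,
    delay: float = 0.0,
) -> bool:
    """Heartbeat to every mount; call fence() on the first one that fails.

    Returns True if the host was fenced. In a real deployment fence()
    would reboot the host; here it is an injected callback so the
    sketch is safe to run.
    """
    for mount in mounts:
        ok = False
        for _ in range(retries):
            if write_heartbeat(mount, host_id):
                ok = True
                break
            time.sleep(delay)
        if not ok:
            # One unreachable primary storage is enough to fence the host,
            # even if every other mount is healthy.
            fence(mount)
            return True
    return False
```

Note that a hung hard-mounted NFS share typically blocks a write rather
than returning an error, so a real check would also need a timeout
around the write; that is left out here.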

There's plenty of room for improvement in that script. I agree the original
implementation seems fairly rudimentary, but we have to be careful in
thinking about all scenarios and make sure there's no chance of split
brain. In the meantime, one could also partition the resources such that
you have more clusters and only one primary storage per cluster (or
something else, like storage/host tags to guarantee each host only uses one
NFS).
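
As a back-of-the-envelope illustration of why partitioning helps, the
sketch below counts which hosts would reboot when one NFS server fails,
assuming a host fences itself whenever any primary storage it mounts
becomes unreachable. The host and storage names and the six-host layout
are invented for the example.

```python
from typing import Dict, Set


def hosts_rebooted(host_storage: Dict[str, Set[str]], failed: str) -> Set[str]:
    """Return the hosts that mount the failed storage and would fence."""
    return {host for host, stores in host_storage.items() if failed in stores}


# One cluster: every host mounts all three NFS primary storages.
shared = {f"host{i}": {"nfs1", "nfs2", "nfs3"} for i in range(1, 7)}

# Three clusters (or tags): each host uses exactly one NFS.
partitioned = {
    "host1": {"nfs1"}, "host2": {"nfs1"},
    "host3": {"nfs2"}, "host4": {"nfs2"},
    "host5": {"nfs3"}, "host6": {"nfs3"},
}

shared_hit = hosts_rebooted(shared, "nfs1")            # all six hosts
partitioned_hit = hosts_rebooted(partitioned, "nfs1")  # host1 and host2 only
```

In the shared layout, losing nfs1 takes down every host; in the
partitioned layout, only the hosts bound to nfs1 are affected.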

On Fri, Nov 14, 2014 at 8:07 AM, Andrija Panic <andrija.panic@gmail.com>
wrote:

> Hi guys,
>
> I'm wondering why there is a check
> inside
> /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh
> ?
>
> I understand that the KVM host checks availability of Primary Storage, and
> reboots itself if it can't write to storage.
>
> But if we have, say, 3 NFS servers in a cluster and a lot of KVM hosts, then 1 primary
> storage going down (server crashing or whatever) will probably bring 99%
> of KVM hosts down for reboot as well?
> So instead of losing uptime for 1/3 of my VMs (1 storage out of 3), I
> lose uptime for 99%-100% of my VMs?
>
> I manually edited this script to disable reboots, but why is it there in
> any case?
> It doesn't make sense to me, unless I'm missing a point (probably)...
>
> Thanks,
> --
>
> Andrija Panić
>
