cloudstack-issues mailing list archives

From "Wido den Hollander (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (CLOUDSTACK-8643) Helper for KVM High Availability
Date Mon, 27 Mar 2017 12:03:41 GMT

     [ https://issues.apache.org/jira/browse/CLOUDSTACK-8643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wido den Hollander closed CLOUDSTACK-8643.
------------------------------------------
    Resolution: Won't Fix

> Helper for KVM High Availability
> --------------------------------
>
>                 Key: CLOUDSTACK-8643
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8643
>             Project: CloudStack
>          Issue Type: Improvement
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: KVM, Management Server
>         Environment: KVM hypervisors
>            Reporter: Wido den Hollander
>              Labels: fence, high-availability, kvm, libvirt
>             Fix For: Future
>
>
> When running KVM with NFS storage, all Agents will write a heartbeat to the NFS.
> Should an Agent go down, it will still be writing heartbeats even if libvirt has died.
> Using these heartbeats, the Management Server can ask other KVM Agents if the other server is still beating. If not, it can fence it.
> While this works, I've also encountered scenarios where you run without NFS and still want investigators.
> My proposal would be an Agent Helper running NEXT to the Agent itself.
> A simple Python daemon running a basic HTTP server which queries libvirt every X seconds about:
> * Running Instances
> * Storage pools
> It keeps this in memory, so that even when libvirt goes down it knows what the last state was.
> Using the Qemu Monitor sockets we can actually see if the guests we have in memory are still online.
> If they are, we simply keep the list.
> Now, if an investigator comes by and wants to know if the host is still up, it can ALSO ask the helper.
> The management server can ask the helper, but the other agents could as well.
> This doesn't work in all cases, e.g. where storage is lost. But an additional helper would be useful to catch scenarios where the Agent itself became unresponsive.
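
The helper described above could look roughly like the sketch below. It is only an illustration of the idea, not code from CloudStack: the names StateCache, poll_libvirt, make_handler and serve, the JSON layout, the "stale" flag, the 10 second interval and the port are all assumptions made for this example. The libvirt poller assumes the libvirt Python bindings are installed and is imported lazily, so the cache and HTTP parts run without them. Checking guests through the Qemu Monitor sockets is left out.

```python
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class StateCache:
    """Keeps the last state that was successfully read (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state = {"instances": [], "pools": [], "updated": None, "stale": True}

    def refresh(self, poll):
        """Run poll(); on failure keep the old lists and only mark them stale."""
        try:
            instances, pools = poll()
        except Exception:
            with self._lock:
                self._state["stale"] = True
            return False
        with self._lock:
            self._state = {
                "instances": sorted(instances),
                "pools": sorted(pools),
                "updated": time.time(),
                "stale": False,
            }
        return True

    def snapshot(self):
        """Return a copy of the current state for serialisation."""
        with self._lock:
            return dict(self._state)


def poll_libvirt(uri="qemu:///system"):
    """Assumed poller: list running instances and active storage pools."""
    import libvirt  # lazy import, requires the libvirt Python bindings

    conn = libvirt.openReadOnly(uri)
    try:
        domains = conn.listAllDomains(libvirt.VIR_CONNECT_LIST_DOMAINS_ACTIVE)
        pools = conn.listAllStoragePools(libvirt.VIR_CONNECT_LIST_STORAGE_POOLS_ACTIVE)
        return [d.name() for d in domains], [p.name() for p in pools]
    finally:
        conn.close()


def make_handler(cache):
    """Build a request handler that answers every GET with the cached state."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = json.dumps(cache.snapshot()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the sketch quiet

    return Handler


def serve(cache, poll, interval=10, host="127.0.0.1", port=8080):
    """Poll every `interval` seconds in the background and serve the cache."""

    def loop():
        while True:
            cache.refresh(poll)
            time.sleep(interval)

    threading.Thread(target=loop, daemon=True).start()
    ThreadingHTTPServer((host, port), make_handler(cache)).serve_forever()
```

Calling serve(StateCache(), poll_libvirt) would start the daemon. An investigator (the management server or another agent) could then fetch the JSON and treat a non-empty instance list with "stale" set to true as a hint that libvirt is down while the helper still remembers the last known guests.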



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
