cloudstack-issues mailing list archives

From "Marcus Sorensen (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CLOUDSTACK-3535) No HA actions are performed when a KVM host goes offline
Date Thu, 25 Jul 2013 02:49:48 GMT

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719176#comment-13719176 ]

Marcus Sorensen edited comment on CLOUDSTACK-3535 at 7/25/13 2:49 AM:
----------------------------------------------------------------------

Sounds like this is not KVM-specific.

Not to be blunt, but I don't think Logan's solution works at all. We cannot tell what is
running on a host simply from whether we can ping it on the management network. A host may
be running 20 VMs, all healthy, while only its management NIC has failed. Relying on ping
makes too many assumptions (that storage is Ethernet-based, and that the same
interface/network serves both management and storage).
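
To illustrate why a failed ping is ambiguous, here is a minimal Python sketch. It is not
CloudStack code: the `HostSignals` fields, and in particular the storage-side heartbeat, are
hypothetical names introduced only for this example. It shows that the same "ping failed"
observation is compatible with two very different host states.

```python
from dataclasses import dataclass
from enum import Enum


class HostVerdict(Enum):
    REACHABLE = "reachable on the management network"
    MGMT_LINK_DOWN = "management link down, VMs may still be running"
    UNKNOWN = "no signal at all, fence before restarting VMs elsewhere"


@dataclass
class HostSignals:
    # Both fields are assumptions made for this sketch, not CloudStack fields.
    mgmt_ping_ok: bool          # host answers ping on the management network
    storage_heartbeat_ok: bool  # host is still seen writing to shared storage


def classify(signals: HostSignals) -> HostVerdict:
    """Classify a host from two independent signals instead of ping alone."""
    if signals.mgmt_ping_ok:
        return HostVerdict.REACHABLE
    if signals.storage_heartbeat_ok:
        # Ping failed, but the host is demonstrably alive on the storage side.
        return HostVerdict.MGMT_LINK_DOWN
    # Neither signal is present: the host's state cannot be known remotely.
    return HostVerdict.UNKNOWN
```

Even with the second signal, the last case stays unknown, which is why the host still has to
be fenced before its VMs are started somewhere else.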

The only way to go is proper fencing. For storage types that support it, revoke other hosts'
access when a VM starts, so that even if the VM was running elsewhere, you effectively pull
the power cord on that copy when you start the VM in the known-good location. In other words,
the host starting a VM holds an exclusive lock on the volumes associated with that VM.
Additionally or alternatively, an IPMI service could power off a host if the agent isn't in
maintenance mode and is non-communicative.
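
The exclusive-lock idea can be sketched as follows. This is only an in-memory model under
stated assumptions: `VolumeAccessTable` and `start_vm` are hypothetical names, and a real
storage backend would enforce the revocation itself rather than rely on a Python dictionary.

```python
class VolumeAccessTable:
    """In-memory stand-in for storage that grants access to one host per volume."""

    def __init__(self):
        self._owner = {}  # volume id -> id of the host currently allowed to write

    def acquire_exclusive(self, volume_id, host_id):
        """Give host_id sole access; return the host that lost access, if any."""
        previous = self._owner.get(volume_id)
        self._owner[volume_id] = host_id
        return previous if previous != host_id else None

    def can_write(self, volume_id, host_id):
        return self._owner.get(volume_id) == host_id


def start_vm(table, volume_ids, host_id):
    """Take exclusive locks on all of a VM's volumes before starting it.

    Returns the set of hosts whose access was revoked, that is, the hosts
    that were fenced off from this VM's storage.
    """
    fenced = set()
    for volume_id in volume_ids:
        previous = table.acquire_exclusive(volume_id, host_id)
        if previous is not None:
            fenced.add(previous)
    return fenced
```

Starting the VM on a second host revokes the first host's access, so a stale copy of the VM
on the unreachable host can no longer write to the volumes.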

In the meantime, as the short-term solution suggests, if we can manually put a host into
maintenance mode when it is known to be down, and allow its VMs to migrate, that would at
least let people get their systems working again without DB hacks.
                
      was (Author: mlsorensen):
    Sounds like this is not KVM-specific.

Not to be blunt, but I don't think Logan's solution works at all. We cannot tell what is
running on a host simply from whether we can ping it on the management network. A host may
be running 20 VMs, all healthy, while only its management NIC has failed. Relying on ping
makes too many assumptions (that storage is Ethernet-based, and that the same
interface/network serves both management and storage).

The only way to go is proper fencing. For storage types that support it, revoke other hosts'
access when a VM starts, so that even if the VM was running elsewhere, you effectively pull
the power cord on that copy when you start the VM in the known-good location. Additionally
or alternatively, an IPMI service could power off a host if the agent isn't in maintenance
mode and is non-communicative.

In the meantime, as the short-term solution suggests, if we can manually put a host into
maintenance mode when it is known to be down, and allow its VMs to migrate, that would at
least let people get their systems working again without DB hacks.
                  
> No HA actions are performed when a KVM host goes offline
> --------------------------------------------------------
>
>                 Key: CLOUDSTACK-3535
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3535
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: Hypervisor Controller, KVM, Management Server
>    Affects Versions: 4.1.0, 4.1.1, 4.2.0
>         Environment: KVM (CentOS 6.3) with CloudStack 4.1
>            Reporter: Paul Angus
>            Priority: Blocker
>
> If a KVM host 'goes down', CloudStack does not perform HA for instances which are marked
> as HA enabled on that host (including system VMs).
> CloudStack does not show the host as disconnected.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
