cloudstack-issues mailing list archives

From "Lennert den Teuling (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CLOUDSTACK-3535) No HA actions are performed when a KVM host goes offline
Date Mon, 05 Aug 2013 12:11:47 GMT

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729442#comment-13729442 ]

Lennert den Teuling edited comment on CLOUDSTACK-3535 at 8/5/13 12:10 PM:
--------------------------------------------------------------------------

This is the code (in UserVmDomRInvestigator.java) that is responsible for nothing happening:

        if (s_logger.isDebugEnabled()) {
            s_logger.debug("could not reach agent, could not reach agent's host, returning that we don't have enough information");
        }
        // null = "not enough information": no verdict on the host's state
        return null;

I think nothing happens because null is returned, so I simply replaced it with "Status.Down",
and HA now works fine.
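A minimal sketch of why a null result can lead to no action, assuming (hypothetically) that the HA logic asks each investigator in turn, treats null as "ask the next one", and only acts on a definite answer. Investigator, HaSketch and the two-value Status enum are illustrative stand-ins, not CloudStack's actual classes or signatures.

```java
import java.util.List;

/** Simplified, hypothetical stand-in for the host Status enum mentioned above. */
enum Status { Up, Down }

/** Hypothetical investigator contract: null means "not enough information". */
interface Investigator {
    Status isAgentAlive(String host);
}

final class HaSketch {
    /** Ask each investigator in turn; the first non-null answer wins. */
    static Status investigate(List<Investigator> investigators, String host) {
        for (Investigator inv : investigators) {
            Status s = inv.isAgentAlive(host);
            if (s != null) {
                return s;
            }
        }
        return null; // no investigator could decide
    }

    /** HA (restarting the host's VMs elsewhere) only starts on a definite Down. */
    static boolean shouldStartHa(List<Investigator> investigators, String host) {
        return investigate(investigators, host) == Status.Down;
    }
}
```

In this model, an investigator that always answers null for an unreachable host means HA never starts, which matches the behaviour described in the issue.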

Maybe I'm looking at this issue too simply, but why would an unreachable agent and an unpingable
host not be enough to trigger HA? The only logical reason I can think of is that network
issues could cause trouble, for example VMs being restarted elsewhere while the isolated host
is in fact still running them. But there is still the KVMHAChecker, which uses the
filesystem to check for the node's heartbeat.
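A rough sketch of a filesystem heartbeat check of the kind attributed to KVMHAChecker above. It assumes, purely for illustration, that each host periodically writes an epoch-seconds timestamp to a file on shared storage; the file format, HeartbeatSketch and its parameters are hypothetical and not necessarily how KVMHAChecker works.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical heartbeat check; the file format is an assumption for illustration. */
final class HeartbeatSketch {
    /**
     * True if content holds an epoch-seconds timestamp written no more than
     * maxAgeSeconds before nowSeconds. Unparseable content counts as no heartbeat.
     */
    static boolean isFresh(String content, long nowSeconds, long maxAgeSeconds) {
        try {
            long last = Long.parseLong(content.trim());
            return nowSeconds - last <= maxAgeSeconds;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Reads the heartbeat file; a missing or unreadable file also counts as stale. */
    static boolean isFresh(Path heartbeatFile, long nowSeconds, long maxAgeSeconds) {
        try {
            String content = new String(Files.readAllBytes(heartbeatFile), StandardCharsets.UTF_8);
            return isFresh(content, nowSeconds, maxAgeSeconds);
        } catch (IOException e) {
            return false;
        }
    }
}
```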

So if you combined the output of UserVmDomRInvestigator with the KVMHAChecker,
would that be enough to return "host.down" instead of "null" and fix this issue?
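One way the proposed combination could look, as a sketch only: report Down when the agent is unreachable, the host does not answer pings, and the shared-storage heartbeat is stale; report null whenever any signal still suggests the host is alive. CombinedCheckSketch and the simplified Status enum are hypothetical, not CloudStack code.

```java
/** Simplified, hypothetical stand-in for the host Status enum. */
enum Status { Up, Down }

final class CombinedCheckSketch {
    /**
     * Hypothetical decision rule combining three signals.
     * Returns null ("not enough information") when the evidence is mixed.
     */
    static Status decide(boolean agentReachable, boolean hostPingable, boolean heartbeatFresh) {
        if (agentReachable) {
            return Status.Up;
        }
        if (hostPingable || heartbeatFresh) {
            return null; // host still shows signs of life: do not guess
        }
        return Status.Down; // agent, ping and heartbeat all silent
    }
}
```

Returning null rather than Up in the ambiguous cases keeps the original "not enough information" behaviour whenever the host might merely be cut off from the management server.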

Ideally you would turn off the host through IPMI to make sure it's dead, but for now, could this
be a solution?
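A sketch of the fence-first ordering suggested here, assuming a hypothetical Fencer hook that would be backed by an IPMI power-off. No real IPMI call is made and all names are illustrative; the point is only that HA VMs are handed back for restart once the power-off is confirmed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fencing hook, e.g. backed by an IPMI power-off (not a real CloudStack API). */
interface Fencer {
    /** Returns true only if the host is confirmed powered off. */
    boolean powerOff(String host);
}

final class FenceThenRestartSketch {
    /** Returns the VMs that may be restarted elsewhere; empty if fencing failed. */
    static List<String> vmsToRestart(String host, List<String> haVms, Fencer fencer) {
        if (!fencer.powerOff(host)) {
            return new ArrayList<>(); // cannot prove the host is dead: restart nothing
        }
        return new ArrayList<>(haVms);
    }
}
```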
                
> No HA actions are performed when a KVM host goes offline
> --------------------------------------------------------
>
>                 Key: CLOUDSTACK-3535
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3535
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public (Anyone can view this level - this is the default.)
>          Components: Hypervisor Controller, KVM, Management Server
>    Affects Versions: 4.1.0, 4.1.1, 4.2.0
>         Environment: KVM (CentOS 6.3) with CloudStack 4.1
>            Reporter: Paul Angus
>            Priority: Blocker
>             Fix For: 4.2.0
>
>         Attachments: management-server.log.Agent
>
>
> If a KVM host 'goes down', CloudStack does not perform HA for instances that are marked as HA-enabled on that host (including system VMs).
> CloudStack does not show the host as disconnected.

--
This message is automatically generated by JIRA.
