cloudstack-issues mailing list archives

From "Roeland Kuipers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CLOUDSTACK-4607) Reboot router on out-of-memory vs OOM killer
Date Wed, 04 Sep 2013 17:58:56 GMT
Roeland Kuipers created CLOUDSTACK-4607:
-------------------------------------------

             Summary: Reboot router on out-of-memory vs OOM killer
                 Key: CLOUDSTACK-4607
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4607
             Project: CloudStack
          Issue Type: Bug
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: Virtual Router
    Affects Versions: 4.1.1
            Reporter: Roeland Kuipers


We experienced a serious outage on a redundant virtual router VM pair caused by the OOM killer.
For an unknown reason the master node ran out of memory, and the OOM killer killed seemingly
random processes, which took HAProxy down. Because keepalived was still running and functioning,
a failover never happened.

In our experience it is better to panic on OOM than to hope that the OOM killer will do the
right thing; in 99% of cases it just renders the machine useless.
If this RvR had panicked and rebooted, we would have had a clean keepalived failure and failover
without much impact on our customer.

To counter this scenario, we would rather see the router panic and reboot on an out-of-memory
condition than rely on the OOM killer, which is a big gamble.
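
As an illustration of the requested behaviour, here is a minimal sketch that renders a
sysctl fragment using the standard Linux settings vm.panic_on_oom and kernel.panic. The
function name and the 10-second reboot delay are assumptions made for this example and are
not taken from the report or from the CloudStack system VM.

```python
def render_oom_panic_sysctl(reboot_delay_seconds: int = 10) -> str:
    """Render a sysctl fragment that panics on OOM and then reboots.

    Hypothetical helper for illustration; the default delay is an assumption.
    """
    if reboot_delay_seconds <= 0:
        # kernel.panic = 0 means "do not reboot after a panic", which would
        # leave the router hung instead of triggering a failover.
        raise ValueError("reboot_delay_seconds must be positive")
    lines = [
        "# Panic on out-of-memory instead of invoking the OOM killer.",
        "vm.panic_on_oom = 1",
        "# Reboot this many seconds after a kernel panic.",
        f"kernel.panic = {reboot_delay_seconds}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the fragment; it could be placed in a file under /etc/sysctl.d/.
    print(render_oom_panic_sysctl(), end="")
```

With vm.panic_on_oom set to 1 the kernel may still use the OOM killer when the allocation
is limited by a cpuset or memory policy; a value of 2 forces a panic in every case. Which
of the two suits the router is left open here.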

See also CLOUDSTACK-4605 and CLOUDSTACK-4606

