hbase-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HBASE-2629) Piggyback basic "alarm" framework on RS heartbeats
Date Wed, 31 Oct 2012 00:32:12 GMT

     [ https://issues.apache.org/jira/browse/HBASE-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Todd Lipcon resolved HBASE-2629.

    Resolution: Won't Fix

Upon reflection a couple of years later, I think we should just make sure to emit useful WARN
logs for cases like this. Existing cluster monitoring systems (e.g. Splunk or CM) are better
suited to surfacing these logs to users than anything we could build ourselves inside HBase.
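
As a rough illustration of the "emit useful WARN logs" approach, the sketch below times an operation and logs a warning when it exceeds a threshold. It is not HBase code: the class name SlowOpLogger, the method names, and the message format are all hypothetical, and it uses java.util.logging only to stay self-contained (HBase itself uses a different logging facade).

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

/** Hypothetical sketch, not part of HBase: warn when an operation is slow. */
class SlowOpLogger {
    private static final Logger LOG = Logger.getLogger(SlowOpLogger.class.getName());

    /** Returns the warning text, or null when the operation was within the threshold. */
    static String slowOpMessage(String opName, long elapsedMillis, long thresholdMillis) {
        if (elapsedMillis <= thresholdMillis) {
            return null;
        }
        return opName + " took " + elapsedMillis + "ms (threshold " + thresholdMillis + "ms)";
    }

    /** Runs op, measures wall-clock time, and logs at WARNING level if it was slow. */
    static <T> T timed(String opName, long thresholdMillis, Supplier<T> op) {
        long start = System.nanoTime();
        try {
            return op.get();
        } finally {
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            String msg = slowOpMessage(opName, elapsedMillis, thresholdMillis);
            if (msg != null) {
                LOG.warning(msg);
            }
        }
    }
}
```

A caller would wrap the suspect call, for example SlowOpLogger.timed("ZK read", 1000, () -> doRead()), where doRead is a placeholder. An external log monitor can then match on the message text.
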
> Piggyback basic "alarm" framework on RS heartbeats
> --------------------------------------------------
>                 Key: HBASE-2629
>                 URL: https://issues.apache.org/jira/browse/HBASE-2629
>             Project: HBase
>          Issue Type: New Feature
>          Components: master, regionserver
>            Reporter: Todd Lipcon
> There are a number of system conditions that can cause HBase to perform badly or have
> stability issues. For example, significant swapping activity or overloaded ZK will result
> in all kinds of problems.
> It would be nice to put a very lightweight "alarm" framework in place, so that when the
> RS notices something is amiss, it can raise an alarm flag for some period of time. These could
> be exposed by JMX to external monitoring tools, and also displayed on the master web UI.
> Some example alarms:
> - "ZK read took >1000ms"
> - "Long garbage collection pause detected"
> - "Writes blocked on region for longer than 5 seconds"
> etc etc
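
For readers wondering what the proposed (and ultimately unimplemented) "alarm flag for some period of time" might have looked like, here is a minimal sketch. Everything in it is an assumption made for illustration: the AlarmRegistry class, its method names, and the injectable clock are hypothetical. The JMX and master web UI exposure mentioned in the issue is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical sketch, not part of HBase: alarm flags that expire after a fixed period. */
class AlarmRegistry {
    // Maps alarm text to the time (in clock milliseconds) at which it expires.
    private final Map<String, Long> expiryByAlarm = new ConcurrentHashMap<>();
    private final LongSupplier clockMillis;

    /** The clock is injectable so that expiry can be tested without sleeping. */
    AlarmRegistry(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    /** Raises the alarm for ttlMillis from now; re-raising never shortens an existing expiry. */
    void raise(String alarm, long ttlMillis) {
        long expiry = clockMillis.getAsLong() + ttlMillis;
        expiryByAlarm.merge(alarm, expiry, Long::max);
    }

    /** Drops expired alarms and returns the remaining ones in sorted order. */
    List<String> activeAlarms() {
        long now = clockMillis.getAsLong();
        expiryByAlarm.values().removeIf(expiry -> expiry <= now);
        List<String> active = new ArrayList<>(expiryByAlarm.keySet());
        Collections.sort(active);
        return active;
    }
}
```

In production the registry would be constructed with System::currentTimeMillis, and a heartbeat or metrics thread would poll activeAlarms() to report the current set.
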

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
