hbase-issues mailing list archives

From "Jeremy Carroll (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4393) Implement a canary monitoring program
Date Mon, 04 Jun 2012 22:33:23 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288967#comment-13288967 ]

Jeremy Carroll commented on HBASE-4393:

Just wanted to put in a few operational comments. We have a version of this Canary script
hooked up to our current HBase cluster for monitoring. It works well for determining whether your
cluster is responding to RPCs in a healthy amount of time, but it does not work well for measuring
overall request latency, because the row at each region's start key (getStartKey) becomes cached.
Since requesting the same key over and over again is basically cache warming, the read returns in
<1ms every time after a few iterations.
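Because repeatedly reading the row at getStartKey mostly measures a warm cache, one alternative is to probe a random key inside each region's boundaries. The following is a minimal sketch, assuming row keys are compared as unsigned byte strings (as HBase does), that the region's start key sorts before its end key, and that an empty end key means "end of table". The function name `random_key_in_region` and the default padding length are hypothetical choices for illustration, not part of the Canary patch.

```python
import random
from typing import Optional


def random_key_in_region(start: bytes, end: bytes, length: int = 8,
                         rng: Optional[random.Random] = None) -> bytes:
    """Return a random row key k with start <= k < end in unsigned byte order.

    Hypothetical helper: both boundaries are right-padded with zero bytes to a
    common length, read as big-endian integers, and a random integer is drawn
    between them. Assumes start < end, or an empty `end` meaning "end of table".
    """
    if rng is None:
        rng = random.Random()
    length = max(length, len(start), len(end))
    lo = int.from_bytes(start.ljust(length, b"\x00"), "big")
    # Exclusive upper bound; an empty end key allows every key of this length.
    hi = int.from_bytes(end.ljust(length, b"\x00"), "big") if end else 256 ** length
    if hi <= lo:
        # No key of this fixed length fits (e.g. start=b"a", end=b"a\x00"),
        # but start itself still lies in [start, end).
        return start
    return rng.randrange(lo, hi).to_bytes(length, "big")
```

A generated key usually does not exist as a row, so a probe would more likely start a one-row scan at that key than fetch it directly; that client-side step is left out here. Whether this actually avoids the block cache depends on data size and cache configuration, which is the trade-off the next paragraph discusses.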

We played around with the idea of issuing requests for random keys within the RegionServer to get
uncached latency measurements. In that scenario we are basically testing our disk latency. IMHO the
intention of the Canary is not to test disk response but the overall response/health of the HBase
RegionServer. We took the approach of using the fsLatency histogram metrics (99th and 99.9th
percentiles) in a separate check, in addition to the Canary, for overall health status.

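The 99th/99.9th percentile check can be illustrated with a small sketch. It assumes you already have a list of filesystem latency samples in milliseconds (the RegionServer computes its own histogram percentiles, possibly by a different method). It uses the nearest-rank definition of a percentile, and the threshold values passed to the hypothetical `fs_latency_ok` helper are operator choices, not HBase defaults.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def fs_latency_ok(samples_ms: Sequence[float],
                  limits_ms: Dict[float, float]) -> Dict[float, bool]:
    """Map each percentile (e.g. 99, 99.9) to whether it is within its limit.

    The limits are hypothetical, operator-chosen thresholds in milliseconds.
    """
    return {pct: percentile(samples_ms, pct) <= limit
            for pct, limit in limits_ms.items()}
```

With 990 reads at 1 ms and 10 at 50 ms, `percentile(samples, 99)` is still 1.0 while `percentile(samples, 99.9)` is 50.0, which is why watching both percentiles catches a slow tail that the 99th alone would hide.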
> Implement a canary monitoring program
> -------------------------------------
>                 Key: HBASE-4393
>                 URL: https://issues.apache.org/jira/browse/HBASE-4393
>             Project: HBase
>          Issue Type: New Feature
>          Components: monitoring
>    Affects Versions: 0.92.0
>            Reporter: Todd Lipcon
>            Assignee: Matteo Bertozzi
>             Fix For: 0.94.0, 0.96.0
>         Attachments: Canary-v0.java, HBASE-4393-v0.patch, HBaseCanary.java
>
> This JIRA is to implement a standalone program that can be used to do "canary monitoring"
> of a running HBase cluster. This program would gather a list of the regions in the cluster,
> then iterate over them doing lightweight operations (e.g. short scans) to provide metrics about
> latency as well as alert on availability issues.
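The loop described in the issue can be sketched as follows. Region discovery and the actual HBase client call are deliberately left out: `regions` is any iterable of region names, and `probe` is a hypothetical callable standing in for a lightweight operation such as a short scan, which raises an exception on failure. `ProbeResult`, `run_canary`, `alerts` and the `max_latency_ms` threshold are likewise illustrative names, not part of the attached patches.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class ProbeResult:
    region: str
    latency_ms: Optional[float]  # None when the probe failed
    error: Optional[str] = None


def run_canary(regions: Iterable[str],
               probe: Callable[[str], None],
               clock: Callable[[], float] = time.monotonic) -> List[ProbeResult]:
    """Probe each region once, recording latency or the failure reason."""
    results = []
    for region in regions:
        start = clock()
        try:
            probe(region)
        except Exception as exc:  # any failure counts as unavailable
            results.append(ProbeResult(region, None, f"{type(exc).__name__}: {exc}"))
            continue
        results.append(ProbeResult(region, (clock() - start) * 1000.0))
    return results


def alerts(results: List[ProbeResult], max_latency_ms: float) -> List[ProbeResult]:
    """Regions that failed outright or answered slower than the threshold."""
    return [r for r in results
            if r.latency_ms is None or r.latency_ms > max_latency_ms]
```

Injecting `clock` makes the timing testable; the default `time.monotonic` suits production use because it is not affected by wall-clock adjustments.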


