hbase-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4393) Implement a canary monitoring program
Date Wed, 14 Sep 2011 22:54:09 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104971#comment-13104971 ]

Todd Lipcon commented on HBASE-4393:

RPC sampling on the server side won't tell you if, for example, one of the servers in the
cluster has a faulty NIC and thus is dropping packets and has very high latency. The latency
"inside" the server will be fast, but any client will see it as slow.
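
To illustrate the client-side point: a canary would time each probe from the client, so the number includes the network path as well as the server's own work. The sketch below is hypothetical (the class name, the server names, and the "3x the cluster median" rule are illustrative assumptions, not anything HBase provides). It assumes each server has at least one sample and only shows how per-server samples collected that way could expose a single slow host.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: aggregates client-observed probe latencies per region
 * server and flags servers whose median is far above the cluster-wide median.
 */
public class ClientLatencyOutliers {

    // Median of a non-empty list of latencies in milliseconds.
    static double median(List<Long> values) {
        List<Long> sorted = new ArrayList<>(values);
        sorted.sort(null); // natural ordering
        int n = sorted.size();
        return n % 2 == 1
                ? sorted.get(n / 2)
                : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    /**
     * Returns the servers whose median client-side latency exceeds
     * factor times the median taken over all samples in the cluster.
     */
    static List<String> slowServers(Map<String, List<Long>> latenciesByServer, double factor) {
        List<Long> all = new ArrayList<>();
        for (List<Long> samples : latenciesByServer.values()) {
            all.addAll(samples);
        }
        double clusterMedian = median(all);
        List<String> slow = new ArrayList<>();
        for (Map.Entry<String, List<Long>> e : latenciesByServer.entrySet()) {
            if (median(e.getValue()) > factor * clusterMedian) {
                slow.add(e.getKey());
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> samples = new HashMap<>();
        samples.put("rs1", List.of(4L, 5L, 6L));
        samples.put("rs2", List.of(5L, 4L, 5L));
        samples.put("rs3", List.of(250L, 300L, 280L)); // e.g. a host with a flaky NIC
        System.out.println(slowServers(samples, 3.0)); // [rs3]
    }
}
```

Comparing each server against the cluster median, rather than against a fixed number, keeps the check meaningful whether the cluster as a whole is fast or slow; only the host that stands out is reported.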

Availability-wise, we sometimes have clusters which only sporadically see access (e.g., from
an MR job that runs every hour). In that case, it's nice to have a canary monitor to determine
if one of the region servers is having issues _before_ the job runs and times out. We often
find out about these kinds of issues from a job failing, instead of proactively from monitoring,
since all of the servers are "up" and only one region is in some kind of broken state.

> Implement a canary monitoring program
> -------------------------------------
>                 Key: HBASE-4393
>                 URL: https://issues.apache.org/jira/browse/HBASE-4393
>             Project: HBase
>          Issue Type: New Feature
>          Components: monitoring
>    Affects Versions: 0.92.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
> This JIRA is to implement a standalone program that can be used to do "canary monitoring"
> of a running HBase cluster. This program would gather a list of the regions in the cluster,
> then iterate over them doing lightweight operations (e.g., short scans) to provide metrics about
> latency as well as alert on availability issues.
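
A minimal sketch of the loop described in the issue, in Java. RegionProbe is a hypothetical stand-in for the lightweight operation; a real canary would presumably issue a short scan through the HBase client, which is not shown here. The class names, region names, and the 1000 ms alert threshold are assumptions for illustration only.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal canary-loop sketch (hypothetical names throughout). RegionProbe stands in
 * for a lightweight client operation, such as a short scan, against one region.
 */
public class CanarySketch {

    /** Hypothetical hook: performs one lightweight read against a region. */
    @FunctionalInterface
    interface RegionProbe {
        void probe(String region) throws Exception;
    }

    /** Outcome of probing one region, as observed by the client. */
    static final class Result {
        final String region;
        final long latencyMillis;
        final String error; // null when the probe succeeded

        Result(String region, long latencyMillis, String error) {
            this.region = region;
            this.latencyMillis = latencyMillis;
            this.error = error;
        }

        boolean ok() {
            return error == null;
        }

        @Override
        public String toString() {
            return region + ": " + (ok() ? latencyMillis + " ms" : "FAILED (" + error + ")");
        }
    }

    /** Probes every region once, timing each call from the client side. */
    static List<Result> runOnce(List<String> regions, RegionProbe probe) {
        List<Result> results = new ArrayList<>();
        for (String region : regions) {
            long start = System.nanoTime();
            String error = null;
            try {
                probe.probe(region);
            } catch (Exception e) {
                error = e.toString();
            }
            long millis = (System.nanoTime() - start) / 1_000_000;
            results.add(new Result(region, millis, error));
        }
        return results;
    }

    /** Regions to alert on: failed probes, or probes slower than the threshold. */
    static List<String> alerts(List<Result> results, long thresholdMillis) {
        List<String> out = new ArrayList<>();
        for (Result r : results) {
            if (!r.ok() || r.latencyMillis > thresholdMillis) {
                out.add(r.region);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> regions = List.of("regionA", "regionB", "regionC");
        // Fake probe for demonstration: regionC behaves like a region stuck in a broken state.
        RegionProbe fake = region -> {
            if (region.equals("regionC")) {
                throw new IOException("region not serving");
            }
        };
        List<Result> results = runOnce(regions, fake);
        results.forEach(System.out::println);
        System.out.println("alerts: " + alerts(results, 1000));
    }
}
```

Treating an exception and an over-threshold latency the same way matches the availability concern in the comment: a region in a broken state shows up as a failed or slow probe even when every server reports itself as up. A production version would also need a per-probe timeout so one hanging region cannot stall the whole pass; this sketch omits that.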

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira