hbase-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2342) Consider adding a watchdog node next to region server
Date Wed, 17 Mar 2010 23:50:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846681#action_12846681 ]

Todd Lipcon commented on HBASE-2342:

One question is whether _both_ nodes would be ZK clients, or just the watchdog? If only the
watchdog, we'd have to communicate back and forth between them about any ZK stuff, which would
be a big pain in my opinion.

Another thought worth considering here is whether we could proactively do "rolling restarts"
of region servers to avoid heap fragmentation in the first place. The cost is that you'd end
up with a cold cache after each restart, but if we could detect when the heap was getting
fragmented and do a very fast RS restart, the trade-off might be acceptable.

> Consider adding a watchdog node next to region server
> -----------------------------------------------------
>                 Key: HBASE-2342
>                 URL: https://issues.apache.org/jira/browse/HBASE-2342
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Todd Lipcon
> This idea has been bandied about a fair amount. The concept is to add a second Java process that runs next to each region server to act as a watchdog. Several possible purposes:
> - monitor the RS for liveness - if it exhibits Juliet syndrome ("appears dead") then we kill it aggressively to prevent it from coming back to life
> - restart RS automatically in failure cases
> - potentially move the entire ZK session to the watchdog to decouple node liveness from the particular JVM liveness
> Let's discuss in this JIRA.
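
The first two purposes listed above (liveness monitoring with an aggressive kill, and automatic restart) could be sketched roughly as follows. This is only an illustration of the supervision loop, not anything from HBase: it is written in Python for brevity even though the issue proposes a Java process, and the `Watchdog` class, the `is_responsive` probe, and the `max_missed` threshold are all hypothetical names made up for this example. Moving the ZK session into the watchdog is not covered.

```python
import subprocess
from typing import Callable, List, Optional


class Watchdog:
    """Supervises one child process (a stand-in for a region server)."""

    def __init__(self, cmd: List[str], is_responsive: Callable[[], bool],
                 max_missed: int = 3):
        self.cmd = cmd                      # hypothetical launch command
        self.is_responsive = is_responsive  # hypothetical liveness probe
        self.max_missed = max_missed        # missed probes tolerated before acting
        self.missed = 0
        self.restarts = 0
        self.proc: Optional[subprocess.Popen] = None

    def start(self) -> None:
        """Launch the supervised process and reset the missed-probe counter."""
        self.proc = subprocess.Popen(self.cmd)
        self.missed = 0

    def kill(self) -> None:
        """Forcefully kill the child so it cannot come back to life later."""
        if self.proc is not None and self.proc.poll() is None:
            self.proc.kill()   # forceful kill, not a graceful shutdown request
            self.proc.wait()   # reap it so we know it is really gone

    def check_once(self) -> str:
        """Run one supervision step and report what was done."""
        if self.proc is None or self.proc.poll() is not None:
            # Never started, or exited on its own: plain (re)start.
            self.start()
            self.restarts += 1
            return "restarted-after-exit"
        if self.is_responsive():
            self.missed = 0
            return "healthy"
        self.missed += 1
        if self.missed < self.max_missed:
            return "suspect"
        # Alive at the OS level but unresponsive ("appears dead"):
        # kill it first, then bring up a fresh process.
        self.kill()
        self.start()
        self.restarts += 1
        return "killed-and-restarted"
```

A real watchdog would call `check_once()` on a timer, and the probe would need to be something meaningful for a region server (for example an RPC ping with a timeout), which is left open here. Killing before restarting is the point of the design: a process that only appears dead must be gone for certain before a replacement starts.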

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
