hbase-dev mailing list archives

From "Ferdy (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-2117) Simple check on the master overview page if the number of currently running regionservers is unchanged.
Date Tue, 12 Jan 2010 20:18:54 GMT

     [ https://issues.apache.org/jira/browse/HBASE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdy updated HBASE-2117:

    Attachment: HBASE-2117-v2.patch

Thank you for the suggestions.

The patch now includes the ASF license and javadoc. The name of the method is now more specific
and the diagnostic message is more subtle. Finally, the number of configured regionservers
is reloaded every time the page is requested, which shouldn't be too expensive.
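
A minimal sketch of what such a per-request reload could look like. It is not the code from
HBASE-2117-v2.patch. It assumes the configured regionservers are listed one host per line in
the 'regionservers' file and that blank lines and '#' comments should be ignored. The class
and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper for illustration; not the class from the attached patch. */
public class ConfiguredRegionServers {

  /**
   * Re-reads the file on every call, so edits to it are picked up
   * the next time the page is rendered.
   */
  public static int count(Path regionServersFile) throws IOException {
    List<String> lines = Files.readAllLines(regionServersFile, StandardCharsets.UTF_8);
    return countEntries(lines);
  }

  /** Counts host entries, skipping blank lines and '#' comments (an assumption). */
  public static int countEntries(List<String> lines) {
    int entries = 0;
    for (String line : lines) {
      int hash = line.indexOf('#');
      String host = (hash >= 0 ? line.substring(0, hash) : line).trim();
      if (!host.isEmpty()) {
        entries++;
      }
    }
    return entries;
  }
}
```

The file is typically only a handful of lines long, so reading it once per page request
should add little overhead.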

> Simple check on the master overview page if the number of currently running regionservers is unchanged.
> -------------------------------------------------------------------------------------------------------
>                 Key: HBASE-2117
>                 URL: https://issues.apache.org/jira/browse/HBASE-2117
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: master, regionserver
>    Affects Versions: 0.20.2
>            Reporter: Ferdy
>         Attachments: HBASE-2117-v2.patch, HBASE-2117.patch
> Occasionally, some of our regionservers just stop working. The regionserver logs show
> some sort of termination, and the affected regionserver is simply removed from the master
> page. Apart from the termination itself, what I was missing was some sort of warning (from
> either running client code or the master page) that some regionservers are having trouble.
> It seems the Master is fine with a regionserver suddenly deciding to stop. As a result,
> clients depending on the data in HBase are presented with an incomplete data set, at least
> until the failing regions are re-assigned. To have this monitored, I created a patch that
> exposes an extra piece of information on the master page. An 'OK:' is presented if the
> current number of regionservers is unchanged since the start of the processes. An 'ERROR:'
> is shown whenever the current number differs. The master page reads the 'regionservers'
> file once and remembers the number of slaves so that it can be used in the check. (So
> later changes to this file are not supported.)
> Perhaps this is not the right way of doing things. Please let me know if there are any
> existing solutions for these issues.
> I will attach a patch right away.
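
The check described in the issue can be sketched as follows. This is an illustration only,
not the patch itself. Only the 'OK:' and 'ERROR:' prefixes come from the description. The
class name, method name, and message wording are hypothetical, and both counts are passed
in as plain integers rather than obtained from the master.

```java
/** Hypothetical illustration of the OK/ERROR check; not the code from the patch. */
public class RegionServerCheck {

  /**
   * Compares the number of regionservers currently running with the number
   * of configured ones and returns a one-line status for the overview page.
   */
  public static String status(int configured, int live) {
    // 'OK:' when the counts match, 'ERROR:' when they differ in either direction.
    String prefix = (live == configured) ? "OK: " : "ERROR: ";
    return prefix + live + " of " + configured + " configured regionservers are running";
  }
}
```

For example, `RegionServerCheck.status(3, 2)` yields a line starting with 'ERROR:', which
an external monitor could look for on the master page.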

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
