hbase-dev mailing list archives

From "Jim Kellerman (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HBASE-611) regionserver should do basic health check before reporting alls-well to the master
Date Wed, 07 May 2008 22:09:56 GMT

     [ https://issues.apache.org/jira/browse/HBASE-611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman resolved HBASE-611.

    Resolution: Fixed

Added method isHealthy to HRegionServer. Reviewed by Stack. Committed.
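
The resolution above names the method (isHealthy on HRegionServer) but not its body. What follows is a minimal Java sketch, assuming the two checks the issue description proposes: whether a flag marking HDFS as bad has been set, and whether any critical worker thread has exited. The class HealthCheckSketch, its fields fsOk and criticalThreads, and the method markFilesystemBad are hypothetical illustrations, not code from the HBase source.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical sketch of a regionserver health check. HealthCheckSketch,
 * fsOk, criticalThreads and markFilesystemBad are illustrative names only;
 * they are not taken from the actual HRegionServer implementation.
 */
final class HealthCheckSketch {
    // Flipped to false once a filesystem (HDFS) error has been detected.
    private final AtomicBoolean fsOk = new AtomicBoolean(true);
    // Worker threads the server cannot function without (assumed, e.g. flusher or log roller).
    private final List<Thread> criticalThreads;

    HealthCheckSketch(List<Thread> criticalThreads) {
        this.criticalThreads = criticalThreads;
    }

    /** Record that HDFS has been found bad. */
    void markFilesystemBad() {
        fsOk.set(false);
    }

    /** Healthy only if HDFS has not been flagged bad and every critical thread is still running. */
    boolean isHealthy() {
        if (!fsOk.get()) {
            return false;
        }
        for (Thread t : criticalThreads) {
            if (!t.isAlive()) {
                return false; // a critical thread has exited (or was never started)
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A long-sleeping daemon thread stands in for a critical worker.
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException e) {
                // exit quietly
            }
        });
        worker.setDaemon(true);
        worker.start();

        HealthCheckSketch server = new HealthCheckSketch(List.of(worker));
        System.out.println("healthy: " + server.isHealthy()); // healthy: true
        server.markFilesystemBad();
        System.out.println("healthy: " + server.isHealthy()); // healthy: false
    }
}
```

Keeping the check cheap (a flag read plus Thread.isAlive calls) matters because, in this sketch's assumed design, it would run before every report to the master.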

> regionserver should do basic health check before reporting alls-well to the master
> ----------------------------------------------------------------------------------
>                 Key: HBASE-611
>                 URL: https://issues.apache.org/jira/browse/HBASE-611
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.1.2
>            Reporter: stack
>            Priority: Minor
>             Fix For: 0.2.0
> On IRC this afternoon, a user killed a regionserver. It did something in HDFS. Another
> regionserver, one carrying the catalog tables, started to get exceptions out of HDFS. The
> last thing out of it was:
> {code}
> [15:55]	<jgray>	2008-05-01 15:49:51,710 FATAL org.apache.hadoop.hbase.HRegionServer: Replay of hlog required. Forcing server restart
> [15:55]	<jgray>	org.apache.hadoop.hbase.DroppedSnapshotException: Could not get block locations. Aborting...
> {code}
> That's fine.
> Only it didn't go down. It was in a state where it continued to send the master pings
> as though nothing was wrong, so its lease never timed out and the master was hosed because
> it couldn't get to the catalog tables.
> Regionservers should do a basic check that all's healthy before they ping the master.
> If critical threads have exited or a flag saying HDFS has been found bad has been set, then
> the regionserver should stop reporting to the master so the master can deploy its load elsewhere.

