hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: HBase Regionserver Behavior on Failing Hardware
Date Mon, 30 Aug 2010 03:36:44 GMT
On Sun, Aug 29, 2010 at 12:37 PM, Nathan Harkenrider
<nathan.harkenrider@gmail.com> wrote:
> The HBase master still thinks it is alive, and the node is
> registered in Zookeeper.

Can you get some logs from this RS?  (You can get to the RS logs from
the UI if you do not have access to the machine.)  It would be
interesting to see how the HW issue manifests.  We should recognize it
and abort (we already do this for various scenarios: OOME, unreachable HDFS).

> Clients hitting regions hosted on this particular
> region server are hanging/timing out, which is less than ideal. Any thoughts
> on how to configure HBase to be more sensitive to this type of
> error? Also, is there any way short of restarting HBase that I can force
> these regions to be reassigned to another regionserver if I don't have
> physical access (or remote console) to stop the regionserver process on the
> failing node?

Well, usually you'd just shut down that RS and its load would be
distributed across the remaining servers, but you need access to kill
the individual RS (in 0.90 you will be able to do it from HBaseAdmin).

> The master did not report any errors in its log related to the failing node.
> I'm currently waiting on operations to get me the regionserver logs if they
> can be recovered.

For sure we'd like to have a looksee.


> Nathan Harkenrider
