hbase-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: HBase Regionserver Behavior on Failing Hardware
Date Sun, 29 Aug 2010 20:29:19 GMT
Hey Nathan,

I just filed a JIRA to attack this general problem:

I think we'll see issues like this more and more as people start to run
HBase on larger and larger clusters.


On Sun, Aug 29, 2010 at 12:37 PM, Nathan Harkenrider <
nathan.harkenrider@gmail.com> wrote:

> Hello,
> I've run into an interesting HBase failover scenario recently and am
> seeking some advice on how to work around the problem.
> First of all, I'm running CDH2 (0.20.1+169.89) and HBase 0.20.3 on a 70-node
> cluster. One of the nodes in the cluster appears to have a bad disk or disk
> controller. Hadoop identified the failing node and marked it as dead in the
> HDFS admin page as well as the jobtracker. The node has not completely
> failed since I can ping it, but ssh connections are failing. The
> regionserver process on this same node has apparently not completely failed
> either. The HBase master still thinks it is alive, and the node is
> registered in ZooKeeper. Clients hitting regions hosted on this particular
> region server are hanging/timing out, which is less than ideal. Any
> thoughts on how to configure HBase to be more sensitive to this type of
> error? Also, is there any way, short of restarting HBase, that I can force
> these regions to be reassigned to another regionserver if I don't have
> physical access (or a remote console) to stop the regionserver process on
> the failing node?
> The master did not report any errors in its log related to the failing node.
> I'm currently waiting on operations to get me the regionserver logs, if they
> can be recovered.
> Regards,
> Nathan Harkenrider
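
[Editor's note: the failure-detection question above usually comes down to the
regionserver's ZooKeeper session timeout; once the session expires, the master
treats the regionserver as dead and reassigns its regions. A minimal
hbase-site.xml sketch, assuming HBase 0.20's `zookeeper.session.timeout`
property; the 60000 ms value is illustrative only:

```xml
<!-- hbase-site.xml: shorten the ZooKeeper session timeout so the master
     notices a wedged regionserver sooner. Value is in milliseconds;
     setting it too low risks false session expirations during long
     GC pauses on otherwise-healthy regionservers. -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>60000</value>
</property>
```

This only helps if the regionserver's ZooKeeper heartbeats actually stop; a
half-dead node that keeps its session alive, as described above, will still
hold its regions.]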

Todd Lipcon
Software Engineer, Cloudera
