hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Region servers exiting, not recovering
Date Wed, 22 Sep 2010 00:24:00 GMT
You could wrap the regionserver in a script that auto-restarts it?

We can't really recover from this scenario, because the master notices
we are dead, then splits our logs and reassigns the regions to other
nodes.  This is the basis of how HBase stays reliable in the face of
machine failure.
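
As a rough illustration of the wrapper idea, here is a minimal sketch in
Python. It restarts a command every time it exits, backing off between
attempts. The function name supervise, its parameters, and the retry limits
are all made up for this example, and the command shown in the comment
(bin/hbase regionserver start, run in the foreground) is an assumption about
your install layout, not something taken from this thread.

```python
import subprocess
import time


def supervise(cmd, max_restarts=5, base_delay=1.0, max_delay=60.0,
              sleep=time.sleep, run=subprocess.call):
    """Run cmd in the foreground and start it again each time it exits.

    Hypothetical helper: restarts on ANY exit code, waits with exponential
    backoff between attempts, and gives up after max_restarts restarts.
    Returns the list of exit codes that were observed.
    """
    exit_codes = []
    delay = base_delay
    for attempt in range(max_restarts + 1):
        # Blocks until the child process exits.
        exit_codes.append(run(cmd))
        if attempt == max_restarts:
            break  # stop here so a persistent failure still reaches a human
        sleep(delay)
        delay = min(delay * 2, max_delay)
    return exit_codes


# Example call (assumed command and path, adjust for your installation):
# supervise(["bin/hbase", "regionserver", "start"])
```

Capping the number of restarts is a deliberate choice in this sketch: if the
server keeps dying, the wrapper stops and your existing monitoring still
alerts someone, rather than looping forever.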


On Tue, Sep 21, 2010 at 5:20 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> Hi,
> so in our production, we see temporary networking failures (we are not quite
> 100% sure what they are) but now and then region server's zookeeper session
> would get expired and in addition some ipc channels would throw 'channel
> closed'.
> This causes region server to exit. Which is not a very big deal, our
> monitoring system would send a text message so somebody would restart the
> region server.
> however, this does happen a little more often than we probably would have
> liked to do it manually.
> Why is server not recovering/reconnecting automatically? is there a facility
> to enable server restarts and region server nodes to rejoin the cluster
> automatically?
> Thanks.
> -Dmitriy
