hbase-user mailing list archives

From Ted Tuttle <...@mentacapital.com>
Subject unstable cluster
Date Tue, 12 Apr 2016 00:08:53 GMT
Hello -

We've started experiencing regular failures of our HBase cluster.  For the last week we've
had nightly failures about one hour after a heavy batch process starts.

The logs below show the failure starting at 2016-04-11 03:11 in the ZooKeeper, master, and
region server logs:

zookeeper:  http://pastebin.com/kf7ja22K

region server: http://pastebin.com/tduJgKqq

master:  http://pastebin.com/0szhi0bJ

The master log seems the most interesting.  It shows problems connecting to ZooKeeper, then
a number of region servers dying in quick succession.  From the log evidence, it appears that
ZooKeeper is not responding, rather than the more typical case of GC causing an isolated
region server (RS) to abort.

Any insights on what may be happening here?

