hbase-user mailing list archives

From Dan Brodsky <danbrod...@gmail.com>
Subject Regionservers not connecting to master
Date Wed, 17 Oct 2012 13:01:25 GMT
Good morning,

I have a 10-node Hadoop/HBase cluster, plus a namenode VM, plus three
ZooKeeper quorum peers (one on the namenode, one on a dedicated ZK
peer VM, and one on a third box). All 10 HDFS datanodes are also HBase
regionservers.

Several weeks ago, we had six HDFS datanodes go offline suddenly (with
no meaningful error messages), and since then, I have been unable to
get all 10 regionservers to connect to the HBase master. I've tried
bringing the cluster down and rebooting all the boxes, but no joy. The
machines are all running, and hbase-regionserver appears to start
normally on each one.

Right now, my master status page (http://namenode:60010) shows 3
regionservers online. There are also dozens of regions in transition
listed on the status page (in the PENDING_OPEN state), but each of
those is on one of the regionservers already online.

The 7 other regionservers' log files show a successful connection to
one ZK peer, followed by a regular trail of these messages:

2012-10-17 12:36:08,394 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=8.17
MB, free=987.67 MB, max=995.84 MB, blocks=0, accesses=0, hits=0,
hitRatio=0cachingAccesses=0, cachingHits=0,
cachingHitsRatio=0evictions=0, evicted=0, evictedPerRun=NaN

If I had to wager a guess, it seems like the 7 offline regionservers
are not connecting to other ZK peers, but there isn't anything in the
ZK logs to indicate why.
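
One way to test that guess from an offline regionserver is to ask each
quorum peer directly with ZooKeeper's four-letter "ruok" command; a
running server answers "imok". Below is a minimal Python sketch, assuming
the default client port 2181. The hostnames in the usage comment are
placeholders, not the real names of my ZK boxes. Note that "imok" only
shows the server process is answering, not that it has joined the quorum.

```python
import socket


def zk_four_letter(host, port=2181, word=b"ruok", timeout=3.0):
    """Send a ZooKeeper four-letter command; return the reply text, or None on failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(word)
            chunks = []
            # ZooKeeper closes the connection after replying, so read until EOF.
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
            return b"".join(chunks).decode("ascii", errors="replace")
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return None


def check_quorum(peers):
    """Map 'host:port' to True if that peer answered 'imok' to 'ruok'."""
    return {
        "%s:%d" % (host, port): zk_four_letter(host, port) == "imok"
        for host, port in peers
    }


# Example usage (hostnames are hypothetical placeholders):
#   peers = [("namenode", 2181), ("zk-peer", 2181), ("third-box", 2181)]
#   for peer, ok in check_quorum(peers).items():
#       print(peer, "imok" if ok else "no answer")
```

Running this on one of the 7 offline regionservers and on one of the 3
online ones should show whether the two groups can reach the same set of
ZK peers.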

Thoughts?

Dan
