hbase-user mailing list archives

From Dan Brodsky <danbrod...@gmail.com>
Subject Follow-up to regionservers not being online - more logs included
Date Fri, 19 Oct 2012 13:41:49 GMT
I'm still having several issues with my cluster. This all used to
work, and there have been no recent configuration changes.

To recap, the master and regionservers all appear to start successfully,
but several regionservers do not show as online on the HBase master status
page. Moreover, a number of regions are stuck in transition and never
open. Of the 5 regionservers currently on the status page, only two have
numberOfOnlineRegions > 0.

Log file snippets:

First, the ZooKeeper dump from the master status web page shows
that some of the regionservers have connected to ZK, but they still
don't show as being online. Note that the IP ending in 217 is the
HBase master, and the ones ending in 31-40 are RS's 1-10 respectively:
http://paste.ee/p/JAUfJ

This is the log file for one of the regionservers that did not come
online, showing not much of anything, I'm afraid:
http://paste.ee/p/KHgOP

In one of the RegionServers that did come online, I'm seeing this
error repeat over and over (several of the RS_ZK_REGION_OPENING debug
statements precede the error): http://paste.ee/p/lbiTN

ZooKeeper log for one of the ZK nodes. Not much remarkable here; the
nodes connect successfully, and a session with the HBase master (which
is also a ZK quorum peer) is repeatedly opened and closed:
http://paste.ee/p/zjSCO

The master log doesn't show much. A lot of this:

CatalogTracker: Failed verification of .META.,,1 at
address=dn-4,60020,1350563250999;
org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: Region is not
online: .META.,,1

But then it does find .META. and open it on a different RS:

2012-10-19 12:59:21,480 INFO
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling
OPENED event for .META.,,1.1028785192 from dn-3,60020,1350651496690;
deleting unassigned node
2012-10-19 12:59:21,482 INFO
org.apache.hadoop.hbase.master.AssignmentManager: The master has
opened the region .META.,,1.1028785192 that was online on
dn-3,60020,1350651496690
2012-10-19 12:59:21,497 INFO org.apache.hadoop.hbase.master.HMaster:
.META. assigned=2, rit=false, location=dn-3,60020,1350651496690

The master log file goes on to show that 71 regions come online, which
is consistent with the master status page.
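
A side note on reading the log lines above: the server names have the
form host,port,startcode, where the start code is the time that
regionserver process started, in epoch milliseconds. Below is a minimal
Python sketch for decoding them. The helper name parse_server_name is
made up for illustration and is not part of HBase.

```python
from datetime import datetime, timezone


def parse_server_name(server_name):
    """Split 'host,port,startcode' into (host, port, start time in UTC).

    Hypothetical helper for reading log lines; not an HBase API.
    """
    host, port, startcode = server_name.split(",")
    # The start code is in milliseconds; drop the millisecond part.
    started = datetime.fromtimestamp(int(startcode) // 1000, tz=timezone.utc)
    return host, int(port), started


# The two server names that appear in the master log snippets above.
for name in ("dn-4,60020,1350563250999", "dn-3,60020,1350651496690"):
    host, port, started = parse_server_name(name)
    print(host, port, started.strftime("%Y-%m-%d %H:%M:%S"))
```

Decoded this way, the dn-4 start code is 2012-10-18 12:27:30 GMT and the
dn-3 start code is 2012-10-19 12:58:16 GMT, so the failed .META.
verification appears to refer to a dn-4 instance started a day before
the dn-3 instance that ends up hosting .META.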

Thoughts?
