hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Region gets stuck in transition state
Date Wed, 27 Jan 2010 05:17:04 GMT
Restarting the master can help. Some of these bugs were fixed in 0.20.3,
which was just released. Upgrade if you can!

On Jan 26, 2010 9:03 PM, "James Baldassari" <james@dataxu.com> wrote:

Hi,

I'm using the Cloudera distribution of HBase, version
0.20.0~1-1.cloudera, in a fully-distributed cluster of 10 nodes.  I'm
using all default config options except for hbase.zookeeper.quorum,
hbase.rootdir, hbase.cluster.distributed, and an updated regionservers
file containing all our region servers.
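For completeness, the overrides in my hbase-site.xml look roughly like
this (the hostnames and HDFS path below are placeholders, not our real
values):

    <!-- hbase-site.xml overrides; hosts and paths are placeholders -->
    <configuration>
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://namenode.example.com:9000/hbase</value>
      </property>
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
      </property>
    </configuration>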

After running a map/reduce job which inserted around 180,000 rows into
HBase, HBase appeared to be fine.  We could do a count on our table, and
no errors were reported.  We then tried to truncate the table in
preparation for another test but were unable to do so because the region
became stuck in a transition state.  I restarted each region server
individually, but it did not fix the problem.  I tried the
disable_region and close_region commands from the hbase shell, but that
didn't work either.  After doing all of that, a status 'detailed' showed
this:

1 regionsInTransition
   name=retargeting,,1264546222144, unassigned=false, pendingOpen=false,
open=false, closing=true, pendingClose=false, closed=false, offlined=false
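For reference, the shell commands I had tried up to this point looked
roughly like this (a sketch; the exact argument formats are from memory,
and the region name is the one shown in the status output above):

    hbase(main):001:0> truncate 'retargeting'
    hbase(main):002:0> disable_region 'retargeting,,1264546222144'
    hbase(main):003:0> close_region 'retargeting,,1264546222144'
    hbase(main):004:0> status 'detailed'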

Then I restarted the master and all region servers, and it looked like this:

1 regionsInTransition
   name=retargeting,,1264546222144, unassigned=false, pendingOpen=true,
open=false, closing=false, pendingClose=false, closed=false, offlined=false

I noticed messages in some of the region server logs indicating that
their zookeeper sessions had expired.  I'm not sure if this has anything
to do with the problem.  I should mention that this scenario is quite
repeatable, and the last few times it has happened we had to shut down
HBase and manually remove the /hbase root from HDFS, then start HBase
and recreate the table.
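For the record, that manual recovery goes roughly like this (a sketch;
removing /hbase of course destroys all tables, and 'cf' below is a
placeholder column family name, not our real schema):

    $ bin/stop-hbase.sh
    $ hadoop fs -rmr /hbase      # wipes ALL HBase data in HDFS
    $ bin/start-hbase.sh
    $ hbase shell
    hbase(main):001:0> create 'retargeting', 'cf'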

Any ideas what could put the region into this state or what to do to fix
it?  How can I prevent this from happening in the future?

I was also wondering whether it was normal for there to be only one
region with 180,000+ rows.  Shouldn't this region be split into several
regions and distributed among the region servers?  I'm new to HBase, so
maybe my understanding of how it's supposed to work is wrong.
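My (possibly wrong) understanding from the docs is that splits are driven
by region size rather than row count, so 180,000 small rows may simply be
under the default threshold; the relevant knob seems to be:

    <!-- hbase-site.xml: region split threshold (default 256MB in 0.20);
         shown for illustration only -->
    <property>
      <name>hbase.hregion.max.filesize</name>
      <value>268435456</value>
    </property>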

Thanks,
James
