hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Region gets stuck in transition state
Date Wed, 27 Jan 2010 19:54:30 GMT
On Tue, Jan 26, 2010 at 9:03 PM, James Baldassari <james@dataxu.com> wrote:
>
> After running a map/reduce job which inserted around 180,000 rows into
> HBase, HBase appeared to be fine.  We could do a count on our table, and
> no errors were reported.  We then tried to truncate the table in
> preparation for another test but were unable to do so because the region
> became stuck in a transition state.

Yes.  In older hbase, truncate of larger-than-small tables was flaky.  It's
better in 0.20.3 (I wrote our brothers over at Cloudera about updating the
version they bundle, especially since 0.20.3 just went out).

> I restarted each region server
> individually, but it did not fix the problem.  I tried the
> disable_region and close_region commands from the hbase shell, but that
> didn't work either.  After doing all of that, a status 'detailed' showed
> this:
>
> 1 regionsInTransition
>    name=retargeting,,1264546222144, unassigned=false, pendingOpen=false, open=false, closing=true, pendingClose=false, closed=false, offlined=false
>
> Then I restarted the master and all region servers, and it looked like this:
>
> 1 regionsInTransition
>    name=retargeting,,1264546222144, unassigned=false, pendingOpen=true, open=false, closing=false, pendingClose=false, closed=false, offlined=false


Even after a master restart?  The above is a dump of a master-internal
data structure that is kept in memory.  Strange that it would pick up
the same exact state on restart (as Ryan says, a restart of the master
alone is usually a radical but sufficient fix).

I was going to suggest that you try onlining the individual region in the
shell, but I don't think that'll work either, not unless you update to
0.20.3-era hbase.

>
> I noticed messages in some of the region server logs indicating that
> their zookeeper sessions had expired.  I'm not sure if this has anything
> to do with the problem.

It could.  The regionservers will restart if their session w/ zk
expires.  What's your hbase schema like?  How are you doing your
upload?

> I should mention that this scenario is quite
> repeatable, and the last few times it has happened we had to shut down
> HBase and manually remove the /hbase root from HDFS, then start HBase
> and recreate the table.
>
For sure you've upped file descriptors and xceiver params as per the
Getting Started?
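
A rough way to check both settings from a shell, as a sketch only.  The
1024 floor is the usual Linux default for open files (an assumption about
your machines), the conf/hdfs-site.xml path is hypothetical, and
dfs.datanode.max.xcievers is the (misspelled) property name used by Hadoop
of this era.  Run it as the user that starts HBase and the datanode.

```shell
#!/bin/sh
# Sketch only: the threshold and the config path are assumptions,
# not values taken from this thread.

# fd_limit_ok LIMIT [MINIMUM]
# Succeeds when the open-file limit is above MINIMUM (default 1024).
fd_limit_ok() {
  limit=$1
  min=${2:-1024}
  [ "$limit" = "unlimited" ] && return 0
  [ "$limit" -gt "$min" ]
}

current=$(ulimit -n)
if fd_limit_ok "$current"; then
  echo "open-file limit looks raised: $current"
else
  echo "open-file limit is still at or below the default: $current"
fi

# Hypothetical location; point HDFS_SITE at your own datanode config.
HDFS_SITE=${HDFS_SITE:-conf/hdfs-site.xml}
if [ -f "$HDFS_SITE" ] && grep -q 'dfs.datanode.max.xcievers' "$HDFS_SITE"; then
  echo "xceiver limit is set in $HDFS_SITE"
else
  echo "dfs.datanode.max.xcievers not found in $HDFS_SITE"
fi
```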

>
> I was also wondering whether it was normal for there to be only one
> region with 180,000+ rows.  Shouldn't this region be split into several
> regions and distributed among the region servers?  I'm new to HBase, so
> maybe my understanding of how it's supposed to work is wrong.

Get the region's size on the filesystem: ./bin/hadoop fs -dus
/hbase/table/regionname.  A region splits when it's above a size
threshold, 256M usually.
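
The comparison can be sketched in a few lines of shell.  This assumes the
threshold is the 256M mentioned above (as far as I know it is governed by
hbase.hregion.max.filesize) and that you pass in the byte count reported
by -dus yourself; the helper name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: region_should_split is a hypothetical helper, not an HBase tool.

# region_should_split BYTES [THRESHOLD_BYTES]
# Succeeds when the on-disk size is above the split threshold
# (default 256 MB, the usual value mentioned above).
region_should_split() {
  size=$1
  threshold=${2:-$((256 * 1024 * 1024))}
  [ "$size" -gt "$threshold" ]
}

# Pass the region size in bytes as the first argument, e.g. the number
# printed by: ./bin/hadoop fs -dus /hbase/table/regionname
size_bytes=${1:-0}
if region_should_split "$size_bytes"; then
  echo "$size_bytes bytes: above the threshold, expect a split"
else
  echo "$size_bytes bytes: below the threshold, one region is normal"
fi
```

As a back-of-envelope check, 180,000 rows at a few hundred bytes each is
on the order of tens of megabytes, well under 256M, so a single region
would not be surprising.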

St.Ack

>
> Thanks,
> James
>
>
>
