hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Region gets stuck in transition state
Date Wed, 27 Jan 2010 22:51:48 GMT
On Wed, Jan 27, 2010 at 2:41 PM, James Baldassari <james@dataxu.com> wrote:
>
> First we shut down the master and all region servers and then manually
> removed the /hbase root through hadoop/HDFS.  One of my colleagues
> increased some timeout values (I think they were ZooKeeper timeouts).

The ZooKeeper ticktime?
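
(For reference, a minimal sketch of where those timeouts usually live
when HBase manages ZooKeeper itself; the property names are the
0.20-era ones and the values are illustrative, not recommendations:

    <!-- hbase-site.xml -->
    <property>
      <name>zookeeper.session.timeout</name>
      <value>60000</value>  <!-- session timeout, in ms -->
    </property>
    <property>
      <name>hbase.zookeeper.property.tickTime</name>
      <value>3000</value>  <!-- ZK ticktime, in ms; ZK caps session
           timeouts to between 2x and 20x this value -->
    </property>
)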

> Another change was that I recreated the table without LZO compression
> and without setting the IN_MEMORY flag.  I learned that we did not have
> the LZO libraries installed, and the table had been created originally
> with compression set to LZO, so I imagine that would cause problems.  I
> didn't see any errors about it in the logs, however.  Maybe this
> explains why we lost data during our initial testing after shutting down
> HBase.  Perhaps it was unable to write the data to HDFS because the LZO
> libraries were not available?
>

If LZO is enabled but the native libs are not in place, no data is
written, IIRC.  It's a problem.
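
(A quick way to verify a codec actually works on a node is the
CompressionTest utility, assuming your build ships it; I'm not
certain 0.20.1-era builds do.  It writes and re-reads a small file
with the given codec and fails loudly if the native libs are missing:

    ./bin/hbase org.apache.hadoop.hbase.util.CompressionTest \
        file:///tmp/testfile lzo
)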

> Anyway, everything seems to be ok for now.  We can restart HBase without
> data loss or errors, and we can truncate the table without any problems.
> If any other issues crop up we plan on upgrading to 0.20.3, but our
> preference is to stay with the Cloudera distro if we can.  We're doing
> additional testing tonight with a larger dataset, so I'll keep an eye on
> it and post back if we learn anything new.

Avoid truncating tables if you are not on 0.20.3.  It's flaky and may
put you back in the spot you complained of originally.
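
(For reference, the shell's truncate is roughly a disable/drop/create
under the hood, which is why it trips over the same stuck-region
problem.  The manual equivalent looks like the below; the column
family name here is illustrative:

    hbase(main):001:0> disable 'retargeting'
    hbase(main):002:0> drop 'retargeting'
    hbase(main):003:0> create 'retargeting', 'data'
)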

St.Ack

>
> Thanks again for your help.
>
> -James
>
>
> On Wed, 2010-01-27 at 13:54 -0600, Stack wrote:
>> On Tue, Jan 26, 2010 at 9:03 PM, James Baldassari <james@dataxu.com> wrote:
>> >
>> > After running a map/reduce job which inserted around 180,000 rows into
>> > HBase, HBase appeared to be fine.  We could do a count on our table, and
>> > no errors were reported.  We then tried to truncate the table in
>> > preparation for another test but were unable to do so because the region
>> > became stuck in a transition state.
>>
>> Yes.  In older HBase, truncating anything bigger than a small table
>> was flaky.  It's better in 0.20.3 (I wrote our brothers over at
>> Cloudera about updating the version they bundle, especially since
>> 0.20.3 just went out).
>>
>> > I restarted each region server
>> > individually, but it did not fix the problem.  I tried the
>> > disable_region and close_region commands from the hbase shell, but that
>> > didn't work either.  After doing all of that, a status 'detailed' showed
>> > this:
>> >
>> > 1 regionsInTransition
>> >    name=retargeting,,1264546222144, unassigned=false, pendingOpen=false, open=false, closing=true, pendingClose=false, closed=false, offlined=false
>> >
>> > Then I restarted the master and all region servers, and it looked like this:
>> >
>> > 1 regionsInTransition
>> >    name=retargeting,,1264546222144, unassigned=false, pendingOpen=true, open=false, closing=false, pendingClose=false, closed=false, offlined=false
>>
>>
>> Even after a master restart?  The above is a dump of a master-internal
>> data structure that is kept in memory.  It's strange that it would
>> pick up the same exact state on restart (as Ryan says, a restart of
>> the master alone is usually a radical but sufficient fix).
>>
>> I was going to suggest that you try onlining the individual region in
>> the shell, but I don't think that'll work either, not unless you
>> update to a 0.20.3-era HBase.
>>
>> >
>> > I noticed messages in some of the region server logs indicating that
>> > their zookeeper sessions had expired.  I'm not sure if this has anything
>> > to do with the problem.
>>
>> It could.  The regionservers will restart if their session with ZK
>> expires.  What's your HBase schema like?  How are you doing your
>> upload?
>>
>> > I should mention that this scenario is quite
>> > repeatable, and the last few times it has happened we had to shut down
>> > HBase and manually remove the /hbase root from HDFS, then start HBase
>> > and recreate the table.
>> >
>> You have upped the file descriptor and xceiver params as per the
>> Getting Started guide, for sure?
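
(The usual knobs, for reference; the values below are illustrative,
not tuned recommendations:

    # /etc/security/limits.conf: raise the open-file limit
    # for the user running the hadoop/hbase daemons
    hadoop  -  nofile  32768

    <!-- hdfs-site.xml; the misspelled property name below
         is the real one -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>
)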
>>
>> >
>> > I was also wondering whether it was normal for there to be only one
>> > region with 180,000+ rows.  Shouldn't this region be split into several
>> > regions and distributed among the region servers?  I'm new to HBase, so
>> > maybe my understanding of how it's supposed to work is wrong.
>>
>> Get the region's size on the filesystem: ./bin/hadoop fs -dus
>> /hbase/table/regionname.  A region splits when it grows above a size
>> threshold, usually 256M.
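
(That threshold is configurable; a minimal sketch, assuming the
0.20-era property name:

    <!-- hbase-site.xml -->
    <property>
      <name>hbase.hregion.max.filesize</name>
      <value>268435456</value>  <!-- 256MB, the default of that era -->
    </property>
)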
>>
>> St.Ack
>>
>> >
>> > Thanks,
>> > James
>> >
>> >
>> >
>
>
