hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: region goes missing on rs (may be during reassignment)
Date Sat, 14 May 2011 04:28:17 GMT
On Fri, May 13, 2011 at 3:01 PM, Raghu Angadi <angadi@gmail.com> wrote:
> Thanks Stack. greatly appreciate the help.
>

No problem.

> hbase.regionserver.handler.count is set to 30.
> we have not set hbase.master.assignment.timeoutmonitor.timeout. will surely
> increase to 180 seconds as HBASE-3846 does.
>

You might want to go to 0.90.3 altogether.  Is that a pain for you?


> The load on the cluster is low to moderate and HBase holds up pretty well.
> Most of the load consists of hourly random writes to the table and
> sequential scans from MR jobs.
>

Thanks boss.


> I will send another email with locations to full master logs.
> There are many "Regions in transition timed out" messages for this region
> and many others spread over time.
>

Grand.

I can come over any time or you should drop by our place.  It's just a
few blocks away and we can munch on lunch while we dig in your logs.

St.Ack

> Raghu.
>
> On Fri, May 13, 2011 at 11:33 AM, Stack <stack@duboce.net> wrote:
>
>> I see that we are timing out region assignment then assigning
>> elsewhere, but the region opened anyway on first server (What do you
>> have hbase.regionserver.handler.count set to?  The default is 10 which
>> could mean a bunch of requests hanging out in the rpc queue before
>> getting into the server to be processed).  One thing you could do is
>> up your region in transition timeout.  Default is 30 seconds which if
>> there is a bunch of churn may not be enough time for region assignment
>> to complete -- was there churn at this time? (We up the default
>> timeout in 0.90.3, see  'HBASE-3846  Set RIT timeout higher').
>>
>> See below for more.
>>
>> On Fri, May 13, 2011 at 8:19 AM, Raghu Angadi <rangadi@apache.org> wrote:
>> ...
>> >> > 2011-05-12 12:05:20,987 DEBUG
>> >> > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened
>> >> > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
>>
>> The region opened successfully.
>>
>> But looking at the master log, 12 seconds earlier it says:
>>
>> >>>> 2011-05-12 12:05:08,122 INFO
>> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition
>> timed out:  users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
>> state=OPENING, ts=1305201871850 2011-05-12 12:05:08,122 INFO
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING
>> for too long, reassigning
>>
>>
>> .... and then forces it to be reassigned elsewhere (Your log from master
>> stops at this point.  I'd be interested in seeing more.  Send it to me
>> offline?).
>>
>> Thanks Raghu,
>> St.Ack
>>
>
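
The timing in the quoted log lines can be checked with a short script. This is only a sketch: it takes the two log timestamps and the `ts` value from the excerpts above, and it assumes (the thread does not say) that the master and regionserver clocks agree and that the log timestamps are in UTC. The helper name `parse_log_time` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# log4j-style timestamp layout used in the quoted HBase log lines
LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_time(stamp: str) -> datetime:
    """Parse a timestamp such as '2011-05-12 12:05:08,122' (hypothetical helper)."""
    return datetime.strptime(stamp, LOG_FORMAT)


# Master: "Regions in transition timed out" for the region
master_timeout = parse_log_time("2011-05-12 12:05:08,122")
# Regionserver: OpenRegionHandler reports the region as opened
rs_opened = parse_log_time("2011-05-12 12:05:20,987")

# How long after the master gave up did the first server finish opening?
gap_seconds = (rs_opened - master_timeout).total_seconds()

# ts=1305201871850 from the master log line, in milliseconds since the epoch
ts_ms = 1305201871850
opening_since = (
    datetime.fromtimestamp(ts_ms // 1000, tz=timezone.utc)
    + timedelta(milliseconds=ts_ms % 1000)
)

# ASSUMPTION: the log clock is UTC. If it is not, this figure is off by the
# zone offset and only gap_seconds above is meaningful.
in_opening_seconds = (
    master_timeout.replace(tzinfo=timezone.utc) - opening_since
).total_seconds()

print(f"open completed {gap_seconds:.3f}s after the master timed out")
print(f"state timestamp (UTC): {opening_since.isoformat()}")
print(f"time in OPENING at timeout, if logs are UTC: {in_opening_seconds:.3f}s")
```

If the UTC assumption holds, the region had been in OPENING for a little over the 30 second default when the master reassigned it, and the open on the first server finished roughly 12 seconds later, which is well inside the 180 seconds that HBASE-3846 moves to.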
