hbase-user mailing list archives

From Igor Ranitovic <irani...@gmail.com>
Subject Re: Region loadbalancing
Date Wed, 15 Dec 2010 01:28:48 GMT
Hi Stack,

We have been running a small cluster (name node + 5 RS) on 0.20.3 for a 
long time now. We are currently at 1100 regions per RS. As far as I can 
tell, I have not seen any problems or changes in behavior due to this.

What kind of problems can I expect with 1K+ regions per RS? And what is the 
consequence of upping the region size from 256M to, let's say, 512M?
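
(For reference, the split threshold that effectively sets region size is the
hbase.hregion.max.filesize setting, 256MB by default on 0.20.x. Below is a
minimal, illustrative Java sketch of overriding it to 512MB; in practice the
same key would simply be set in hbase-site.xml on the region servers. The
class name is made up for the example.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class RegionSizeSketch {
  public static void main(String[] args) {
    // 0.20-era construction; later releases use HBaseConfiguration.create().
    Configuration conf = new HBaseConfiguration();
    // A region is split once one of its store files grows past this size,
    // so a larger value means fewer, bigger regions per RS.
    conf.setLong("hbase.hregion.max.filesize", 512L * 1024 * 1024); // 512MB
    System.out.println("hbase.hregion.max.filesize = "
        + conf.get("hbase.hregion.max.filesize"));
  }
}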

Thanks,
i.

On 12/14/2010 09:50 AM, Stack wrote:
> Can you do w/ fewer regions?  1k plus per server is pushing it, I'd say.
>   Can you up your region sizes, for instance?
> St.Ack
>
> On Mon, Dec 13, 2010 at 8:36 AM, Jan Lukavský
> <jan.lukavsky@firma.seznam.cz>  wrote:
>> Hi all,
>>
>> we are using HBase 0.20.6 on a cluster of about 25 nodes with about 30k
>> regions and are experiencing an issue which causes running M/R jobs to
>> fail.
>> When we restart a single RegionServer, the following happens:
>>   1) all regions of that RS get reassigned to the remaining (say 24) nodes
>>   2) when the restarted RegionServer comes up, HMaster closes about 60
>> regions on all 24 nodes and assigns them back to the restarted node
>>
>> Now, step 1) is usually very quick (if we can assign 10 regions per
>> heartbeat per RS, that is 240 regions per heartbeat across the whole
>> cluster). Step 2) seems problematic, because first about 1200 regions get
>> unassigned, and then they are slowly assigned to the single restarted RS
>> (again at 10 regions per heartbeat, i.e. roughly 1200 / 10 = 120
>> heartbeats). During this time, the map tasks' clients connected to those
>> regions throw RetriesExhaustedException.
>>
>> I'm aware that we can limit the number of regions closed per RegionServer
>> heartbeat via hbase.regions.close.max, but this config option seems a bit
>> unsatisfactory, because as we increase the size of the cluster we will get
>> more and more regions unassigned in a single cluster heartbeat (say we
>> limit this to 1; we then get 24 unassigned regions, but only 10 assigned
>> per heartbeat). This led us to a solution which seems quite simple: we
>> have introduced a new config option that limits the number of regions in
>> transition. When regionsInTransition.size() crosses that boundary, we
>> temporarily stop the load balancer. This seems to resolve our issue,
>> because no region stays unassigned for a long time and clients manage to
>> recover within their number of retries.
>>
>> My question is: is this a general issue for which a new config option
>> should be proposed, or am I missing something and we could have resolved
>> the issue by tuning some other existing config option?
>>
>> Thanks.
>>   Jan
>>
>>
>
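
For what it's worth, below is a minimal sketch of the kind of balancer
throttle Jan describes above: regionsInTransition is checked against a
configurable limit before the balancer is allowed to run. The names used
(the hbase.regions.transition.max key and the BalancerThrottle class) are
hypothetical, not actual HBase API.

import java.util.Set;

public class BalancerThrottle {

  /** Hypothetical config key for the maximum number of regions in transition. */
  public static final String MAX_IN_TRANSITION_KEY = "hbase.regions.transition.max";

  private final int maxRegionsInTransition;

  public BalancerThrottle(int maxRegionsInTransition) {
    this.maxRegionsInTransition = maxRegionsInTransition;
  }

  /**
   * Decide whether the master should run a balance pass on this heartbeat.
   * While too many regions are already in transition, balancing is skipped
   * so that clients holding stale region locations can catch up within
   * their configured number of retries.
   */
  public boolean shouldBalance(Set<String> regionsInTransition) {
    return regionsInTransition.size() < maxRegionsInTransition;
  }
}

A check of this kind would complement, rather than replace, the existing
hbase.regions.close.max throttle: instead of only capping how many regions
are closed per heartbeat, it pauses rebalancing entirely until the backlog
of regions in transition drains.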

