hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Region loadbalancing
Date Tue, 14 Dec 2010 17:50:58 GMT
Can you do with fewer regions?  More than 1k per server is pushing it, I'd say.
Can you increase your region sizes, for instance?

On Mon, Dec 13, 2010 at 8:36 AM, Jan Lukavský
<jan.lukavsky@firma.seznam.cz> wrote:
> Hi all,
> we are using HBase 0.20.6 on a cluster of about 25 nodes with about 30k
> regions and are experiencing an issue which causes running M/R jobs to
> fail.
> When we restart a single RegionServer, the following happens:
>  1) all regions of that RS get reassigned to the remaining (say 24) nodes
>  2) when the restarted RegionServer comes up, HMaster closes about 60
> regions on all 24 nodes and assigns them back to the restarted node
> Now, step 1) is usually very quick (if we can assign 10 regions per
> heartbeat per node, we have 240 regions per heartbeat on the whole cluster).
> Step 2) seems problematic, because first about 1200 regions get
> unassigned, and then they get slowly assigned to the single RS (again at
> 10 regions per heartbeat). This delay causes the clients in map tasks
> connected to those regions to throw RetriesExhaustedException.
> I'm aware that we can limit the number of regions closed per RegionServer
> heartbeat with hbase.regions.close.max, but this config option seems a bit
> unsatisfactory, because as we increase the size of the cluster, we will get
> more and more regions unassigned in a single cluster heartbeat (say we limit
> this to 1, then we get 24 unassigned regions, but only 10 assigned per
> heartbeat). This led us to a solution which seems quite simple. We have
> introduced a new config option which limits the number of regions in
> transition. When regionsInTransition.size() crosses the limit, we temporarily
> stop the load balancer. This seems to resolve our issue, because no region
> stays unassigned for a long time and clients manage to recover within their
> number of retries.
> My question is: is this a general issue for which a new config option should
> be proposed, or am I missing something, and could we have resolved the issue
> by tuning some other config option?
> Thanks.
>  Jan
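
The throttle described in the quoted message can be sketched roughly as follows. This is only an illustration of the idea, not the patch Jan describes and not HBase's actual master code: the class name BalancerThrottle, its methods, and the maxRegionsInTransition limit are all hypothetical, and the sketch assumes that "crossing the boundary" means the in-transition count has reached the limit.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch: pause load balancing while too many regions
 * are in transition. None of these names come from HBase itself.
 */
public class BalancerThrottle {
    // Assumed limit, standing in for the new config option mentioned above.
    private final int maxRegionsInTransition;
    // Regions that have been closed but not yet reopened elsewhere.
    private final Set<String> regionsInTransition = new HashSet<String>();

    public BalancerThrottle(int maxRegionsInTransition) {
        this.maxRegionsInTransition = maxRegionsInTransition;
    }

    /** Record that a region was closed and is waiting for reassignment. */
    public void regionClosed(String regionName) {
        regionsInTransition.add(regionName);
    }

    /** Record that a region was opened on its new server. */
    public void regionOpened(String regionName) {
        regionsInTransition.remove(regionName);
    }

    /** The balancer may run only while the in-transition count is below the limit. */
    public boolean balancerMayRun() {
        return regionsInTransition.size() < maxRegionsInTransition;
    }

    /** How many of the requested closes fit under the limit right now. */
    public int closeBudget(int requested) {
        int room = maxRegionsInTransition - regionsInTransition.size();
        return Math.max(0, Math.min(requested, room));
    }
}
```

With a limit near the number of regions a single server can open in a few heartbeats, regions would be closed only about as fast as the restarted server can reopen them, which is the effect the message reports.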
