hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1017) Region balancing does not bring newly added node within acceptable range
Date Mon, 18 May 2009 21:35:45 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710503#action_12710503 ]

stack commented on HBASE-1017:
------------------------------

I took a look at this patch:

+ Remove the ^Ms (stray carriage returns).
+ getLoadToServers in ServerManager doesn't need to be public, right?
+ The test looks good, and I like having a class that encapsulates the load balancing logic.  I'd suggest adding javadoc to the load balancer explaining how it works.

I tried the code.  I loaded up a bunch of regions, shut the cluster down, and restarted it.  Everything came up balanced after a little while.  I then added a server to the cluster, which seems to be what Jon was doing above, but it never got any regions:

aa0-000-12.u.powerset.com:60031	1242680796620	requests=0, regions=0, usedHeap=27, maxHeap=1244
aa0-000-13.u.powerset.com:60031	1242680136542	requests=0, regions=21, usedHeap=158, maxHeap=1244
aa0-000-14.u.powerset.com:60031	1242680136673	requests=0, regions=20, usedHeap=71, maxHeap=1244
aa0-000-15.u.powerset.com:60031	1242680136162	requests=0, regions=19, usedHeap=106, maxHeap=1244

It stayed at zero.  Wasn't this patch supposed to address that?



> Region balancing does not bring newly added node within acceptable range
> ------------------------------------------------------------------------
>
>                 Key: HBASE-1017
>                 URL: https://issues.apache.org/jira/browse/HBASE-1017
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.19.0
>            Reporter: Jonathan Gray
>            Assignee: Evgeny Ryabitskiy
>            Priority: Minor
>             Fix For: 0.20.0
>
>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v10.patch, HBASE-1017_v11_FINAL.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch, HBASE-1017_v5.patch, HBASE-1017_v6.patch, HBASE-1017_v7.patch, HBASE-1017_v8.patch, HBASE-1017_v9.patch, loadbalance2.0.patch
>
>
> In a 10-node cluster, only 9 nodes were online.  With about 215 total regions, each of the 9 had around 24 regions (average load is 24).  Slop is 10%, so 22 to 26 is the acceptable range.
> When the 10th node started up, the master log showed:
> {code}
> 2008-11-21 15:57:51,521 INFO org.apache.hadoop.hbase.master.ServerManager: Received start
message from: 72.34.249.210:60020
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Server 72.34.249.219:60020
is overloaded. Server load: 25 avg: 22.0, slop: 0.1
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Choosing
to reassign 3 regions. mostLoadedRegions has 10 regions in it.
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to
close region streamitems,^@^@^@^@^AH�;,1225411051632
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to
close region streamitems,^@^@^@^@^@�Ý,1225411056686
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to
close region groups,,1222913580957
> 2008-11-21 15:57:53,975 DEBUG org.apache.hadoop.hbase.master.RegionManager: Server 72.34.249.213:60020
is overloaded. Server load: 25 avg: 22.0, slop: 0.1
> 2008-11-21 15:57:53,975 DEBUG org.apache.hadoop.hbase.master.RegionManager: Choosing
to reassign 3 regions. mostLoadedRegions has 10 regions in it.
> 2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to
close region upgrade,,1226892014784
> 2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to
close region streamitems,^@^@^@^@^@3^Z�,1225411056701
> 2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to
close region streamitems,^@^@^@^@^@         ^L,1225411049042
> {code}
> The new regionserver received only 6 regions.  This happened because when the 10th node came in, the average load dropped to 22.  That caused the two servers with 25 regions (acceptable when the average was 24, but not now) to reassign 3 of their regions each, bringing them back down to the average.  Unfortunately, all the other servers remained within the 10% slop (20 to 24), so they were not overloaded and did not give up any regions.  It was only chance that even 6 regions got reassigned: had there been exactly 24 on each server, none would have been assigned to the new node.
> This will behave worse on larger clusters, where adding a new node has little impact on the average load per server.
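
The arithmetic in the issue description can be reproduced with a small sketch.  This is not the RegionManager code.  It is a Python illustration that assumes an overload-only rule (a server gives up regions only when its load exceeds avg * (1 + slop), and then sheds down to the average), rounds the average up to match the "avg: 22.0" printed in the log, and uses a hypothetical per-server distribution that adds up to 215 regions.  The names acceptable_range and shed_counts are made up for this example.

```python
import math


def acceptable_range(avg, slop=0.1):
    """Whole-region bounds around avg, e.g. 24 with 10% slop gives (22, 26)."""
    return math.ceil(avg * (1 - slop)), math.floor(avg * (1 + slop))


def shed_counts(loads, slop=0.1):
    """Regions each server gives up under the assumed overload-only rule."""
    # Assumption: the average is rounded up, which matches "avg: 22.0" in the
    # log for 215 regions on 10 servers (215 / 10 = 21.5).
    avg = math.ceil(sum(loads) / len(loads))
    upper = avg * (1 + slop)
    # Only servers above the upper bound shed, and they shed down to avg.
    return [load - avg if load > upper else 0 for load in loads]


# Hypothetical distribution over the 9 online servers: 215 regions in total.
before = [25, 25, 24, 24, 24, 24, 23, 23, 23]

print(acceptable_range(24))              # (22, 26) with 9 servers
print(acceptable_range(22))              # (20, 24) once the 10th joins

shed = shed_counts(before + [0])         # the new server starts with 0 regions
print(shed)                              # [3, 3, 0, 0, 0, 0, 0, 0, 0, 0]
print(sum(shed))                         # 6 regions available for the new node

# With exactly 24 regions on every old server, nothing is overloaded.
print(sum(shed_counts([24] * 9 + [0])))  # 0
```

Under these assumptions the new node ends up with 6 regions in the first case and none in the second, even though it sits far below the lower bound of 20.  The rule only looks at servers above the upper bound, so an underloaded server never triggers any movement by itself.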

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

