hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-1017) Region balancing does not bring newly added node within acceptable range
Date Wed, 20 May 2009 19:59:45 GMT

     [ https://issues.apache.org/jira/browse/HBASE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1017:
-------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Tested the latest version of the patch.  Had to do it on a quiesced cluster because under load the region count is all over the place.  Also, when killing servers, I didn't kill the regionserver hosting meta because that makes a mess of the counts too.

But after killing a regionserver that was not hosting catalog regions, balance came back promptly.  Adding in a new node afterward, balance again came back quickly.  Did this a few times.  Had enough regions that I should have hit Jon's original issue if it had not been fixed.

Thanks for the patch Evgeny.

> Region balancing does not bring newly added node within acceptable range
> ------------------------------------------------------------------------
>
>                 Key: HBASE-1017
>                 URL: https://issues.apache.org/jira/browse/HBASE-1017
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.19.0
>            Reporter: Jonathan Gray
>            Assignee: Evgeny Ryabitskiy
>            Priority: Minor
>             Fix For: 0.20.0
>
>         Attachments: HBASE-1017_v1.patch, HBASE-1017_v10.patch, HBASE-1017_v11_FINAL.patch, HBASE-1017_v12_FINAL.patch, HBASE-1017_v2.patch, HBASE-1017_v4.patch, HBASE-1017_v5.patch, HBASE-1017_v6.patch, HBASE-1017_v7.patch, HBASE-1017_v8.patch, HBASE-1017_v9.patch, loadbalance2.0.patch
>
>
> With a 10 node cluster, there were only 9 online nodes.  With about 215 total regions, each of the 9 had around 24 regions (average load is 24).  Slop is 10% so 22 to 26 is the acceptable range.
> Starting up the 10th node, master log showed:
> {code}
> 2008-11-21 15:57:51,521 INFO org.apache.hadoop.hbase.master.ServerManager: Received start message from: 72.34.249.210:60020
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Server 72.34.249.219:60020 is overloaded. Server load: 25 avg: 22.0, slop: 0.1
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Choosing to reassign 3 regions. mostLoadedRegions has 10 regions in it.
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region streamitems,^@^@^@^@^AH�;,1225411051632
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region streamitems,^@^@^@^@^@�Ý,1225411056686
> 2008-11-21 15:57:53,351 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region groups,,1222913580957
> 2008-11-21 15:57:53,975 DEBUG org.apache.hadoop.hbase.master.RegionManager: Server 72.34.249.213:60020 is overloaded. Server load: 25 avg: 22.0, slop: 0.1
> 2008-11-21 15:57:53,975 DEBUG org.apache.hadoop.hbase.master.RegionManager: Choosing to reassign 3 regions. mostLoadedRegions has 10 regions in it.
> 2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region upgrade,,1226892014784
> 2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region streamitems,^@^@^@^@^@3^Z�,1225411056701
> 2008-11-21 15:57:53,976 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region streamitems,^@^@^@^@^@         ^L,1225411049042
> {code}
> The new regionserver received only 6 regions.  This happened because when the 10th came in, average load dropped to 22.  This caused the two servers with 25 regions (acceptable when avg was 24, but not now) to reassign 3 of their regions each to bring them back down to the average.  Unfortunately all the other servers remained within the 10% slop (20 to 24), so they were not overloaded and did not reassign off any regions.  It was only chance that even 6 regions got reassigned; there could have been exactly 24 on each server, in which case none would have been assigned to the new node.
> This will behave worse on larger clusters, where adding a new node has little impact on the avg load per server.
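
For illustration, here is a minimal, self-contained sketch of the slop check described in the quoted report.  It is not the actual HBase RegionManager code and the class and method names are made up; it just plugs in the numbers from the report (avg 22.0, slop 0.1, two servers at 25 regions, seven at 24, one empty new node) to show why only 6 regions get freed for the new node.

{code}
// Hypothetical sketch, not HBase code: a server sheds regions only when its load
// exceeds avg * (1 + slop), and then only down to roughly the average, which is
// the behavior described in HBASE-1017.
public class SlopBalanceSketch {

    /** Regions a server sheds: nothing while inside the slop band, otherwise down to ~avg. */
    static int regionsToShed(int load, double avg, double slop) {
        if (load > avg * (1.0 + slop)) {
            return load - (int) Math.round(avg);
        }
        return 0;
    }

    public static void main(String[] args) {
        double avg = 22.0;   // average load the master reports after the 10th node joins
        double slop = 0.1;   // 10% slop: a server is overloaded only above 22 * 1.1 = 24.2

        // Two servers at 25 regions, seven at 24, and the new, empty node.
        int[] loads = {25, 25, 24, 24, 24, 24, 24, 24, 24, 0};

        int reassigned = 0;
        for (int load : loads) {
            int shed = regionsToShed(load, avg, slop);
            if (shed > 0) {
                System.out.printf("server with %d regions is overloaded, sheds %d region(s)%n",
                        load, shed);
            }
            reassigned += shed;
        }
        // Only the two 25-region servers exceed 24.2 and each sheds 3, so just 6 regions
        // become available to the new node; if every server had held exactly 24, none would.
        System.out.println("regions available to the new node: " + reassigned);
    }
}
{code}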

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

