Mailing-List: contact hbase-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hbase-dev@hadoop.apache.org
Message-ID: <1399169525.1231873083192.JavaMail.jira@brutus>
Date: Tue, 13 Jan 2009 10:58:03 -0800 (PST)
From: "stack (JIRA)" <jira@apache.org>
To: hbase-dev@hadoop.apache.org
Subject: [jira] Commented: (HBASE-1124) Balancer kicks in way too early
In-Reply-To: <1726092921.1231832339934.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663421#action_12663421 ] 

stack commented on HBASE-1124:
------------------------------

Looking at Andrew's logs, you're both 'right'.

Yes, the balancer doesn't cut in until all regions are assigned.  But on a big cluster there is a big gap between all assigned and all open.  In Andrew's log, I see the balancer cutting in during this gap.  We don't want it working here, while all regionservers still have a big queue of region opens that they are working through.

Here is an example.

All regions have been handed out and master is just waiting on the opens to come in.

{code}
....
2009-01-13 06:57:09,006 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: result_domain,com.chawlk,1231796870012 from XX.XX.XX.37:60020
2009-01-13 06:57:09,006 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: content,28e2ec17934b05f11a77a88b1528d905,1231822159077 from XX.XX.XX.37:60020
2009-01-13 06:57:09,006 DEBUG org.apache.hadoop.hbase.master.RegionManager: Server 10.30.94.37:60020 is overloaded. Server load: 26 avg: 21.0, slop: 0.2
2009-01-13 06:57:09,006 DEBUG org.apache.hadoop.hbase.master.RegionManager: Choosing to reassign 5 regions. mostLoadedRegions has 10 regions in it.
2009-01-13 06:57:09,006 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region content,afebbf5e615585830ebe6f74e1014f3d,1231766212960
2009-01-13 06:57:09,006 INFO org.apache.hadoop.hbase.master.RegionManager: Skipped 9 region(s) that are in transition states
...
{code}

Above we are closing 'content,afebbf5e615585830ebe6f74e1014f3d,1231766212960' which had just opened 3 seconds earlier.  About 5% of all regions assigned have reported back as opened.  We shouldn't be balancing at this time.
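
To make the numbers in the log concrete, here is a rough sketch, not the actual RegionManager code.  It assumes a server counts as overloaded when its load exceeds avg * (1 + slop), which is consistent with the log line above (26 > 21.0 * 1.2 = 25.2) and with the 5 regions chosen for reassignment (26 - 21).  The shouldBalance guard and its minOpenFraction parameter are hypothetical, just one way of holding the balancer off until the opens have come in; the 1000/50 region counts are made up to match the "about 5%" figure.

```java
class BalancerGuardSketch {

    // Assumed overload rule: load is above the cluster average by more than the slop fraction.
    static boolean isOverloaded(int serverLoad, double avgLoad, double slop) {
        return serverLoad > avgLoad * (1.0 + slop);
    }

    // Number of regions to shed to bring the server back down to the (rounded-up) average.
    static int regionsToShed(int serverLoad, double avgLoad) {
        return Math.max(0, serverLoad - (int) Math.ceil(avgLoad));
    }

    // Hypothetical guard: only balance once enough of the assigned regions have reported open.
    static boolean shouldBalance(int regionsAssigned, int regionsOpened, double minOpenFraction) {
        if (regionsAssigned == 0) {
            return false; // nothing handed out yet, nothing to balance
        }
        return (double) regionsOpened / regionsAssigned >= minOpenFraction;
    }

    public static void main(String[] args) {
        // Values taken from the log line: load 26, avg 21.0, slop 0.2.
        System.out.println(isOverloaded(26, 21.0, 0.2));   // true
        System.out.println(regionsToShed(26, 21.0));       // 5
        // Made-up cluster size: 5% of 1000 assigned regions open, guard requires all open.
        System.out.println(shouldBalance(1000, 50, 1.0));  // false
    }
}
```

With a guard like this in front of the overload check, the close of the just-opened region in the log would not have been issued, since only about 5% of the opens had reported back.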

> Balancer kicks in way too early
> -------------------------------
>
>                 Key: HBASE-1124
>                 URL: https://issues.apache.org/jira/browse/HBASE-1124
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>             Fix For: 0.19.0
>
>
> Balancer kicks in before all regions are assigned out. Causes confusion. Master won't accept OPENs from "overloaded" HRS. Master is slow to respond to UI and HRS during. Master sometimes takes too long to respond to a HRS heartbeat and so the HRS will reinit. This causes more confusion.
