Date: Fri, 17 Jun 2011 01:04:47 +0000 (UTC)
From: "Jieshan Bean (JIRA)"
To: issues@hbase.apache.org
Message-ID: <210091632.13551.1308272687409.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <312950289.1167.1308012647365.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-3985) Same Region could be picked out twice in LoadBalancer

[
https://issues.apache.org/jira/browse/HBASE-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050834#comment-13050834 ]

Jieshan Bean commented on HBASE-3985:
-------------------------------------

Thanks stack! Yes, this issue doesn't exist in trunk. Next time, if an issue affects both 0.90 and trunk, I will remember to make a patch for trunk as well.

> Same Region could be picked out twice in LoadBalancer
> -----------------------------------------------------
>
>                 Key: HBASE-3985
>                 URL: https://issues.apache.org/jira/browse/HBASE-3985
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.4
>
>         Attachments: AllProjectTestResults.txt, HBASE-2985-LoadBalancer-V2.patch, HBASE-3985-LoadBalancer.patch, HBASE-3985-LoadBalancer_(NoneServerLog).patch, org.apache.hadoop.hbase.master.TestLoadBalancer.txt
>
>
> From the HMaster logs, I found something weird:
> 2011-05-24 11:12:11,152 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=hello,122130,1305944329350.7d6c96428e2563c3d8676474d0a9f814., src=158-1-101-202,20020,1306205409671, dest=158-1-101-222,20020,1306205940117
> 2011-05-24 11:12:31,536 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=hello,122130,1305944329350.7d6c96428e2563c3d8676474d0a9f814., src=158-1-101-202,20020,1306205409671, dest=158-1-101-222,20020,1306205940117
> We can see that the same region was balanced twice.
> To describe the problem, here is a simple example:
> 1. Suppose RegionServer A holds 10 regions.
>    Max: 5 Min: 4
> 2. So the number of regions that need to move is 5.
> 3. Before the regions to move are picked, the list is shuffled.
> 4. The 5 regions to move are picked from the back of the list.
> 5. The nextRegionForUnload value is 5.
> 6. If neededRegions is not zero, one more region may still have to be picked from RegionServer A.
>    This time the picked index is 5, which has already been picked once!
>
>                           |<-----5-------|
> ------------*--*--*--*--*--*--*--*--*--*----
>                            |
>                 getNextRegionForUnload
> Here's the analysis from the code:
> 1. Walk down the most loaded servers, pruning each to the max. Regions are picked from the back of the list (in reverse order).
>     Map<HServerInfo, BalanceInfo> serverBalanceInfo =
>       new TreeMap<HServerInfo, BalanceInfo>();
>     for(Map.Entry<HServerInfo, List<HRegionInfo>> server :
>       serversByLoad.descendingMap().entrySet()) {
>       HServerInfo serverInfo = server.getKey();
>       int regionCount = serverInfo.getLoad().getNumberOfRegions();
>       if(regionCount <= max) {
>         serverBalanceInfo.put(serverInfo, new BalanceInfo(0, 0));
>         break;
>       }
>       serversOverloaded++;
>       List<HRegionInfo> regions = randomize(server.getValue());
>       int numToOffload = Math.min(regionCount - max, regions.size());
>       int numTaken = 0;
>       for (int i = regions.size() - 1; i >= 0; i--) {
>         HRegionInfo hri = regions.get(i);
>         // Don't rebalance meta regions.
>         if (hri.isMetaRegion()) continue;
>         regionsToMove.add(new RegionPlan(hri, serverInfo, null));
>         numTaken++;
>         if (numTaken >= numToOffload) break;
>       }
>       /**********************************************************/
>       /*** the nextRegionForUnload value is set to numToOffload ***/
>       /**********************************************************/
>       serverBalanceInfo.put(serverInfo,
>         new BalanceInfo(numToOffload, (-1)*numTaken));
>     }
> 2. The second pass picks one region from each of the most loaded region servers, in order.
>     if (neededRegions != 0) {
>       // Walk down most loaded, grabbing one from each until we get enough
>       for(Map.Entry<HServerInfo, List<HRegionInfo>> server :
>         serversByLoad.descendingMap().entrySet()) {
>         BalanceInfo balanceInfo = serverBalanceInfo.get(server.getKey());
>         int idx =
>           balanceInfo == null ? 0 : balanceInfo.getNextRegionForUnload();
>         if (idx >= server.getValue().size()) break;
>         HRegionInfo region = server.getValue().get(idx);
>         if (region.isMetaRegion()) continue; // Don't move meta regions.
>         regionsToMove.add(new RegionPlan(region, server.getKey(), null));
>         if(--neededRegions == 0) {
>           // No more regions needed, done shedding
>           break;
>         }
>       }
>     }
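The double pick described in the quoted example can be reproduced with a small, self-contained model. This is only a sketch under stated assumptions, not the HBase LoadBalancer: the class `DoublePickSketch`, its methods `pickRegions` and `duplicates`, and the `region-N` names are hypothetical. It models a single overloaded server, and it leaves out the shuffle and the meta-region check because neither changes which list indexes are chosen.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the two passes; not the HBase LoadBalancer itself.
class DoublePickSketch {

  // Returns the regions chosen for moving from a single overloaded server.
  static List<String> pickRegions(int regionCount, int max, int neededRegions) {
    List<String> regions = new ArrayList<>();
    for (int i = 0; i < regionCount; i++) {
      regions.add("region-" + i);
    }
    List<String> regionsToMove = new ArrayList<>();

    // Pass 1: prune down to max, taking regions from the back of the list.
    int numToOffload = Math.min(regionCount - max, regions.size());
    int numTaken = 0;
    for (int i = regions.size() - 1; i >= 0 && numTaken < numToOffload; i--) {
      regionsToMove.add(regions.get(i));
      numTaken++;
    }
    // As in the quoted code, the next index to unload is recorded as numToOffload.
    int nextRegionForUnload = numToOffload;

    // Pass 2: if more regions are still needed, take the one at that index.
    if (neededRegions != 0 && nextRegionForUnload < regions.size()) {
      regionsToMove.add(regions.get(nextRegionForUnload));
    }
    return regionsToMove;
  }

  // Returns the regions that appear more than once in the plan.
  static Set<String> duplicates(List<String> plan) {
    Set<String> seen = new HashSet<>();
    Set<String> dups = new HashSet<>();
    for (String r : plan) {
      if (!seen.add(r)) {
        dups.add(r);
      }
    }
    return dups;
  }

  public static void main(String[] args) {
    // 10 regions, max 5, one extra region still needed: the case from the example.
    List<String> plan = pickRegions(10, 5, 1);
    System.out.println("plan = " + plan);
    System.out.println("picked twice = " + duplicates(plan));
  }
}
```

In this model the overlap appears only when the index `numToOffload` falls inside the tail that the first pass already took. With 10 regions and max 5, that tail is indexes 5 through 9, so index 5 is chosen a second time. With max 8 the tail is indexes 8 and 9, and index 2 is still free.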