Return-Path: X-Original-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id B2C8BD1EF for ; Thu, 22 Nov 2012 04:41:01 +0000 (UTC) Received: (qmail 49269 invoked by uid 500); 22 Nov 2012 04:41:01 -0000 Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org Received: (qmail 49154 invoked by uid 500); 22 Nov 2012 04:41:00 -0000 Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hdfs-issues@hadoop.apache.org Delivered-To: mailing list hdfs-issues@hadoop.apache.org Received: (qmail 49080 invoked by uid 99); 22 Nov 2012 04:40:58 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 22 Nov 2012 04:40:58 +0000 Date: Thu, 22 Nov 2012 04:40:58 +0000 (UTC) From: "Tsz Wo (Nicholas), SZE (JIRA)" To: hdfs-issues@hadoop.apache.org Message-ID: <757726668.15781.1353559258986.JavaMail.jiratomcat@arcas> Subject: [jira] [Commented] (HDFS-3495) Update Balancer to support new NetworkTopology with NodeGroup MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HDFS-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502579#comment-13502579 ] Tsz Wo (Nicholas), SZE commented on HDFS-3495: ---------------------------------------------- Thank Junping for the update. Looked at the patch. You did a great job on fixing/adding comments for the existing code - It seems that we are not going to subclass Balancer. If it is the case, we should not change the fields/methods to protected. - There are a few places initializing a NetworkTopology instance from conf. How about adding a new getInstance(conf) method to NetworkTopology so that Balancer, DatanodeManager and some tests could call it? {code} + cluster = ReflectionUtils.newInstance( + conf.getClass( + CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY, + NetworkTopology.class, NetworkTopology.class), conf) {code} - The one line method areDataNodesOnSameNodeGroup(..) seems unnecessary. How about using cluster.isOnSameNodeGroup(..) directly? - If we have an addTo(..) method in PendingBlockMove, chooseProxySource() can be simplified as below: {code} private boolean addTo(BalancerDatanode bdn) { if (bdn.addPendingBlock(this)) { proxySource = bdn; return true; } return false; } /* Now we find out source, target, and block, we need to find a proxy * * @return true if a proxy is found; otherwise false */ private boolean chooseProxySource() { final DatanodeInfo targetDn = target.getDatanode(); boolean found = false; for (BalancerDatanode loc : block.getLocations()) { // check if there is replica which is on the same rack with the target if (cluster.isOnSameRack(loc.getDatanode(), targetDn) && addTo(loc)) { // if cluster is not nodegroup aware or the proxy is on the same // nodegroup with target, then we already find the nearest proxy if (!cluster.isNodeGroupAware() || cluster.isOnSameNodeGroup(loc.getDatanode(), targetDn)) { return true; } found = true; } if (!found) { // find out a non-busy replica out of rack of target found = addTo(loc); } } return found; } {code} The addTo method could also be used in Source.chooseNextBlockToMove(). {code} PendingBlockMove pendingBlock = new PendingBlockMove(); - if ( target.addPendingBlock(pendingBlock) ) { + if (pendingBlock.addTo(target)) { // target is not busy, so do a tentative block allocation - pendingBlock.source = this; pendingBlock.target = target; {code} - Let's combine chooseNodes() and chooseNodesForCustomFaultDomains()? - chooseTargetOnSameNodeGroup and chooseSourceOnSameNodeGroup look the same. Similarly, chooseTargetsOnSameNodeGroup and chooseSourcesOnSameNodeGroup looks the same. I spent sometime to combine them. See if you think it is good. {code} /* Decide all pairs where source and target are * on the same NodeGroup */ private void chooseNodesOnSameNodeGroup() { /* first step: match each overUtilized datanode (source) to * one or more underUtilized datanodes within same NodeGroup(targets). */ chooseOnSameNodeGroup(overUtilizedDatanodes, underUtilizedDatanodes); /* match each remaining overutilized datanode (source) to below average * utilized datanodes within the same NodeGroup(targets). * Note only overutilized datanodes that haven't had that max bytes to move * satisfied in step 1 are selected */ chooseOnSameNodeGroup(overUtilizedDatanodes, belowAvgUtilizedDatanodes); /* match each remaining underutilized datanode to above average utilized * datanodes within the same NodeGroup. * Note only underutilized datanodes that have not had that max bytes to * move satisfied in step 1 are selected. */ chooseOnSameNodeGroup(underUtilizedDatanodes, aboveAvgUtilizedDatanodes); } private T chooseCandidateOnSameNodeGroup( BalancerDatanode dn, Iterator candidates) { if (dn.isMoveQuotaFull()) { for(; candidates.hasNext(); ) { final T c = candidates.next(); if (!c.isMoveQuotaFull()) { candidates.remove(); continue; } if (cluster.isOnSameNodeGroup(dn.getDatanode(), c.getDatanode())) { return c; } } } return null; } private void selectCandidateOnSameNodeGroup( Source source, BalancerDatanode target) { long size = Math.min(source.availableSizeToMove(), target.availableSizeToMove()); NodeTask nodeTask = new NodeTask(target, size); source.addNodeTask(nodeTask); target.incScheduledSize(nodeTask.getSize()); sources.add(source); targets.add(target); LOG.info("Decided to move "+StringUtils.byteDesc(size)+" bytes from " +source.datanode.getName() + " to " + target.datanode.getName()); } private boolean chooseOnSameNodeGroup( BalancerDatanode dn, Iterator candidates) { final T chosen = chooseCandidateOnSameNodeGroup(dn, candidates); if (chosen == null) { return false; } if (dn instanceof Source) { selectCandidateOnSameNodeGroup((Source)dn, chosen); } else { selectCandidateOnSameNodeGroup((Source)chosen, dn); } if (!chosen.isMoveQuotaFull()) { candidates.remove(); } return true; } private void chooseOnSameNodeGroup(Collection datanodes, Collection candidates) { for (Iterator i = datanodes.iterator(); i.hasNext();) { final D source = i.next(); for(; chooseOnSameNodeGroup(source, candidates.iterator()); ); if (!source.isMoveQuotaFull()) { i.remove(); } } } {code} > Update Balancer to support new NetworkTopology with NodeGroup > ------------------------------------------------------------- > > Key: HDFS-3495 > URL: https://issues.apache.org/jira/browse/HDFS-3495 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer > Affects Versions: 1.1.0, 2.0.2-alpha > Reporter: Junping Du > Assignee: Junping Du > Attachments: HADOOP-8473-Balancer-NodeGroup-aware.patch, HDFS-3495-v2.patch, HDFS-3495-v3.patch > > > Since the Balancer is a Hadoop Tool, it was updated to be directly aware of four-layer hierarchy instead of creating an alternative Balancer implementation. To accommodate extensibility, a new protected method, doChooseNodesForCustomFaultDomain is now called from the existing chooseNodes method so that a subclass of the Balancer could customize the balancer algotirhm for other failure and locality topologies. An alternative option is to encapsulate the algorithm used for the four-layer hierarchy into a collaborating strategy class. > The key changes introduced to support a four-layer hierarchy were to override the algorithm of choosing pairs for balancing. Unit tests were created to test the new algorithm. > The algorithm now makes sure to choose the target and source node on the same node group for balancing as the first priority. Then the overall balancing policy is: first doing balancing between nodes within the same nodegroup then the same rack and off rack at last. Also, we need to check no duplicated replicas live in the same node group after balancing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira