hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3495) Update Balancer to support new NetworkTopology with NodeGroup
Date Thu, 22 Nov 2012 09:55:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502685#comment-13502685
] 

Junping Du commented on HDFS-3495:
----------------------------------

Hi Nicholas,
   Thanks for your review and your great comments! Please see my reply below:
1.It seems that we are not going to subclass Balancer. If it is the case, we should not change
the fields/methods to protected.
[JP] Sure. I remove all protected method in Balancer. The exception could be on unit test
as it needs a subclass of MiniDFSCluster (MiniDFSClusterWithNodeGroup) which have different
configuration in initialization.
2. There are a few places initializing a NetworkTopology instance from conf. How about adding
a new getInstance(conf) method to NetworkTopology so that Balancer, DatanodeManager and some
tests could call it? 
[JP] Sure. Updated.
3. The one line method areDataNodesOnSameNodeGroup(..) seems unnecessary. How about using
cluster.isOnSameNodeGroup(..) directly?
[JP] Agree. it could be better. Updated.
4.If we have an addTo(..) method in PendingBlockMove, chooseProxySource() can be simplified
as below: 
[JP] It is awesome! It makes code much simplified and clear. Thanks!
5. The addTo method could also be used in Source.chooseNextBlockToMove().
[JP] It looks slight different case there. If we use addTo() to replace original code, then
we will lose "pendingBlock.source = this;" and add " proxySource = target" which is different
with original logic?
6.Let's combine chooseNodes() and chooseNodesForCustomFaultDomains()?
[JP] Sure. Two methods are combined.
7. chooseTargetOnSameNodeGroup and chooseSourceOnSameNodeGroup look the same. Similarly, chooseTargetsOnSameNodeGroup
and chooseSourcesOnSameNodeGroup looks the same. I spent sometime to combine them. See if
you think it is good. 
[JP] It is super! I put it in new changes and the method of selectCandidateOnSameNodeGroup
can be used by other code in Balancer (I rename to matchSourceWithTargetToMove as it is more
general).
I will update patch soon.
                
> Update Balancer to support new NetworkTopology with NodeGroup
> -------------------------------------------------------------
>
>                 Key: HDFS-3495
>                 URL: https://issues.apache.org/jira/browse/HDFS-3495
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer
>    Affects Versions: 1.1.0, 2.0.2-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: HADOOP-8473-Balancer-NodeGroup-aware.patch, HDFS-3495-v2.patch,
HDFS-3495-v3.patch
>
>
> Since the Balancer is a Hadoop Tool, it was updated to be directly aware of four-layer
hierarchy instead of creating an alternative Balancer implementation. To accommodate extensibility,
a new protected method, doChooseNodesForCustomFaultDomain is now called from the existing
chooseNodes method so that a subclass of the Balancer could customize the balancer algotirhm
for other failure and locality topologies. An alternative option is to encapsulate the algorithm
used for the four-layer hierarchy into a collaborating strategy class.
> The key changes introduced to support a four-layer hierarchy were to override the algorithm
of choosing <source, target> pairs for balancing. Unit tests were created to test the
new algorithm.
> The algorithm now makes sure to choose the target and source node on the same node group
for balancing as the first priority. Then the overall balancing policy is: first doing balancing
between nodes within the same nodegroup then the same rack and off rack at last. Also, we
need to check no duplicated replicas live in the same node group after balancing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message