hadoop-hdfs-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4898) BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly fallback to local rack
Date Sun, 11 Aug 2013 12:20:48 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736268#comment-13736268 ]

Hadoop QA commented on HDFS-4898:
---------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12597086/h4898_20130809.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}.  The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

                  org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4799//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4799//console

This message is automatically generated.
                
> BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly fallback to local rack
> ---------------------------------------------------------------------------------------------
>
>                 Key: HDFS-4898
>                 URL: https://issues.apache.org/jira/browse/HDFS-4898
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 1.2.0, 2.0.4-alpha
>            Reporter: Eric Sirianni
>            Assignee: Tsz Wo (Nicholas), SZE
>            Priority: Minor
>         Attachments: h4898_20130809.patch
>
>
> As currently implemented, {{BlockPlacementPolicyWithNodeGroup}} does not properly fall back to the local rack when no nodes are available in remote racks, resulting in an improper {{NotEnoughReplicasException}}.
> {code:title=BlockPlacementPolicyWithNodeGroup.java}
>   @Override
>   protected void chooseRemoteRack(int numOfReplicas,
>       DatanodeDescriptor localMachine, HashMap<Node, Node> excludedNodes,
>       long blocksize, int maxReplicasPerRack, List<DatanodeDescriptor> results,
>       boolean avoidStaleNodes) throws NotEnoughReplicasException {
>     int oldNumOfReplicas = results.size();
>     // randomly choose one node from remote racks
>     try {
>       chooseRandom(
>           numOfReplicas,
>           "~" + NetworkTopology.getFirstHalf(localMachine.getNetworkLocation()),
>           excludedNodes, blocksize, maxReplicasPerRack, results,
>           avoidStaleNodes);
>     } catch (NotEnoughReplicasException e) {
>       chooseRandom(numOfReplicas - (results.size() - oldNumOfReplicas),
>           localMachine.getNetworkLocation(), excludedNodes, blocksize,
>           maxReplicasPerRack, results, avoidStaleNodes);
>     }
>   }
> {code}
> As currently coded, the {{chooseRandom()}} call in the {{catch}} block will never succeed, because the set of nodes within the passed-in node path (e.g. {{/rack1/nodegroup1}}) is entirely contained within the set of excluded nodes (both are the set of nodes within the same nodegroup as the node chosen for the first replica).
> The bug is that the fallback {{chooseRandom()}} call in the {{catch}} block should pass in the _complement_ of the node path used in the initial {{chooseRandom()}} call in the {{try}} block (e.g. {{/rack1}}), namely:
> {code}
> NetworkTopology.getFirstHalf(localMachine.getNetworkLocation())
> {code}
> This will yield the proper fallback behavior of choosing a random node from _within the same rack_, while still excluding those nodes _in the same nodegroup_.
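
The scope argument described above can be illustrated with a small standalone sketch. It does not use the Hadoop classes: {{firstHalf()}}, {{candidates()}} and {{sampleTopology()}} are hypothetical helpers written for this illustration, and {{firstHalf()}} only assumes that {{NetworkTopology.getFirstHalf()}} maps {{/rack1/nodegroup1}} to {{/rack1}}, as the example in the description implies. The toy topology has a single rack, so no remote rack is available, and the two nodes of the first replica's nodegroup are already excluded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class NodeGroupFallbackSketch {

    // Hypothetical stand-in for NetworkTopology.getFirstHalf():
    // strips the last path component, e.g. "/rack1/nodegroup1" -> "/rack1".
    public static String firstHalf(String networkLocation) {
        return networkLocation.substring(0, networkLocation.lastIndexOf('/'));
    }

    // Returns the nodes located at or below `scope` that are not excluded.
    // This models only the candidate set, not the random selection itself.
    public static List<String> candidates(Map<String, String> locationByNode,
                                          String scope, Set<String> excluded) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : locationByNode.entrySet()) {
            String location = entry.getValue();
            boolean inScope = location.equals(scope)
                    || location.startsWith(scope + "/");
            if (inScope && !excluded.contains(entry.getKey())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    // Toy single-rack topology: node name -> network location (assumed layout).
    public static Map<String, String> sampleTopology() {
        Map<String, String> topology = new LinkedHashMap<>();
        topology.put("dn1", "/rack1/nodegroup1");
        topology.put("dn2", "/rack1/nodegroup1");
        topology.put("dn3", "/rack1/nodegroup2");
        topology.put("dn4", "/rack1/nodegroup2");
        return topology;
    }

    public static void main(String[] args) {
        Map<String, String> topology = sampleTopology();
        String localLocation = "/rack1/nodegroup1";
        // The whole nodegroup of the first replica is excluded.
        Set<String> excluded = new HashSet<>(Arrays.asList("dn1", "dn2"));

        // Scope used by the catch block as quoted: the nodegroup itself.
        System.out.println("nodegroup scope: "
                + candidates(topology, localLocation, excluded));
        // Scope proposed in the description: the enclosing rack.
        System.out.println("rack scope:      "
                + candidates(topology, firstHalf(localLocation), excluded));
    }
}
```

With the nodegroup scope the candidate list is empty, which corresponds to the {{NotEnoughReplicasException}} described above; with the rack scope the candidates are the nodes of the other nodegroup in the same rack ({{dn3}} and {{dn4}} in this toy layout).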

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
