hadoop-hdfs-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6791) A block could remain under replicated if all of its replicas are on decommissioned nodes
Date Tue, 05 Aug 2014 22:19:14 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086890#comment-14086890 ]

Hadoop QA commented on HDFS-6791:
---------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12659931/HDFS-6791-2.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified
test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of
javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version
2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number
of release audit warnings.

    {color:red}-1 core tests{color}.  The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

                  org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
                  org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
                  org.apache.hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7563//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7563//console

This message is automatically generated.

> A block could remain under replicated if all of its replicas are on decommissioned nodes
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-6791
>                 URL: https://issues.apache.org/jira/browse/HDFS-6791
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HDFS-6791-2.patch, HDFS-6791-3.patch, HDFS-6791.patch
>
>
> Here is the scenario.
> 1. Normally, before NN transitions a DN to the decommissioned state, enough replicas have
> been copied to other "in service" DNs. However, in some rare situations, the cluster gets into
> a state where a DN is decommissioned and a block's only replica is on that DN. In
> that state, the replica count reported by fsck is 1; the block just stays
> under replicated; applications can still read the data, given that a decommissioned node can serve
> read traffic.
> This can happen in some error situations, such as DN failure or NN failover. For example:
> a) A block's only replica is on node A temporarily.
> b) The decommission process is started on node A.
> c) While node A is in the "decommission-in-progress" state, node A crashes. NN marks node
> A as dead.
> d) After node A rejoins the cluster, NN marks node A as decommissioned.
> 2. In theory, NN should take care of under replicated blocks. But it doesn't in this
> special case, where the only replica is on a decommissioned node. That is because NN has the
> policy that "a decommissioned node can't be picked as the source node for replication".
> {noformat}
> BlockManager.java
> chooseSourceDatanode
>       // never use already decommissioned nodes
>       if(node.isDecommissioned())
>         continue;
> {noformat}
> 3. Given that NN marks the node as decommissioned, admins will shut down the datanode. Under
> replicated blocks then turn into missing blocks.
> 4. The workaround is to recommission the node so that NN can start the replication from
the node.
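
The source-selection policy quoted in step 2 and the workaround in step 4 can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `SourceSelectionSketch`, `AdminState`, `Replica`, and `chooseSource` are hypothetical names invented for illustration and are not the real `BlockManager` API, and the real `chooseSourceDatanode` applies further checks that are omitted here. The only rule modeled is the one shown in the snippet above: decommissioned nodes are skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; NOT the real BlockManager or DatanodeDescriptor API.
class SourceSelectionSketch {
    enum AdminState { IN_SERVICE, DECOMMISSION_IN_PROGRESS, DECOMMISSIONED }

    static final class Replica {
        final String node;
        final AdminState state;
        Replica(String node, AdminState state) {
            this.node = node;
            this.state = state;
        }
    }

    // Models only the quoted rule: a decommissioned node is never used as a source.
    // Assumption: the first eligible replica is returned; the real selection logic differs.
    static String chooseSource(List<Replica> replicas) {
        for (Replica r : replicas) {
            if (r.state == AdminState.DECOMMISSIONED) {
                continue; // never use already decommissioned nodes
            }
            return r.node;
        }
        return null; // no eligible source, so no replication work can be scheduled
    }

    public static void main(String[] args) {
        // Scenario from the report: the only replica lives on decommissioned node A.
        List<Replica> stuck = new ArrayList<>();
        stuck.add(new Replica("A", AdminState.DECOMMISSIONED));
        System.out.println("decommissioned only: source = " + chooseSource(stuck));

        // Workaround from step 4: after recommissioning, node A is eligible again.
        List<Replica> recommissioned = new ArrayList<>();
        recommissioned.add(new Replica("A", AdminState.IN_SERVICE));
        System.out.println("after recommission: source = " + chooseSource(recommissioned));
    }
}
```

In this model the first call finds no source, which corresponds to the block staying under replicated, while the second call returns node A once it is back in service.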



--
This message was sent by Atlassian JIRA
(v6.2#6252)
