hadoop-hdfs-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5438) Flaws in block report processing can cause data loss
Date Tue, 29 Oct 2013 00:08:31 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807502#comment-13807502 ]

Hadoop QA commented on HDFS-5438:
---------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12610696/HDFS-5438.trunk.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified
test file.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of
javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version
1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number
of release audit warnings.

    {color:red}-1 core tests{color}.  The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

                  org.apache.hadoop.hdfs.TestClientProtocolForPipelineRecovery
                  org.apache.hadoop.hdfs.server.namenode.TestCorruptFilesJsp

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5299//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5299//console

This message is automatically generated.

> Flaws in block report processing can cause data loss
> ----------------------------------------------------
>
>                 Key: HDFS-5438
>                 URL: https://issues.apache.org/jira/browse/HDFS-5438
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 0.23.9, 2.2.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: HDFS-5438-1.trunk.patch, HDFS-5438.trunk.patch
>
>
> The incremental block reports from data nodes and block commits are asynchronous. This
> becomes troublesome when the gen stamp for a block is changed during a write pipeline
> recovery.
> * If an incremental block report from a node is delayed but the NN already had enough
> replicas, a report with the old gen stamp may arrive after the block is completed. This
> replica will be correctly marked corrupt. But if the node had participated in the pipeline
> recovery, a new (delayed) report with the correct gen stamp will arrive soon after.
> However, this report has no effect on the replica's corrupt state.
> * If block reports are received while the block is still under construction (i.e. the
> client's call to commit the block has not yet been received by the NN), they are blindly
> accepted regardless of the gen stamp. If a failed node reports in with the old gen stamp
> while pipeline recovery is ongoing, it will be accepted and counted as valid when the
> block is committed.
> Due to the above two problems, correct replicas can be marked corrupt and corrupt replicas
> can be accepted during commit (both flaws are sketched in the example below). So far we
> have observed two cases in production.
> * The client hangs forever trying to close a file. All replicas are marked corrupt.
> * After the successful close of a file, reads fail. Corrupt replicas are accepted during
> commit and valid replicas are marked corrupt afterward.
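
Editor's note: a minimal sketch of the gen-stamp check implied by the description above. The
class, field, and method names here are hypothetical, not the actual
org.apache.hadoop.hdfs.server.blockmanagement types; it only illustrates the two invariants a
fix would need: validate the gen stamp even for under-construction blocks, and let a report
carrying the current gen stamp clear a stale corrupt mark.

{code:java}
import java.util.HashSet;
import java.util.Set;

// Hypothetical types for illustration only -- not the real NameNode code.
class BlockRecord {
  long genStamp;                // current gen stamp; bumped by pipeline recovery
  boolean underConstruction;    // true until the client commits the block
  final Set<String> validReplicas = new HashSet<>();
  final Set<String> corruptReplicas = new HashSet<>();

  /** Handle one replica from an (incremental) block report. */
  void processReportedReplica(String datanode, long reportedGenStamp) {
    if (reportedGenStamp < genStamp) {
      // Flaw 2: reports were accepted blindly while underConstruction was
      // true. Checking the gen stamp here, regardless of construction state,
      // keeps a failed pipeline node's stale replica from being counted as
      // valid at commit time.
      validReplicas.remove(datanode);
      corruptReplicas.add(datanode);
      return;
    }
    // Flaw 1: a later report with the current gen stamp, from a node that
    // participated in pipeline recovery, must clear the corrupt mark left
    // by its earlier stale report instead of being ignored.
    corruptReplicas.remove(datanode);
    validReplicas.add(datanode);
  }
}
{code}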



--
This message was sent by Atlassian JIRA
(v6.1#6144)
