hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9289) check genStamp when complete file
Date Thu, 29 Oct 2015 20:58:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14981262#comment-14981262 ]

Zhe Zhang commented on HDFS-9289:
---------------------------------

bq. That's silent data corruption!
[~daryn] I agree that the current logic causes silent data corruption, because we overwrite
the NN's copy of the GS with the GS reported by the client:
{code}
// BlockInfo#commitBlock
this.set(getBlockId(), block.getNumBytes(), block.getGenerationStamp());
{code}

Throwing an exception (and therefore denying the {{commitBlock}}) turns this into an explicit
failure, which is better. But it is still data loss, because the data written by the client
after {{updatePipeline}} becomes invisible.

So I think at least for this particular bug (lacking {{volatile}}), the right thing to do
is to avoid changing the NN's copy of the GS when committing a block (and likewise avoid
changing the block ID). The only thing we should commit is {{numBytes}}. Of course we should
still log a {{WARN}} or {{ERROR}} when the GSes mismatch. As a safer first step, we should at
least avoid decrementing the NN's copy of the block GS.

In general, if a client misreports the GS, does that indicate a likelihood of misreported
{{numBytes}}, and should we therefore deny the {{commitBlock}}? It's hard to say; the
{{volatile}} bug here only affects the GS. But since we have already ensured that the NN's
copy of the block's {{numBytes}} never decrements, the harm of a misreported {{numBytes}} is
not severe.
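
To make the proposal above concrete, here is a minimal, self-contained sketch. It is not the
actual {{BlockInfo}} code: the class {{BlockRecord}} and the methods {{commitLengthOnly}} and
{{commitWithoutGsDecrement}} are hypothetical names, and the "never decrements" rule for
{{numBytes}} is modeled with a simple {{Math.max}} as an assumption.

```java
// Hypothetical, simplified stand-in for the NN's view of a block.
// This is NOT the real HDFS BlockInfo; all names here are illustrative.
final class BlockRecord {
    private final long blockId;
    private long numBytes;
    private long generationStamp;

    BlockRecord(long blockId, long numBytes, long generationStamp) {
        this.blockId = blockId;
        this.numBytes = numBytes;
        this.generationStamp = generationStamp;
    }

    long getBlockId() { return blockId; }
    long getNumBytes() { return numBytes; }
    long getGenerationStamp() { return generationStamp; }

    /**
     * Main proposal: commit only the length. The NN's GS and block ID are
     * left untouched; a GS mismatch is logged instead of being adopted.
     * Returns true if the reported GS matched the NN's copy.
     */
    boolean commitLengthOnly(long reportedNumBytes, long reportedGenStamp) {
        boolean gsMatches = (reportedGenStamp == generationStamp);
        if (!gsMatches) {
            System.err.printf(
                "WARN: GS mismatch on commit of block %d: NN has %d, client reported %d%n",
                blockId, generationStamp, reportedGenStamp);
        }
        // Assumption: model "numBytes never decrements" with max().
        numBytes = Math.max(numBytes, reportedNumBytes);
        return gsMatches;
    }

    /**
     * "Safer first step": still accept the reported GS, but never let the
     * NN's copy move backwards.
     */
    void commitWithoutGsDecrement(long reportedNumBytes, long reportedGenStamp) {
        if (reportedGenStamp < generationStamp) {
            System.err.printf(
                "WARN: ignoring stale GS %d for block %d (NN has %d)%n",
                reportedGenStamp, blockId, generationStamp);
        }
        generationStamp = Math.max(generationStamp, reportedGenStamp);
        numBytes = Math.max(numBytes, reportedNumBytes);
    }
}
```

In this sketch, a client that completes the file with a stale GS after {{updatePipeline}}
still gets its length committed, while the NN keeps the newer GS, so replicas carrying the
updated GS would not be compared against an older stamp.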

> check genStamp when complete file
> ---------------------------------
>
>                 Key: HDFS-9289
>                 URL: https://issues.apache.org/jira/browse/HDFS-9289
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Chang Li
>            Assignee: Chang Li
>            Priority: Critical
>         Attachments: HDFS-9289.1.patch, HDFS-9289.2.patch, HDFS-9289.3.patch, HDFS-9289.4.patch
>
>
> We have seen a case of a corrupt block caused by a file being completed after a pipelineUpdate,
but with the old block genStamp. This caused the replicas on two datanodes
in the updated pipeline to be viewed as corrupt. Propose checking the genStamp when committing a block.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
