hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7682) {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes non-snapshotted content
Date Thu, 05 Feb 2015 21:15:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308012#comment-14308012 ]

Jing Zhao commented on HDFS-7682:

Thanks, Charles. My main concern is that the patch is only a partial fix, since it does not cover
the case where the file is snapshotted but still being written.

bq. In other words, the behavior for non-snapshotted files that are still open (and possibly
being appended to) is not changed by this patch, only that of snapshotted files, for which
isLastBlockComplete() is a valid check.

The behavior for snapshotted files that are still open has not been changed either.

Actually, for a snapshotted file, {{blockLocations.getFileLength}} should equal the file
length explicitly recorded in the snapshot diff. If no such length is recorded, {{blockLocations.getFileLength}}
should be the current file length, including the length of the last under-construction (uc) block
(please read the current code to confirm). In that case, the check condition should be
"if the src is a snapshot path", and we should use {{blockLocations.getFileLength}} as the limit.
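
To make the suggestion concrete, here is a minimal sketch of what "use the file length as the limit" could look like. It is not the actual patch or the DFSClient code: the class {{SnapshotChecksumRange}} and both of its methods are hypothetical names, the snapshot-path test is simplified to looking for a {{.snapshot}} path component, and block lengths are passed in as a plain array rather than read from the located blocks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of HDFS.
class SnapshotChecksumRange {

    // Simplified assumption: a snapshot path contains a ".snapshot" component.
    static boolean isSnapshotPath(String src) {
        for (String component : src.split("/")) {
            if (component.equals(".snapshot")) {
                return true;
            }
        }
        return false;
    }

    // For each block, in order, returns how many of its bytes fall within
    // the first `limit` bytes of the file. Blocks entirely past the limit
    // are omitted, so data appended after the snapshot is never counted.
    static List<Long> bytesToChecksum(long[] blockLengths, long limit) {
        List<Long> result = new ArrayList<>();
        long remaining = limit;
        for (long blockLength : blockLengths) {
            if (remaining <= 0) {
                break;
            }
            long take = Math.min(blockLength, remaining);
            result.add(take);
            remaining -= take;
        }
        return result;
    }
}
```

In this sketch, a caller would pass the value of {{blockLocations.getFileLength}} as {{limit}} whenever {{isSnapshotPath(src)}} is true, and the full length of the file otherwise. The last block is then only partially covered if data was appended to it after the snapshot was taken.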

> {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes non-snapshotted content
> ------------------------------------------------------------------------------------------------
>                 Key: HDFS-7682
>                 URL: https://issues.apache.org/jira/browse/HDFS-7682
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Charles Lamb
>            Assignee: Charles Lamb
>         Attachments: HDFS-7682.000.patch, HDFS-7682.001.patch, HDFS-7682.002.patch
> DistributedFileSystem#getFileChecksum of a snapshotted file includes non-snapshotted content.
> This happens because DistributedFileSystem#getFileChecksum simply calculates
> the checksum of all of the CRCs from the blocks in the file. But in the case of a snapshotted
> file, we don't want to include data in the checksum that was appended to the last block in
> the file after the snapshot was taken.

