hadoop-hdfs-issues mailing list archives

From "Charles Lamb (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-7682) {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes non-snapshotted content
Date Tue, 27 Jan 2015 17:21:34 GMT

     [ https://issues.apache.org/jira/browse/HDFS-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Charles Lamb updated HDFS-7682:
    Attachment: HDFS-7682.001.patch

Hi [~jingzhao],

Thanks for looking at this.

isLastBlockComplete() covers the case where it's a snapshot path as well as a closed non-snapshot
path. The file length is correct in both of those cases, so it's ok to use it. In the case of
a still-being-written file, isLastBlockComplete() returns false and the code works just
the same as it does today. The particular case that this patch fixes is that a snapshotted
file is frozen, so the file length is the limit of what should be checksummed, not the block
lengths (which include the non-snapshotted portion). I've added more assertions in the test
to demonstrate this.

In other words, the behavior for non-snapshotted files that are still open (and possibly being
appended to) is not changed by this patch, only that of snapshotted files, for which isLastBlockComplete()
is a valid check.
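
To make the reasoning above concrete, here is a minimal, self-contained sketch of the length-limiting idea. It is not the code in HDFS-7682.001.patch and does not use any HDFS classes: the class `ChecksumRangeSketch`, the method `bytesToChecksum`, and the `lastBlockComplete` flag (a stand-in for the result of isLastBlockComplete()) are all hypothetical names chosen for illustration. The assumption is that when the last block is complete, the file length bounds what is checksummed, and otherwise the block lengths are used unchanged.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not the HDFS-7682 patch and not HDFS client code. */
class ChecksumRangeSketch {

    /**
     * Returns, per block, how many bytes should contribute to the file checksum.
     *
     * @param blockLengths      current on-disk length of each block
     * @param fileLength        file length as seen through the path being checksummed
     * @param lastBlockComplete stand-in for the result of isLastBlockComplete()
     */
    static List<Long> bytesToChecksum(long[] blockLengths, long fileLength,
                                      boolean lastBlockComplete) {
        List<Long> result = new ArrayList<>();
        long remaining = fileLength;
        for (long blockLength : blockLengths) {
            if (!lastBlockComplete) {
                // File is still being written: keep the existing behavior
                // and use each block's length as-is.
                result.add(blockLength);
                continue;
            }
            if (remaining <= 0) {
                break; // everything visible through this path is already covered
            }
            // Snapshot path or closed file: the file length is the limit.
            long n = Math.min(blockLength, remaining);
            result.add(n);
            remaining -= n;
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed scenario: the snapshot was taken when the last block held 40
        // bytes (file length 128 + 128 + 40 = 296); 60 more bytes were appended later.
        long[] blocks = {128, 128, 100};
        System.out.println(bytesToChecksum(blocks, 296, true));  // [128, 128, 40]
        System.out.println(bytesToChecksum(blocks, 296, false)); // [128, 128, 100]
    }
}
```

In this sketch the open-file branch ignores `fileLength` entirely, which mirrors the point that behavior for files still being written is unchanged.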

HDFS-5343 took a similar approach.

> {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes non-snapshotted content
> ------------------------------------------------------------------------------------------------
>                 Key: HDFS-7682
>                 URL: https://issues.apache.org/jira/browse/HDFS-7682
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Charles Lamb
>            Assignee: Charles Lamb
>         Attachments: HDFS-7682.000.patch, HDFS-7682.001.patch
> DistributedFileSystem#getFileChecksum of a snapshotted file includes non-snapshotted content.
> This happens because DistributedFileSystem#getFileChecksum simply calculates the checksum
> of all of the CRCs from the blocks in the file. But in the case of a snapshotted file, we
> don't want to include data in the checksum that was appended to the last block in the file
> after the snapshot was taken.
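
The symptom described in the issue can be reproduced in miniature with a plain CRC32 from the JDK. This is only an analogy under stated assumptions: it does not reproduce how HDFS combines block CRCs, and the class `CrcSymptomSketch`, the method `crcOf`, and the sample byte contents are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical illustration of the symptom; not how HDFS computes file checksums. */
class CrcSymptomSketch {

    /** CRC32 over the first {@code len} bytes of {@code data}. */
    static long crcOf(byte[] data, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, len);
        return crc.getValue();
    }

    public static void main(String[] args) {
        // Assumed contents of the last block when the snapshot was taken...
        byte[] atSnapshot = "hello".getBytes(StandardCharsets.UTF_8);
        // ...and of the same block after a later append.
        byte[] afterAppend = "hello world".getBytes(StandardCharsets.UTF_8);

        long expected = crcOf(atSnapshot, atSnapshot.length);
        // Checksumming the whole block pulls in the appended bytes.
        long overBlockLength = crcOf(afterAppend, afterAppend.length);
        // Limiting the range to the snapshot's length recovers the original value.
        long overSnapshotLength = crcOf(afterAppend, atSnapshot.length);

        System.out.println(overBlockLength == expected);    // false
        System.out.println(overSnapshotLength == expected); // true
    }
}
```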

This message was sent by Atlassian JIRA
