hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3177) Allow DFSClient to find out and use the CRC type being used for a file.
Date Wed, 22 Aug 2012 14:09:42 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439549#comment-13439549 ]

Kihwal Lee commented on HDFS-3177:

bq. For append, it makes a lot of sense to keep using the existing checksum type. What is the use case for using a different checksum type?

I don't think it makes sense either, but that was the design decision made in HDFS-2130. There may have been use cases for it, so I tried to support it while disallowing it by default. If you feel this should be the behavior with no configurable option, I will be happy to update the patch accordingly.
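
As a minimal sketch of the client-side policy I'm describing (the class and method names here are illustrative, not from the patch; the flag name and its default of false follow what is quoted later in this thread):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.DataChecksum;

// Illustrative sketch only, not actual patch code.
public class AppendChecksumPolicy {
  static final String ALLOW_DIFFERENT_CHECKSUM_KEY =
      "dfs.client.append.allow-different-checksum";

  /** Pick the checksum type an append should write with. */
  static DataChecksum.Type chooseForAppend(Configuration conf,
      DataChecksum.Type existing, DataChecksum.Type configured) {
    // Default is false: keep using the file's existing checksum type.
    boolean allowDifferent =
        conf.getBoolean(ALLOW_DIFFERENT_CHECKSUM_KEY, false);
    return allowDifferent ? configured : existing;
  }
}
{code}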

What do you think we should do for concat()? It is supposed to be a quick, namenode-only operation, so I don't feel comfortable inserting code that checks the checksums of the input files.
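
For illustration, a pre-check of this sort is roughly what such a guard would look like (names and placement are hypothetical; getFileChecksum() has to contact datanodes, which is exactly the cost I would rather keep out of concat()):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical guard, not proposed patch code: refuse to concat
// files whose file checksums report different algorithms.
public class ConcatChecksumGuard {
  static void checkSameChecksumType(FileSystem fs, Path target, Path[] srcs)
      throws IOException {
    FileChecksum expected = fs.getFileChecksum(target);
    for (Path src : srcs) {
      FileChecksum actual = fs.getFileChecksum(src);
      // getAlgorithmName() encodes the CRC type used for the file.
      if (expected != null && actual != null
          && !expected.getAlgorithmName().equals(actual.getAlgorithmName())) {
        throw new IOException("Checksum type mismatch: " + target
            + " vs " + src);
      }
    }
  }
}
{code}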

bq. Suppose the last block is half written with CRC32 in a closed file. Then, the file is re-opened for append with CRC32C. Would the block have two checksum types, i.e. the first half is CRC32 and the second half is CRC32C?

No. The datanode will continue to use the checksum parameters of the existing partial block for writing, regardless of what the client sends with the data. The integrity of the input data is still checked, of course.
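
Roughly, the rule is the following (sketch only, with illustrative names; the real logic lives in the datanode's block receiving path):

{code}
import org.apache.hadoop.util.DataChecksum;

// Sketch of the rule described above, not actual datanode code.
public class AppendChecksumSelection {
  static DataChecksum checksumForPartialBlock(DataChecksum existingOnDisk,
      DataChecksum fromClient) {
    // Data from the client is still verified with the checksum it sent,
    // but the persisted metadata keeps the partial block's parameters.
    if (existingOnDisk != null) {
      return DataChecksum.newDataChecksum(
          existingOnDisk.getChecksumType(),
          existingOnDisk.getBytesPerChecksum());
    }
    return fromClient;
  }
}
{code}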

bq. Suppose a closed file is already using more than one checksum type. Then, the file is re-opened for append with dfs.client.append.allow-different-checksum == false. Which checksum should it use? Or should it fail?

I don't think we can do much for existing files. Users can detect them with getFileChecksum(), which will show DataChecksum.Type.MIXED as the checksum type. For these files, checksums will still be used for block-level integrity checks, and nothing will break until something like distcp tries to compare FileChecksums after copying.
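
For example, a caller could flag such a file along these lines (a sketch; I'm assuming the MIXED type is exposed through MD5MD5CRC32FileChecksum's getCrcType(), per the HADOOP-8239 changes):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.MD5MD5CRC32FileChecksum;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.DataChecksum;

// Sketch: detect a file whose blocks do not share one checksum type.
public class MixedChecksumDetector {
  static boolean hasMixedChecksums(FileSystem fs, Path file)
      throws IOException {
    FileChecksum checksum = fs.getFileChecksum(file);
    if (checksum instanceof MD5MD5CRC32FileChecksum) {
      return ((MD5MD5CRC32FileChecksum) checksum).getCrcType()
          == DataChecksum.Type.MIXED;
    }
    return false;
  }
}
{code}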
> Allow DFSClient to find out and use the CRC type being used for a file.
> -----------------------------------------------------------------------
>                 Key: HDFS-3177
>                 URL: https://issues.apache.org/jira/browse/HDFS-3177
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node, hdfs client
>    Affects Versions: 0.23.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 2.1.0-alpha, 3.0.0
>         Attachments: hdfs-3177-after-hadoop-8239-8240.patch.txt, hdfs-3177-after-hadoop-8239.patch.txt, hdfs-3177-branch2-trunk.patch.txt, hdfs-3177.patch, hdfs-3177-with-hadoop-8239-8240.patch.txt, hdfs-3177-with-hadoop-8239-8240.patch.txt, hdfs-3177-with-hadoop-8239-8240.patch.txt, hdfs-3177-with-hadoop-8239.patch.txt
> To support HADOOP-8060, DFSClient should be able to find out the checksum type being used for files in HDFS.
> In my prototype, DataTransferProtocol was extended to include the checksum type in the blockChecksum() response. DFSClient uses it in getFileChecksum() to determine the checksum type. Also, append() can be configured to use the existing checksum type instead of the configured one.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

