hadoop-common-issues mailing list archives

From "Kihwal Lee (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8060) Add a capability to use of consistent checksums for append and copy
Date Thu, 22 Mar 2012 19:56:23 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235954#comment-13235954
] 

Kihwal Lee commented on HADOOP-8060:
------------------------------------

Sorry for the delay on this. I will get a set of initial patches up soon. Before that, there
is one design decision I have to make, and I would appreciate any input on it.

We need a way to specify the checksum type for create(). Currently, the checksum type used
for creating a DFSOutputStream is set from dfs.checksum.type when a DFSClient object is
created. If the file system cache is disabled, users can dictate the checksum type for a new
file by setting dfs.checksum.type appropriately. This does not work when the file system
cache is on, for the following reasons:

- A DFSClient instance can be shared by many threads, so changing the shared variable
can result in unpredictable behavior.

- The FileSystem cache is keyed only on the scheme/authority and UGI. The DFSClient object
that was created by a DFS instance in the cache will retain the conf that was used to instantiate
it. If the same UGI is used, this DFSClient will be used by all threads that access the
same HDFS cluster. In this case the threads cannot change the behavior of DFSClient by
changing conf settings, even if we modify DFSClient so that it reads dfs.checksum.type dynamically
during create().
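
The caching behavior described in the second point can be illustrated with a small,
self-contained sketch. It does not use Hadoop classes: Client, Key, and get() are hypothetical
stand-ins for DFSClient, the cache key, and the cached lookup. The only point it makes is
that a cache keyed on scheme/authority and user returns the first instance, together with
the conf it captured, regardless of the conf passed on later calls.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class FsCacheSketch {
    // Hypothetical stand-in for a client that captures its conf when constructed.
    static final class Client {
        final String checksumType;
        Client(Map<String, String> conf) {
            this.checksumType = conf.get("dfs.checksum.type");
        }
    }

    // Cache key built from scheme/authority and user only; conf is not part of it.
    static final class Key {
        final String schemeAuthority;
        final String user;
        Key(String schemeAuthority, String user) {
            this.schemeAuthority = schemeAuthority;
            this.user = user;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return schemeAuthority.equals(k.schemeAuthority) && user.equals(k.user);
        }
        @Override public int hashCode() {
            return Objects.hash(schemeAuthority, user);
        }
    }

    private static final Map<Key, Client> CACHE = new HashMap<>();

    // Returns the cached client if one exists; conf is used only on a cache miss.
    static synchronized Client get(String schemeAuthority, String user,
                                   Map<String, String> conf) {
        return CACHE.computeIfAbsent(new Key(schemeAuthority, user),
                                     k -> new Client(conf));
    }

    public static void main(String[] args) {
        Map<String, String> confA = new HashMap<>();
        confA.put("dfs.checksum.type", "CRC32C");
        Map<String, String> confB = new HashMap<>();
        confB.put("dfs.checksum.type", "CRC32");

        Client first = get("hdfs://nn1:8020", "alice", confA);
        Client second = get("hdfs://nn1:8020", "alice", confB);

        System.out.println(first == second);      // true: same cached instance
        System.out.println(second.checksumType);  // CRC32C: confB was ignored
    }
}
```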

Turning the cache off is not an option because of potential resource exhaustion issues in
various parts of the system.

So far, this is the only approach I have come up with that does not involve a FileSystem API
change: add checksum types to CreateFlag. The types are already defined in DataChecksum, so
the changes are contained in common. I was initially very reluctant about this because I was
comparing the flags to POSIX open flags. But it seemed less objectionable once I realized the
CreateFlag used for create() is nothing like the POSIX one. :)
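
As a rough illustration of the idea (not the actual patch), the sketch below models a flag
enum that carries checksum choices next to the usual create flags. The enum Flag, the
constants CHECKSUM_CRC32 and CHECKSUM_CRC32C, and the method resolveChecksum() are all
hypothetical names chosen for this example. The checksum type is resolved per create() call
from the flags, falling back to the client-wide default when no checksum flag is given, so a
shared client never needs to be mutated.

```java
import java.util.EnumSet;

public class CreateFlagChecksumSketch {
    // Hypothetical flag set: create-style flags plus proposed checksum flags.
    enum Flag { CREATE, OVERWRITE, APPEND, CHECKSUM_CRC32, CHECKSUM_CRC32C }

    // Picks the checksum type for one create() call without touching shared state.
    static String resolveChecksum(EnumSet<Flag> flags, String clientDefault) {
        boolean crc32 = flags.contains(Flag.CHECKSUM_CRC32);
        boolean crc32c = flags.contains(Flag.CHECKSUM_CRC32C);
        if (crc32 && crc32c) {
            throw new IllegalArgumentException("conflicting checksum flags");
        }
        if (crc32) return "CRC32";
        if (crc32c) return "CRC32C";
        return clientDefault; // no per-call override requested
    }

    public static void main(String[] args) {
        String clientDefault = "CRC32C"; // assumed value captured by the cached client

        // A copy job that must preserve the source file's CRC32 checksums.
        System.out.println(resolveChecksum(
            EnumSet.of(Flag.CREATE, Flag.OVERWRITE, Flag.CHECKSUM_CRC32), clientDefault));

        // A regular create that simply takes the client default.
        System.out.println(resolveChecksum(
            EnumSet.of(Flag.CREATE), clientDefault));
    }
}
```

With the assumed default above, the first call prints CRC32 and the second prints CRC32C.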

If I don't hear any other suggestions, I will prepare a set of patches based on this. There
will be sub-tasks and a separate blocking jira.
                
> Add a capability to use of consistent checksums for append and copy
> -------------------------------------------------------------------
>
>                 Key: HADOOP-8060
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8060
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs, util
>    Affects Versions: 0.23.0, 0.24.0, 0.23.1
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 0.24.0, 0.23.2
>
>
> After the improved CRC32C checksum feature became the default, some use cases involving
data movement are no longer supported.  For example, when running DistCp to copy a file
stored with the CRC32 checksum to a new cluster where CRC32C is the default checksum, the
final data integrity check fails because of a mismatch in checksums.
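
The root of the mismatch is that CRC32 and CRC32C produce different values for the same
bytes. The sketch below shows only that much, using the JDK's java.util.zip.CRC32 and
java.util.zip.CRC32C (the latter requires Java 9 or newer). It does not reproduce how HDFS
or DistCp compute and compare file checksums; the class name and the sample input are chosen
for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

public class ChecksumMismatchSketch {
    // Feeds the whole buffer to the given algorithm and returns its value.
    static long checksum(Checksum algo, byte[] data) {
        algo.update(data, 0, data.length);
        return algo.getValue();
    }

    public static void main(String[] args) {
        // "123456789" is the conventional check input for CRC algorithms.
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);

        long crc32 = checksum(new CRC32(), data);
        long crc32c = checksum(new CRC32C(), data);

        System.out.printf("CRC32  = %08x%n", crc32);   // cbf43926
        System.out.printf("CRC32C = %08x%n", crc32c);  // e3069283
        System.out.println("match  = " + (crc32 == crc32c)); // false
    }
}
```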

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
