hadoop-common-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8240) Allow users to specify a checksum type on create()
Date Tue, 15 May 2012 18:19:20 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276083#comment-13276083 ]

Kihwal Lee commented on HADOOP-8240:

We need this feature to make data copying and verification work across clusters with different
configurations. I would appreciate any feedback.

h4. Design Choices

# *Add a new create method to FileSystem that allows the checksum type to be specified.* FileSystem#create()
already allows specifying bytesPerChecksum. The new create method may accept a DataChecksum
object, which users can build with the existing DataChecksum.newDataChecksum(int type, int bytesPerChecksum).
Users who want to specify a non-default type likely want to control bytesPerChecksum
as well.
# *Add checksum types to CreateFlag.* This approach minimizes interface changes, but may
not be the most intuitive or consistent way.
# *Add a method to FSDataOutputStream and DFSOutputStream that allows users to override the default
checksum parameters.* This method should fail if data has already been written. This is somewhat
like ioctl. If there are other tunables we want to support, we could generalize the API. But
changing internal parameters (not encapsulated data) of an object at run time doesn't
fit well with typical Java semantics and may cause confusion, so we need to be careful with this approach.
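
A minimal Java sketch contrasting design choices 1 and 3, assuming nothing about the real Hadoop classes. `ChecksumSpec`, `SketchFileSystem`, and `SketchOutputStream` are hypothetical stand-ins, not Hadoop APIs. `ChecksumSpec` plays the role of a DataChecksum object carrying both the type and bytesPerChecksum. The two-argument `create` takes it explicitly (choice 1), and `setChecksum` models the ioctl-like override that fails once data has been written (choice 3). The JDK's `java.util.zip.CRC32` and `CRC32C` are used only to make bytesPerChecksum concrete.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Checksum;

// Hypothetical sketch of design choices 1 and 3; none of these types are Hadoop APIs.
class ChecksumDesignSketch {

    enum ChecksumType { NULL, CRC32, CRC32C }

    // Stand-in for a DataChecksum-like value: checksum type plus bytesPerChecksum.
    static final class ChecksumSpec {
        private final ChecksumType type;
        private final int bytesPerChecksum;

        ChecksumSpec(ChecksumType type, int bytesPerChecksum) {
            if (bytesPerChecksum <= 0) {
                throw new IllegalArgumentException("bytesPerChecksum must be positive");
            }
            this.type = type;
            this.bytesPerChecksum = bytesPerChecksum;
        }

        ChecksumType type() { return type; }
        int bytesPerChecksum() { return bytesPerChecksum; }

        // Returns a fresh JDK checksum engine (CRC32C needs Java 9+), or null for NULL.
        Checksum newEngine() {
            switch (type) {
                case CRC32: return new java.util.zip.CRC32();
                case CRC32C: return new java.util.zip.CRC32C();
                default: return null;
            }
        }
    }

    // Stand-in for an output stream that checksums each bytesPerChecksum-sized chunk.
    static final class SketchOutputStream {
        private final ByteArrayOutputStream data = new ByteArrayOutputStream();
        private ChecksumSpec spec;

        SketchOutputStream(ChecksumSpec spec) { this.spec = spec; }

        // Choice 3: ioctl-like override that is only legal before any data is written.
        void setChecksum(ChecksumSpec newSpec) {
            if (data.size() > 0) {
                throw new IllegalStateException("checksum cannot change after data is written");
            }
            spec = newSpec;
        }

        void write(byte[] b) { data.write(b, 0, b.length); }

        ChecksumSpec checksum() { return spec; }

        // One checksum value per chunk; the last chunk may be short. Empty for NULL.
        long[] chunkChecksums() {
            if (spec.type() == ChecksumType.NULL) return new long[0];
            byte[] bytes = data.toByteArray();
            int n = spec.bytesPerChecksum();
            long[] sums = new long[(bytes.length + n - 1) / n];
            for (int i = 0; i < sums.length; i++) {
                Checksum c = spec.newEngine();
                int off = i * n;
                c.update(bytes, off, Math.min(n, bytes.length - off));
                sums[i] = c.getValue();
            }
            return sums;
        }
    }

    // Stand-in for FileSystem: the configured default applies unless create() gets a spec.
    static final class SketchFileSystem {
        private final ChecksumSpec defaultSpec;

        SketchFileSystem(ChecksumSpec defaultSpec) { this.defaultSpec = defaultSpec; }

        SketchOutputStream create(String path) { return new SketchOutputStream(defaultSpec); }

        // Choice 1: type and bytesPerChecksum are supplied together, per call.
        SketchOutputStream create(String path, ChecksumSpec spec) { return new SketchOutputStream(spec); }
    }

    public static void main(String[] args) {
        SketchFileSystem fs = new SketchFileSystem(new ChecksumSpec(ChecksumType.CRC32, 512));
        SketchOutputStream out = fs.create("/tmp/example", new ChecksumSpec(ChecksumType.CRC32C, 4));
        out.write(new byte[10]);
        System.out.println(out.checksum().type() + ", chunks=" + out.chunkChecksums().length);
        try {
            out.setChecksum(new ChecksumSpec(ChecksumType.CRC32, 4));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With bytesPerChecksum = 4, writing 10 bytes yields three chunk checksums, and the last one covers only 2 bytes. The sketch also shows why choice 3 needs a runtime guard: the stream's checksum parameters become mutable state. Choice 1, by contrast, fixes them when the stream is created.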

h4. Other previously discussed approaches

# *Setting dfs.checksum.type.* Because of the FileSystem cache, the setting stays the same after
the DFSClient is created. Also, the conf is shared, so changing it can have unforeseen side effects.
# *Disable the FileSystem cache.* Create a new Configuration and set dfs.checksum.type. Without
the cache, the memory bloat is too large.
# *Use conf as part of the key in the FileSystem cache, in addition to UGI and scheme + authority.*
Something along these lines may work, but a shallow comparison may not be enough. Do we need
a special hashCode/equals to make it safer? There will be memory bloat, but how much? It
is still up to users to manage different configurations, which may make this approach more
prone to mistakes.
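
Approach 3 can be sketched as a cache key that copies the relevant conf entries instead of referencing the shared conf object. This is a hypothetical illustration, not Hadoop's FileSystem.Cache: `CacheKey`, `CachedClient`, the plain `Map<String, String>` standing in for Configuration, the choice of `dfs.checksum.type` as the only relevant key, the example authority, and the `CRC32` fallback are all assumptions. It illustrates two concerns raised above. A shallow key holding a reference to a mutable conf would change identity when the conf changes, and every distinct configuration costs one more cached instance.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical sketch of approach 3; this is not Hadoop's FileSystem cache.
class ConfKeyedCacheSketch {

    // Key = scheme + authority + user + a snapshot of the conf entries that matter.
    static final class CacheKey {
        private final String scheme;
        private final String authority;
        private final String user;
        private final Map<String, String> confSnapshot;

        CacheKey(String scheme, String authority, String user,
                 Map<String, String> conf, String... relevantKeys) {
            this.scheme = scheme;
            this.authority = authority;
            this.user = user;
            // Copy values instead of holding a reference to the shared, mutable conf;
            // otherwise a later conf change would silently alter this key's identity.
            Map<String, String> snap = new TreeMap<>();
            for (String k : relevantKeys) {
                String v = conf.get(k);
                if (v != null) {
                    snap.put(k, v);
                }
            }
            this.confSnapshot = Collections.unmodifiableMap(snap);
        }

        // Content-based ("deep") equality over the snapshotted entries.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof CacheKey)) return false;
            CacheKey k = (CacheKey) o;
            return scheme.equals(k.scheme) && authority.equals(k.authority)
                    && user.equals(k.user) && confSnapshot.equals(k.confSnapshot);
        }

        @Override
        public int hashCode() {
            return Objects.hash(scheme, authority, user, confSnapshot);
        }
    }

    // Stand-in for a cached client; each distinct key costs one more instance.
    static final class CachedClient {
        final String checksumType;
        CachedClient(String checksumType) { this.checksumType = checksumType; }
    }

    private final Map<CacheKey, CachedClient> cache = new HashMap<>();

    CachedClient get(String scheme, String authority, String user, Map<String, String> conf) {
        CacheKey key = new CacheKey(scheme, authority, user, conf, "dfs.checksum.type");
        // "CRC32" as the fallback is an assumption made only for this sketch.
        return cache.computeIfAbsent(key,
                k -> new CachedClient(conf.getOrDefault("dfs.checksum.type", "CRC32")));
    }

    int size() { return cache.size(); }

    public static void main(String[] args) {
        ConfKeyedCacheSketch fsCache = new ConfKeyedCacheSketch();
        Map<String, String> conf = new HashMap<>();
        conf.put("dfs.checksum.type", "CRC32");
        CachedClient a = fsCache.get("hdfs", "nn.example.com:8020", "alice", conf);
        conf.put("dfs.checksum.type", "CRC32C");   // mutate the shared conf
        CachedClient b = fsCache.get("hdfs", "nn.example.com:8020", "alice", conf);
        System.out.println(a.checksumType + " / " + b.checksumType + ", cached=" + fsCache.size());
    }
}
```

Changing `dfs.checksum.type` between the two calls yields two distinct cached clients. Unrelated keys do not split the cache because they are not in the snapshot. Deciding which keys belong in the snapshot is still left to someone, which is where the user mistakes mentioned above can creep in.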

> Allow users to specify a checksum type on create()
> --------------------------------------------------
>                 Key: HADOOP-8240
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8240
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.23.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 0.23.3, 2.0.0, 3.0.0
>         Attachments: hadoop-8240.patch
> Per discussion in HADOOP-8060, a way for users to specify a checksum type on create()
> is needed. The way the FileSystem cache works makes it impossible to use dfs.checksum.type to
> achieve this. Also, the checksum-related API is at the FileSystem level, so we prefer something at
> that level rather than an HDFS-specific one. The current proposal is to use CreateFlag.
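
The CreateFlag proposal (design choice 2 above) can be sketched as follows. `SketchCreateFlag` is a hypothetical enum, not Hadoop's CreateFlag, and the `CHECKSUM_*` constants and `resolveChecksum` helper are invented for illustration. The sketch shows one consequence of putting checksum types into a flag set: the flags are mutually exclusive, so conflicting combinations can only be rejected at run time.

```java
import java.util.EnumSet;

// Hypothetical sketch of design choice 2; SketchCreateFlag is not Hadoop's CreateFlag.
class CreateFlagSketch {

    enum SketchCreateFlag {
        CREATE, OVERWRITE, APPEND,
        // Hypothetical checksum-type flags; at most one may be set per call.
        CHECKSUM_NULL, CHECKSUM_CRC32, CHECKSUM_CRC32C
    }

    private static final EnumSet<SketchCreateFlag> CHECKSUM_FLAGS = EnumSet.of(
            SketchCreateFlag.CHECKSUM_NULL,
            SketchCreateFlag.CHECKSUM_CRC32,
            SketchCreateFlag.CHECKSUM_CRC32C);

    // Picks the single requested checksum flag, or the default when none is present.
    static SketchCreateFlag resolveChecksum(EnumSet<SketchCreateFlag> flags,
                                            SketchCreateFlag defaultType) {
        EnumSet<SketchCreateFlag> requested = EnumSet.copyOf(flags);
        requested.retainAll(CHECKSUM_FLAGS);
        if (requested.size() > 1) {
            // Mutually exclusive flags are only caught at run time, not by the type system.
            throw new IllegalArgumentException("conflicting checksum flags: " + requested);
        }
        return requested.isEmpty() ? defaultType : requested.iterator().next();
    }

    public static void main(String[] args) {
        EnumSet<SketchCreateFlag> flags =
                EnumSet.of(SketchCreateFlag.CREATE, SketchCreateFlag.CHECKSUM_CRC32C);
        System.out.println(resolveChecksum(flags, SketchCreateFlag.CHECKSUM_CRC32));
    }
}
```

A flag can carry only the type. bytesPerChecksum would still travel through the existing create() parameter, which is one reason this option may feel less consistent than passing a single DataChecksum-like object.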

This message is automatically generated by JIRA.