hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-928) make checksums optional per FileSystem
Date Mon, 26 Feb 2007 23:26:05 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12476040 ]

Doug Cutting commented on HADOOP-928:
-------------------------------------

Sorry, this patch does not apply cleanly to current trunk.  Other patches were recently committed
that conflict with it.

Do FSInputChecker and FSOutputSummer need to be public?  Aren't they only used by ChecksumFileSystem?
Eventually we might want to expose these publicly, if, e.g., HDFS's eventual built-in checksum
implementation shares code with them, but we also might not, since that mechanism might be
independent.  So, for now, we should probably keep these package-private or even private
within ChecksumFileSystem, the only place they're used.

In ChecksumFileSystem#create(Path, int bufferSize) it looks like two buffers of bufferSize
are created.  I think only the inner buffer, created by the underlying raw filesystem, should
be that size, which can be quite large, while the outer buffer should be quite small, no larger
than bytesPerSum.

Similarly, in ChecksumFileSystem#open(Path, int bufferSize), I think the inner buffer should
be large, to minimize seeks, system calls, etc., while the outer buffer should be quite small,
no larger than bytesPerSum.
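
To make the buffer layering concrete, here is a minimal sketch of the write side using only java.io and java.util.zip.  It is not the patch's code: SummingOutputStream, create(), and the sizes used are hypothetical stand-ins, and the ByteArrayOutputStream stands in for the raw filesystem's stream.  The outer stream holds exactly bytesPerSum bytes, while the inner BufferedOutputStream gets the caller's (large) bufferSize.  The read side would be layered the same way.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

public class BufferSizingSketch {

    /** Hypothetical stand-in for a summing stream; not Hadoop's FSOutputSummer. */
    static final class SummingOutputStream extends OutputStream {
        private final OutputStream inner;
        private final byte[] chunk;   // outer buffer: exactly bytesPerSum bytes
        private int count = 0;
        final List<Long> sums = new ArrayList<>();  // one CRC32 value per chunk

        SummingOutputStream(OutputStream inner, int bytesPerSum) {
            this.inner = inner;
            this.chunk = new byte[bytesPerSum];
        }

        @Override
        public void write(int b) throws IOException {
            chunk[count++] = (byte) b;
            if (count == chunk.length) {
                flushChunk();
            }
        }

        /** Checksums the buffered chunk and hands it to the inner stream. */
        private void flushChunk() throws IOException {
            if (count == 0) {
                return;
            }
            CRC32 crc = new CRC32();
            crc.update(chunk, 0, count);
            sums.add(crc.getValue());
            inner.write(chunk, 0, count);
            count = 0;
        }

        @Override
        public void close() throws IOException {
            flushChunk();   // emit the final, possibly short, chunk
            inner.close();  // BufferedOutputStream flushes before closing
        }
    }

    /** Only the inner buffer is bufferSize; the outer one is bytesPerSum. */
    static SummingOutputStream create(OutputStream raw, int bufferSize, int bytesPerSum) {
        OutputStream inner = new BufferedOutputStream(raw, bufferSize);
        return new SummingOutputStream(inner, bytesPerSum);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        SummingOutputStream out = create(raw, 64 * 1024, 512);
        out.write(new byte[1300]);
        out.close();
        // prints "3 checksums, 1300 bytes" (chunks of 512, 512 and 276 bytes)
        System.out.println(out.sums.size() + " checksums, " + raw.size() + " bytes");
    }
}
```

With this arrangement a large bufferSize is allocated once, in the inner stream, rather than twice.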

> make checksums optional per FileSystem
> --------------------------------------
>
>                 Key: HADOOP-928
>                 URL: https://issues.apache.org/jira/browse/HADOOP-928
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Doug Cutting
>         Assigned To: Hairong Kuang
>         Attachments: checksum.patch, checksum1.patch
>
>
> Checksumming is currently built into the base FileSystem class.  It should instead be
> optional, with each FileSystem implementation electing whether to use the Hadoop-provided
> checksum system, or to disable it, or to implement its own custom checksum system.
> To implement this, a ChecksumFileSystem implementation can be provided that wraps another
> FileSystem implementation, implementing checksums as in Hadoop's current mandatory implementation
> (i.e., as a separate crc file per file that's elided from directory listings).  The 'raw'
> FileSystem methods would be removed.  FSDataInputStream and FSDataOutputStream would be made
> interfaces.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

