hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-746) CRC computation and reading should move into a nested FileSystem
Date Tue, 05 Dec 2006 19:36:22 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-746?page=comments#action_12455736 ] 
            
Doug Cutting commented on HADOOP-746:
-------------------------------------

We need multiple CRCs per HDFS block, so that one can seek within a block without having to
checksum the entire block.  Thus a checksum every 8 KB or less is desired.
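
To illustrate why per-chunk checksums make seeking cheap, here is a minimal sketch, not HDFS code. It assumes CRC32 and an 8 KB chunk size; the class and method names (ChunkedCrc, checksums, verifyAt) are hypothetical. Verifying a read at some offset only touches the one chunk that contains that offset.

```java
import java.util.zip.CRC32;

public class ChunkedCrc {
    // Assumed chunk size, taken from the "every 8k or less" figure above.
    static final int BYTES_PER_CHECKSUM = 8 * 1024;

    /** Computes one CRC32 value per chunk of bytesPerSum bytes (the last chunk may be shorter). */
    static long[] checksums(byte[] data, int bytesPerSum) {
        int chunks = (data.length + bytesPerSum - 1) / bytesPerSum;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int off = i * bytesPerSum;
            int len = Math.min(bytesPerSum, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Verifies only the chunk containing pos, so a seek does not checksum the whole block. */
    static boolean verifyAt(byte[] data, long[] sums, int bytesPerSum, int pos) {
        int chunk = pos / bytesPerSum;
        int off = chunk * bytesPerSum;
        int len = Math.min(bytesPerSum, data.length - off);
        CRC32 crc = new CRC32();
        crc.update(data, off, len);
        return crc.getValue() == sums[chunk];
    }

    public static void main(String[] args) {
        byte[] block = new byte[64 * 1024];
        for (int i = 0; i < block.length; i++) {
            block[i] = (byte) (i * 31);
        }
        long[] sums = checksums(block, BYTES_PER_CHECKSUM);
        System.out.println("chunks: " + sums.length);
        System.out.println("offset 20000 ok: " + verifyAt(block, sums, BYTES_PER_CHECKSUM, 20000));
    }
}
```

With a single checksum per block, the same seek would require reading and checksumming the block from its start.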

But this issue is not about re-architecting CRCs for HDFS.  Rather, it is about cleaning up the
FileSystem API so that the generic CRC implementation, usable by all FileSystems, is less
visible in that API (i.e., the 'raw' methods are ugly).  Owen had an idea of how to do that,
but, unfortunately, it would make filenames ugly, which defeats the purpose.

But perhaps we can still salvage Owen's general approach.  We could implement a ChecksummedFileSystem
that wraps an existing filesystem but does not alter filenames.  HDFS & others can privately
implement the "raw" FileSystem, without checksums.  Then FileSystem.getNamed() can wrap newly created
FileSystem instances in ChecksummedFileSystem to generically add checksumming.  Thus the 'raw'
methods can be removed from the FileSystem API.  Does that work?
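
A rough sketch of the wrapping idea, under loud assumptions: SimpleFs is a hypothetical two-method interface, not the real Hadoop FileSystem API; InMemoryFs stands in for a "raw" implementation such as HDFS; the hidden ".name.crc" sidecar naming and the whole-file CRC32 are illustrative choices only. The point is that callers keep using unchanged filenames while the wrapper adds checksumming.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

public class ChecksummedFsSketch {
    /** Hypothetical minimal file system interface (NOT the Hadoop FileSystem API). */
    interface SimpleFs {
        void write(String path, byte[] data) throws IOException;
        byte[] read(String path) throws IOException;
    }

    /** Stand-in for a "raw" file system with no checksum support. */
    static class InMemoryFs implements SimpleFs {
        private final Map<String, byte[]> files = new HashMap<>();

        public void write(String path, byte[] data) {
            files.put(path, data.clone());
        }

        public byte[] read(String path) throws IOException {
            byte[] data = files.get(path);
            if (data == null) {
                throw new FileNotFoundException(path);
            }
            return data.clone();
        }
    }

    /** Wraps any SimpleFs and adds checksumming without changing the names callers use. */
    static class ChecksummedFs implements SimpleFs {
        private final SimpleFs raw;

        ChecksummedFs(SimpleFs raw) {
            this.raw = raw;
        }

        // Assumed sidecar naming: /dir/name -> /dir/.name.crc
        private static String crcPath(String path) {
            int slash = path.lastIndexOf('/');
            return path.substring(0, slash + 1) + "." + path.substring(slash + 1) + ".crc";
        }

        private static long crcOf(byte[] data) {
            CRC32 crc = new CRC32();
            crc.update(data, 0, data.length);
            return crc.getValue();
        }

        public void write(String path, byte[] data) throws IOException {
            raw.write(path, data);
            raw.write(crcPath(path), ByteBuffer.allocate(8).putLong(crcOf(data)).array());
        }

        public byte[] read(String path) throws IOException {
            byte[] data = raw.read(path);
            long expected = ByteBuffer.wrap(raw.read(crcPath(path))).getLong();
            if (crcOf(data) != expected) {
                throw new IOException("Checksum error: " + path);
            }
            return data;
        }
    }

    /** Plays the role suggested for FileSystem.getNamed(): wrap the raw instance generically. */
    static SimpleFs getNamed(SimpleFs raw) {
        return new ChecksummedFs(raw);
    }

    public static void main(String[] args) throws IOException {
        SimpleFs fs = getNamed(new InMemoryFs());
        fs.write("/data/part-0", "hello".getBytes("UTF-8"));
        System.out.println(new String(fs.read("/data/part-0"), "UTF-8"));
    }
}
```

In this arrangement no 'raw' variants of read or write are needed on the public interface, because the unchecksummed implementation is simply the wrapped object.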

> CRC computation and reading should move into a nested FileSystem
> ----------------------------------------------------------------
>
>                 Key: HADOOP-746
>                 URL: http://issues.apache.org/jira/browse/HADOOP-746
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.8.0
>            Reporter: Owen O'Malley
>         Assigned To: Owen O'Malley
>
> Currently FileSystem provides both an interface and a mechanism for computing and checking
> crc files. I propose splitting the crc code into a nestable FileSystem that like the PhasedFileSystem
> has a backing FileSystem. Once the Paths are converted to URI, this is fairly natural to express.
> To use crc files, your uris will look like:
> crc://hdfs:%2f%2fhost1:8020/ which is a crc FileSystem with an underlying file system
> of hdfs://host1:8020
> This will allow users to use crc files where they make sense for their application/cluster
> and get rid of the "raw" methods.
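
For reference, the nested URI form in the quoted proposal can be unpacked as sketched below. This is only an illustration of the percent-encoding involved; the class and method names (NestedCrcUri, underlying) are hypothetical, and nothing here is existing Hadoop code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class NestedCrcUri {
    static final String PREFIX = "crc://";

    /** Returns the underlying file system URI that is percent-encoded in the crc:// authority. */
    static String underlying(String crcUri) {
        if (!crcUri.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a crc URI: " + crcUri);
        }
        int start = PREFIX.length();
        // The authority ends at the first literal '/' after the scheme, if there is one.
        int end = crcUri.indexOf('/', start);
        String rawAuthority = end < 0 ? crcUri.substring(start) : crcUri.substring(start, end);
        try {
            return URLDecoder.decode(rawAuthority, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(underlying("crc://hdfs:%2f%2fhost1:8020/")); // prints hdfs://host1:8020
    }
}
```

The escaped slashes are what make these names awkward to read and type.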


        
