hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-746) CRC computation and reading should move into a nested FileSystem
Date Mon, 27 Nov 2006 20:50:23 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-746?page=comments#action_12453700 ] 
Doug Cutting commented on HADOOP-746:

> The FS needs CRCs to manage replication and validation and should have a uniform internal

I think you mean "HDFS needs...", right?  But HDFS is not the only FS we wish to support,
and not all of these will have a sufficient, end-to-end CRC system.  So a reusable, end-to-end
CRC system is useful to Hadoop.  Whether or not that suffices for HDFS seems to be what you're
answering with a "no", although I'm not sure why.  It seems to me that a well-designed reusable,
end-to-end CRC system could be used by HDFS, so that HDFS doesn't have to re-invent it all.
The CRC system could, e.g., make CRCs available along with data buffers.  Maybe that's more
pain than it's worth, and it would in fact be simpler to have two CRC systems, one built in
to HDFS and a reusable one that's disabled in HDFS but used by other FSes.  Is that what you're
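
As a rough illustration of "make CRCs available along with data buffers", here is a minimal sketch in which each buffer carries its own CRC, so any FileSystem could pass the pair along and verify it at the far end. The class name ChecksummedChunk, its methods, and the chunk size used in the usage note are hypothetical and not taken from Hadoop; only java.util.zip.CRC32 is a real JDK class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical sketch, not Hadoop code: a data buffer that travels with its CRC.
final class ChecksummedChunk {
    final byte[] data;
    final long crc;

    ChecksummedChunk(byte[] data, long crc) {
        this.data = data;
        this.crc = crc;
    }

    // Compute the CRC-32 of a whole buffer.
    static long crcOf(byte[] buf) {
        CRC32 c = new CRC32();
        c.update(buf, 0, buf.length);
        return c.getValue();
    }

    // Writer side: split a payload into fixed-size chunks, each paired with its CRC.
    static List<ChecksummedChunk> split(byte[] payload, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<ChecksummedChunk> out = new ArrayList<>();
        for (int off = 0; off < payload.length; off += chunkSize) {
            int end = Math.min(off + chunkSize, payload.length);
            byte[] part = Arrays.copyOfRange(payload, off, end);
            out.add(new ChecksummedChunk(part, crcOf(part)));
        }
        return out;
    }

    // Reader side: recompute the CRC and compare it with the one that came along.
    boolean isValid() {
        return crcOf(data) == crc;
    }
}
```

A caller would run something like ChecksummedChunk.split(bytes, 512) on the write path and check isValid() on each chunk after reading it back. The value 512 is only an example.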

> Nor is that URL very friendly...

I agree that is a problem with this proposal.  It would be better if users saw hdfs:, s3:
and file: URLs.

> CRC computation and reading should move into a nested FileSystem
> ----------------------------------------------------------------
>                 Key: HADOOP-746
>                 URL: http://issues.apache.org/jira/browse/HADOOP-746
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.8.0
>            Reporter: Owen O'Malley
>         Assigned To: Owen O'Malley
> Currently FileSystem provides both an interface and a mechanism for computing and checking
> crc files. I propose splitting the crc code into a nestable FileSystem that, like the PhasedFileSystem,
> has a backing FileSystem. Once the Paths are converted to URIs, this is fairly natural to express.
> To use crc files, your URIs will look like:
> crc://hdfs:%2f%2fhost1:8020/ which is a crc FileSystem with an underlying file system
> of hdfs://host1:8020
> This will allow users to use crc files where they make sense for their application/cluster
> and get rid of the "raw" methods.
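
To make the nested URI form above concrete, here is a small sketch of how crc://hdfs:%2f%2fhost1:8020/ could be built from, and unwrapped back into, the backing URI hdfs://host1:8020. The class NestedCrcUri and its wrap/unwrap methods are hypothetical helpers for illustration only, not part of Hadoop's FileSystem API. The sketch also assumes that only "/" is escaped in the nested authority, as in the example from the issue.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;

// Hypothetical helper, not Hadoop code: encode and decode the nested crc:// form.
final class NestedCrcUri {
    private static final String PREFIX = "crc://";

    // hdfs://host1:8020 -> crc://hdfs:%2f%2fhost1:8020/
    static String wrap(URI backing) {
        // Escape only "/" so the backing URI fits in the authority slot.
        return PREFIX + backing.toString().replace("/", "%2f") + "/";
    }

    // crc://hdfs:%2f%2fhost1:8020/ -> hdfs://host1:8020
    static URI unwrap(String nested) {
        if (!nested.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a crc URI: " + nested);
        }
        String rest = nested.substring(PREFIX.length());
        int slash = rest.indexOf('/');
        String encoded = slash < 0 ? rest : rest.substring(0, slash);
        try {
            // URLDecoder also maps '+' to a space, which is acceptable for this sketch.
            return URI.create(URLDecoder.decode(encoded, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }
}
```

With this, NestedCrcUri.unwrap("crc://hdfs:%2f%2fhost1:8020/") yields a URI with scheme hdfs, host host1 and port 8020, which a crc FileSystem could then hand to its backing FileSystem. Paths below the root of the crc file system are not handled here.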


