hadoop-common-dev mailing list archives

From "eric baldeschwieler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-746) CRC computation and reading should move into a nested FileSystem
Date Sun, 26 Nov 2006 00:09:03 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-746?page=comments#action_12452615 ] 
eric baldeschwieler commented on HADOOP-746:


This seems like it would have advantages in how we manage temporary storage and such.  BUT...
I think HDFS needs to support CRCs for all files "below the covers".  I don't think we should
rip that out and leave it to user code to invoke CRCs.

The FS needs CRCs to manage replication and validation and should have a uniform internal
mechanism.  I don't know that it is necessary that these CRCs be user accessible, but I do
think that it is necessary that all blocks are CRCed in the same simple way.
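
To make "all blocks CRC in the same simple way" concrete, here is a minimal sketch in Java. It is not HDFS code: the class name, the method names, and the tiny block size are all hypothetical, and CRC32 from java.util.zip is assumed purely for illustration. It shows one uniform per-block checksum that a file system could recompute internally to validate a replica, without user code ever invoking it.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch: one CRC32 per fixed-size block, computed the same way for every file.
class BlockCrcSketch {

    // Returns one CRC32 value per block; the last block may be shorter than blockSize.
    static long[] blockCrcs(byte[] data, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        int blocks = (data.length + blockSize - 1) / blockSize;
        long[] crcs = new long[blocks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < blocks; i++) {
            int off = i * blockSize;
            int len = Math.min(blockSize, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            crcs[i] = crc.getValue();
        }
        return crcs;
    }

    // Returns the index of the first block whose CRC differs from the expected one,
    // the shorter length if the block counts differ, or -1 if everything matches.
    static int firstCorruptBlock(byte[] data, int blockSize, long[] expected) {
        long[] actual = blockCrcs(data, blockSize);
        int n = Math.min(actual.length, expected.length);
        for (int i = 0; i < n; i++) {
            if (actual[i] != expected[i]) {
                return i;
            }
        }
        return actual.length == expected.length ? -1 : n;
    }

    public static void main(String[] args) {
        byte[] original = "0123456789".getBytes(StandardCharsets.US_ASCII);
        long[] expected = blockCrcs(original, 4); // block size of 4 is illustrative only

        byte[] replica = original.clone();
        replica[5] ^= 0x01; // simulate a flipped bit in the second block

        System.out.println("first corrupt block: " + firstCorruptBlock(replica, 4, expected));
    }
}
```

Because every block is checksummed identically, the same comparison could serve both validation on read and checking a replica against its source.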

Nor is that URL very friendly...  Also, nominally the URI prefix is supposed to specify the
protocol / transport, right?  CRC seems like it belongs below the transport, not wrapping
it.  Odd.

> CRC computation and reading should move into a nested FileSystem
> ----------------------------------------------------------------
>                 Key: HADOOP-746
>                 URL: http://issues.apache.org/jira/browse/HADOOP-746
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.8.0
>            Reporter: Owen O'Malley
>         Assigned To: Owen O'Malley
> Currently FileSystem provides both an interface and a mechanism for computing and checking
> crc files. I propose splitting the crc code into a nestable FileSystem that, like the
> PhasedFileSystem, has a backing FileSystem. Once the Paths are converted to URIs, this is
> fairly natural to express. To use crc files, your URIs will look like:
> crc://hdfs:%2f%2fhost1:8020/ which is a crc FileSystem with an underlying file system
> of hdfs://host1:8020
> This will allow users to use crc files where they make sense for their application/cluster
> and get rid of the "raw" methods.
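
For readers trying to picture the nested form in the quoted proposal, the following Java sketch unwraps a crc:// URI into the URI of its backing file system. It is an illustration only, not the proposed Hadoop implementation: the class and method names are hypothetical, and it assumes the backing URI is percent-encoded into the authority part exactly as in the example above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;

// Hypothetical helper: extracts the backing file system URI from a nested crc:// URI.
class NestedCrcUri {

    static final String CRC_PREFIX = "crc://";

    static URI backingUri(String crcUri) {
        if (!crcUri.startsWith(CRC_PREFIX)) {
            throw new IllegalArgumentException("not a crc URI: " + crcUri);
        }
        // Everything between "crc://" and the next '/' is the encoded backing URI.
        String rest = crcUri.substring(CRC_PREFIX.length());
        int slash = rest.indexOf('/');
        String encodedInner = slash < 0 ? rest : rest.substring(0, slash);
        try {
            // URLDecoder performs form decoding (it also maps '+' to a space),
            // which is close enough for this sketch: %2f becomes '/'.
            String inner = URLDecoder.decode(encodedInner, "UTF-8");
            return URI.create(inner);
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        URI inner = backingUri("crc://hdfs:%2f%2fhost1:8020/");
        System.out.println(inner.getScheme() + " on " + inner.getHost() + ":" + inner.getPort());
    }
}
```

The sketch also shows where the unfriendliness noted in the comment comes from: the inner "//" has to be escaped as %2f%2f so that it is not read as part of the outer URI's path.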
