hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-441) SequenceFile should support 'custom compressors'
Date Thu, 10 Aug 2006 05:27:13 GMT
SequenceFile should support 'custom compressors'

                 Key: HADOOP-441
                 URL: http://issues.apache.org/jira/browse/HADOOP-441
             Project: Hadoop
          Issue Type: New Feature
          Components: io
            Reporter: Arun C Murthy
         Assigned To: Arun C Murthy
             Fix For: 0.6.0

SequenceFiles should support 'custom compressors', which the user can specify when creating
the file.

Readily available packages for gzip and zip (java.util.zip) are obvious choices to support.
'bmdiff' also seems a good candidate. Of course, there will be hooks so that other compressors
can be added in the future, as long as (input/output) streams can be constructed on top of the
compressor/decompressor.
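
The 'hooks' could take roughly the following shape. This is a minimal sketch, assuming a hypothetical `StreamCodec` interface (not an existing Hadoop class) with one factory method per direction; the gzip and deflate variants simply wrap the `java.util.zip` stream classes, so any other compressor only needs to supply its own pair of streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical hook (not an existing Hadoop API): a codec is anything that can
// wrap a raw output stream for compression and a raw input stream for decompression.
interface StreamCodec {
    OutputStream wrapOutput(OutputStream raw) throws IOException;
    InputStream wrapInput(InputStream raw) throws IOException;
}

public class CodecHooks {
    // gzip format, via java.util.zip
    static final StreamCodec GZIP = new StreamCodec() {
        public OutputStream wrapOutput(OutputStream raw) throws IOException { return new GZIPOutputStream(raw); }
        public InputStream wrapInput(InputStream raw) throws IOException { return new GZIPInputStream(raw); }
    };

    // zlib (deflate) format, via java.util.zip
    static final StreamCodec DEFLATE = new StreamCodec() {
        public OutputStream wrapOutput(OutputStream raw) { return new DeflaterOutputStream(raw); }
        public InputStream wrapInput(InputStream raw) { return new InflaterInputStream(raw); }
    };

    // Compress data through the codec, then decompress it again.
    static byte[] roundTrip(StreamCodec codec, byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = codec.wrapOutput(sink)) {
            out.write(data);
        } // close() finishes the compressed stream
        try (InputStream in = codec.wrapInput(new ByteArrayInputStream(sink.toByteArray()))) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "key\tvalue\n".repeat(1000).getBytes(StandardCharsets.UTF_8);
        for (StreamCodec codec : new StreamCodec[] {GZIP, DEFLATE}) {
            System.out.println(Arrays.equals(data, roundTrip(codec, data)));
        }
    }
}
```

Running `main` should print `true` once per codec. Keeping the hook at the stream level means the writer and reader never need to know which algorithm sits underneath.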

The 'classname' of the 'custom compressor/decompressor' could be stored in the header of the
SequenceFile, where SequenceFile.Reader can use it to determine the appropriate 'decompressor'.
Thus I propose we add constructors to SequenceFile.Writer that take the 'classname' of the
compressor's output/input stream classes (e.g. DeflaterOutputStream/InflaterInputStream
or GZIPOutputStream/GZIPInputStream); these classes act as the hook for future compressors/decompressors.
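
A minimal sketch of the header idea, assuming the header holds only the decompressor's fully qualified class name (written with `DataOutputStream.writeUTF`) and that each named class has a public single-stream constructor, which holds for the four `java.util.zip` classes mentioned. `HeaderCodecDemo`, `write`, and `read` are hypothetical names; the real SequenceFile header format is not modeled here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class HeaderCodecDemo {
    // Writer side: record the decompressor's class name in a (hypothetical) header,
    // then compress the payload through an instance of the compressor's stream class.
    static byte[] write(String outClass, String inClass, byte[] payload) throws Exception {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        new DataOutputStream(file).writeUTF(inClass); // header: length-prefixed class name
        // Assumes the class has a public (OutputStream) constructor.
        OutputStream out = Class.forName(outClass).asSubclass(OutputStream.class)
                .getConstructor(OutputStream.class).newInstance(file);
        out.write(payload);
        out.close(); // finishes the compressed stream
        return file.toByteArray();
    }

    // Reader side: read the class name from the header and instantiate it reflectively.
    static byte[] read(byte[] fileBytes) throws Exception {
        DataInputStream file = new DataInputStream(new ByteArrayInputStream(fileBytes));
        String inClass = file.readUTF();
        // Assumes the class has a public (InputStream) constructor.
        try (InputStream in = Class.forName(inClass).asSubclass(InputStream.class)
                .getConstructor(InputStream.class).newInstance(file)) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = "hello, SequenceFile\n".repeat(50).getBytes(StandardCharsets.UTF_8);
        byte[] file = write("java.util.zip.GZIPOutputStream", "java.util.zip.GZIPInputStream", payload);
        System.out.println(Arrays.equals(payload, read(file)));
    }
}
```

Resolving the class reflectively keeps the Reader free of a hard-coded list of codecs; the trade-off is that a file written with a codec missing from the reader's classpath only fails when it is opened.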

Looks like there isn't a Java library for bmdiff (I'd love to be corrected on this)... thoughts
on how to go about this? A JNI wrapper on top of a C API? If so, how difficult does hadoop-dev
think it would be to implement an input/output stream on top of it? Alternatives?
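
If a JNI binding exposed only block-level compress/decompress calls, one way to layer streams on top is to buffer bytes into fixed-size blocks and write each compressed block with length prefixes. The sketch below assumes exactly that: `BlockCodec` is a hypothetical interface, the framing format is invented for illustration, and because no Java bmdiff implementation is assumed to exist, `DeflateBlockCodec` substitutes `java.util.zip` deflate purely so the example runs.

```java
import java.io.*;
import java.util.Arrays;
import java.util.zip.*;

// Hypothetical block-level API, shaped like what a JNI binding to a C library might expose.
interface BlockCodec {
    byte[] compress(byte[] src, int len) throws IOException;
    byte[] decompress(byte[] src, int rawLen) throws IOException;
}

// Stand-in codec built on java.util.zip; it is NOT bmdiff.
class DeflateBlockCodec implements BlockCodec {
    public byte[] compress(byte[] src, int len) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(bos)) { dos.write(src, 0, len); }
        return bos.toByteArray();
    }
    public byte[] decompress(byte[] src, int rawLen) throws IOException {
        try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(src))) {
            byte[] raw = in.readAllBytes();
            if (raw.length != rawLen) throw new IOException("corrupt block");
            return raw;
        }
    }
}

// Buffers bytes into fixed-size blocks; each block is written as
// [raw length][compressed length][compressed bytes]. Byte-at-a-time for brevity.
class BlockCompressorStream extends OutputStream {
    private final DataOutputStream out;
    private final BlockCodec codec;
    private final byte[] block;
    private int used = 0;
    BlockCompressorStream(OutputStream out, BlockCodec codec, int blockSize) {
        this.out = new DataOutputStream(out);
        this.codec = codec;
        this.block = new byte[blockSize];
    }
    @Override public void write(int b) throws IOException {
        block[used++] = (byte) b;
        if (used == block.length) writeBlock();
    }
    private void writeBlock() throws IOException {
        if (used == 0) return; // never emit empty blocks
        byte[] c = codec.compress(block, used);
        out.writeInt(used);
        out.writeInt(c.length);
        out.write(c);
        used = 0;
    }
    @Override public void flush() throws IOException { writeBlock(); out.flush(); }
    @Override public void close() throws IOException { writeBlock(); out.close(); }
}

// Reads that framing back, decompressing one block at a time.
class BlockDecompressorStream extends InputStream {
    private final DataInputStream in;
    private final BlockCodec codec;
    private byte[] block = new byte[0];
    private int pos = 0;
    BlockDecompressorStream(InputStream in, BlockCodec codec) {
        this.in = new DataInputStream(in);
        this.codec = codec;
    }
    @Override public int read() throws IOException {
        while (pos == block.length) { // current block exhausted: load the next one
            int rawLen;
            try { rawLen = in.readInt(); } catch (EOFException e) { return -1; }
            byte[] c = new byte[in.readInt()];
            in.readFully(c);
            block = codec.decompress(c, rawLen);
            pos = 0;
        }
        return block[pos++] & 0xff;
    }
    @Override public void close() throws IOException { in.close(); }
}

public class BlockStreamDemo {
    static byte[] roundTrip(byte[] data, int blockSize) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new BlockCompressorStream(sink, new DeflateBlockCodec(), blockSize)) {
            out.write(data);
        }
        try (InputStream in = new BlockDecompressorStream(
                new ByteArrayInputStream(sink.toByteArray()), new DeflateBlockCodec())) {
            return in.readAllBytes();
        }
    }
    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) (i % 37);
        System.out.println(Arrays.equals(data, roundTrip(data, 1024)));
    }
}
```

Under this design, plugging in a native codec would only mean another `BlockCodec` implementation; the stream classes stay unchanged.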

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

