hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-441) SequenceFile should support 'custom compressors'
Date Thu, 17 Aug 2006 17:40:15 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-441?page=comments#action_12428716 ] 
Doug Cutting commented on HADOOP-441:

The constructors should probably take class instances rather than class names.  Codecs should
be based on DeflaterOutputStream and InflaterInputStream, but it would be best to write just
one name to the file.  So we might add a compressor factory interface like:

public interface CompressionCodec extends Configurable {
  DeflaterOutputStream createDeflaterOutputStream(OutputStream out);
  InflaterInputStream createInflaterInputStream(InputStream in);
}

Then the constructors would take an instance of this interface and write the name of that
class into the file.  Implementations would be required to provide a public default constructor.
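
A minimal sketch of how this could work, assuming a simplified CompressionCodec without Configurable so that it compiles on its own. ZlibCodec, write() and read() are hypothetical names used only for illustration; they are not part of SequenceFile. The writer stores the codec's class name ahead of the compressed data, and the reader instantiates that class through its public default constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class CodecHeaderSketch {

  /** Simplified stand-in for the proposed interface (Configurable omitted). */
  public interface CompressionCodec {
    DeflaterOutputStream createDeflaterOutputStream(OutputStream out);
    InflaterInputStream createInflaterInputStream(InputStream in);
  }

  /** Hypothetical codec using java.util.zip's default deflate settings. */
  public static class ZlibCodec implements CompressionCodec {
    public ZlibCodec() {}  // public default constructor, needed by the reader

    public DeflaterOutputStream createDeflaterOutputStream(OutputStream out) {
      return new DeflaterOutputStream(out);
    }

    public InflaterInputStream createInflaterInputStream(InputStream in) {
      return new InflaterInputStream(in);
    }
  }

  /** Writes one name (the codec class) as the header, then the compressed payload. */
  static byte[] write(CompressionCodec codec, String payload) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream header = new DataOutputStream(bytes);
    header.writeUTF(codec.getClass().getName());
    header.flush();

    DeflaterOutputStream body = codec.createDeflaterOutputStream(bytes);
    body.write(payload.getBytes(StandardCharsets.UTF_8));
    body.close();  // finishes the deflate stream
    return bytes.toByteArray();
  }

  /** Reads the class name, instantiates the codec reflectively, then decompresses. */
  static String read(byte[] file) throws Exception {
    ByteArrayInputStream bytes = new ByteArrayInputStream(file);
    String className = new DataInputStream(bytes).readUTF();

    CompressionCodec codec = Class.forName(className)
        .asSubclass(CompressionCodec.class)
        .getDeclaredConstructor()
        .newInstance();

    InputStream body = codec.createInflaterInputStream(bytes);
    ByteArrayOutputStream plain = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    for (int n; (n = body.read(buf)) != -1; ) {
      plain.write(buf, 0, n);
    }
    return new String(plain.toByteArray(), StandardCharsets.UTF_8);
  }
}
```

Because the reader only ever sees a class name, a codec without a public no-argument constructor could be written to a file but never read back, which is why the requirement above matters.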

We might also add methods like the following to this interface:

  void writeVersion(DataOutputStream out);
  void readVersion(DataInputStream in) throws VersionMismatchException;

That would permit folks to safely revise a codec without having to use a new class name.
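
A rough sketch of the versioning idea, under stated assumptions: the version is stored as a single byte, VersionedCodec is a hypothetical class carrying only these two methods, and VersionMismatchException is a local stand-in defined here so the example is self-contained. writeVersion also declares IOException in this sketch, since DataOutputStream writes can throw it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class CodecVersionSketch {

  /** Local stand-in for the exception named above; defined here for illustration. */
  public static class VersionMismatchException extends IOException {
    public VersionMismatchException(int expected, int found) {
      super("codec version mismatch: expected " + expected + ", found " + found);
    }
  }

  /** Hypothetical codec fragment carrying only the versioning methods. */
  public static class VersionedCodec {
    private final int version;

    public VersionedCodec(int version) {
      this.version = version;
    }

    /** Records this codec's format version in the file header. */
    public void writeVersion(DataOutputStream out) throws IOException {
      out.writeByte(version);
    }

    /** Rejects files written by a different revision of the codec. */
    public void readVersion(DataInputStream in) throws IOException {
      int found = in.readUnsignedByte();
      if (found != version) {
        throw new VersionMismatchException(version, found);
      }
    }
  }

  /** Produces the version portion of a header for the given codec. */
  static byte[] headerFor(VersionedCodec codec) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    codec.writeVersion(new DataOutputStream(bytes));
    return bytes.toByteArray();
  }

  /** Checks a stored header against the codec that is about to read the file. */
  static void checkHeader(VersionedCodec codec, byte[] header) throws IOException {
    codec.readVersion(new DataInputStream(new ByteArrayInputStream(header)));
  }
}
```

With this shape, a revised codec keeps its class name and bumps its version, and a reader fails with an explicit exception instead of silently misreading data in the old format.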

> SequenceFile should support 'custom compressors'
> ------------------------------------------------
>                 Key: HADOOP-441
>                 URL: http://issues.apache.org/jira/browse/HADOOP-441
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: io
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.6.0
> SequenceFiles should support 'custom compressors' which can be specified by the user
> on creation of the file.
> Readily available packages for gzip and zip (java.util.zip) are among obvious choices
> to support. Of course there will be hooks so that other compressors can be added in future
> as long as there is a way to construct (input/output) streams on top of the compressor/decompressor.
> The 'classname' of the 'custom compressor/decompressor' could be stored in the header
> of the SequenceFile which can then be used by SequenceFile.Reader to figure out the appropriate
> 'decompressor'. Thus I propose we add constructors to SequenceFile.Writer which take in the
> 'classname' of the compressor's input/output stream classes (e.g. DeflaterOutputStream/InflaterInputStream
> or GZIPOutputStream/GZIPInputStream), which acts as the hook for future compressors/decompressors.

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
