hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
Date Mon, 25 Sep 2006 03:54:51 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12437447 ] 
            
Arun C Murthy commented on HADOOP-538:
--------------------------------------

While working on this I've realised that the 'custom compressor' framework we built as
part of HADOOP-441 is neither as flexible nor as complete as it needs to be.

Specifically, the existing framework only lets us plug in custom compress/decompress 'streams'
(e.g. a bzip2 input/output stream), while in many cases it is sufficient to use an existing
'stream' and just plug in a custom deflater/inflater (e.g. a native-zlib or lzo inflater/deflater
pair). java.util.zip's {De|In}flater classes were not designed with this kind of
functionality in mind, which makes them unsuitable.

Hence I would like to propose that we add org.apache.hadoop.io.compress.{Com|Decom}pressor
interfaces which custom {de}compressors can implement and plug into an existing {de}compression
stream. Further, I would like to propose that these {Com|Decom}pressor interfaces
mirror the public methods of java.util.zip.{De|In}flater, i.e.

public interface Compressor {
  public void setInput(byte[] b, int off, int len);
  public boolean needsInput();
  public void setDictionary(byte[] b, int off, int len);
  public void finish();
  public boolean finished();
  public int deflate(ByteBuffer directBuffer, int directBufferLength); // for native calls with nio's direct buffer
  public int deflate(byte[] b, int off, int len); // for native methods without nio
  public void reset();
  public void end();
}

public interface Decompressor {
  public void setInput(byte[] b, int off, int len);
  public boolean needsInput();
  public void setDictionary(byte[] b, int off, int len);
  public boolean needsDictionary();
  public void finish();
  public boolean finished();
  public int inflate(ByteBuffer directBuffer, int directBufferLength); // for native calls with nio's direct buffer
  public int inflate(byte[] b, int off, int len); // for native methods without nio
  public void reset();
  public void end();
}
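
To illustrate how closely the proposed Compressor tracks java.util.zip.Deflater, here is a
minimal sketch of an implementation that simply delegates to the built-in Deflater. The class
name BuiltInZlibCompressor is hypothetical, and the ByteBuffer overload is emulated through a
heap array (an assumption for illustration only; a real native-zlib implementation would write
into the direct buffer from native code).

```java
import java.nio.ByteBuffer;
import java.util.zip.Deflater;

// The Compressor interface as proposed above, restated so the sketch compiles on its own.
interface Compressor {
  void setInput(byte[] b, int off, int len);
  boolean needsInput();
  void setDictionary(byte[] b, int off, int len);
  void finish();
  boolean finished();
  int deflate(ByteBuffer directBuffer, int directBufferLength);
  int deflate(byte[] b, int off, int len);
  void reset();
  void end();
}

// Hypothetical adapter: every call is forwarded to java.util.zip.Deflater.
class BuiltInZlibCompressor implements Compressor {
  private final Deflater deflater = new Deflater();

  public void setInput(byte[] b, int off, int len) { deflater.setInput(b, off, len); }
  public boolean needsInput() { return deflater.needsInput(); }
  public void setDictionary(byte[] b, int off, int len) { deflater.setDictionary(b, off, len); }
  public void finish() { deflater.finish(); }
  public boolean finished() { return deflater.finished(); }

  public int deflate(ByteBuffer directBuffer, int directBufferLength) {
    // No native code in this sketch: compress into a heap array, then copy into the buffer.
    int max = Math.min(directBufferLength, directBuffer.remaining());
    byte[] tmp = new byte[max];
    int n = deflater.deflate(tmp, 0, max);
    directBuffer.put(tmp, 0, n);
    return n;
  }

  public int deflate(byte[] b, int off, int len) { return deflater.deflate(b, off, len); }
  public void reset() { deflater.reset(); }
  public void end() { deflater.end(); }
}
```

A native-zlib or lzo compressor would implement the same nine methods, so a stream written
against Compressor would not need to know which one it has been handed.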

Along the same lines, we will need to supply a pair of input/output streams which take
objects implementing the above interfaces to do the actual compression/decompression. Again,
java.util.zip.{De|In}flater{Out|In}put streams won't suffice since they weren't designed with
this in mind.

I would have liked to propose org.apache.hadoop.io.compress.Compression{In|Out}putStreams, but
those names are already taken; how about org.apache.hadoop.io.compress.DataCompression{In|Out}putStreams?
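
As a rough idea of what the output side might look like, here is a minimal sketch of a stream
that drives any Compressor. It uses the tentative name DataCompressionOutputStream from above;
the 4096-byte buffer, the finish() method on the stream and the overall loop structure are my
assumptions (modelled on how java.util.zip.DeflaterOutputStream drives a Deflater), not a
settled design.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

// The Compressor interface as proposed above, restated so the sketch compiles on its own.
interface Compressor {
  void setInput(byte[] b, int off, int len);
  boolean needsInput();
  void setDictionary(byte[] b, int off, int len);
  void finish();
  boolean finished();
  int deflate(ByteBuffer directBuffer, int directBufferLength);
  int deflate(byte[] b, int off, int len);
  void reset();
  void end();
}

// Sketch only: feeds written bytes to the supplied Compressor and forwards its output.
class DataCompressionOutputStream extends OutputStream {
  private final OutputStream out;
  private final Compressor compressor;
  private final byte[] buffer = new byte[4096]; // assumed size, not part of the proposal
  private boolean closed = false;

  DataCompressionOutputStream(OutputStream out, Compressor compressor) {
    this.out = out;
    this.compressor = compressor;
  }

  @Override
  public void write(int b) throws IOException {
    write(new byte[] {(byte) b}, 0, 1);
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    if (len == 0) return;
    compressor.setInput(b, off, len);
    // Keep draining until the compressor has consumed everything it was given.
    while (!compressor.needsInput()) {
      drain();
    }
  }

  // Signals end of input and writes out whatever the compressor still holds.
  public void finish() throws IOException {
    compressor.finish();
    while (!compressor.finished()) {
      drain();
    }
  }

  @Override
  public void close() throws IOException {
    if (closed) return;
    closed = true;
    finish();
    compressor.end();
    out.close();
  }

  private void drain() throws IOException {
    int n = compressor.deflate(buffer, 0, buffer.length);
    if (n > 0) out.write(buffer, 0, n);
  }
}
```

The stream only ever touches the interface, so swapping the built-in zlib for a native-zlib or
lzo compressor would be a matter of passing a different object to the constructor.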


With the existing Compression{Out|In}putStreams and the above {Com|Decom}pressor/DataCompression{In|Out}putStreams
we should have a sufficiently complete set of abstractions to support 'custom codecs'...

Thoughts?

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.7.0
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform
> unreliably; a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e.
> direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
