hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-538) Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater as a 'custom codec'
Date Fri, 27 Oct 2006 21:24:18 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-538?page=comments#action_12445272 ] 
            
Arun C Murthy commented on HADOOP-538:
--------------------------------------

Thanks to Owen/Sameer for the detailed feedback; I'll put up an updated patch ASAP... though
I admit I hadn't thought about a 32-bit JVM on a 64-bit OS! :)

Meanwhile, one of the nice side-effects of this patch will be to enable the GzipCodec to work
with SequenceFiles. 

Context: gzip is just the zlib algorithm plus extra headers. The existing GzipCodec is built on
java.util.zip.GZIP{Input|Output}Stream, and those streams try to read/write the gzip headers
in their constructors. That won't work with SequenceFile: we typically read data from disk
into buffers, and these buffers are empty on startup and after a reset, which causes the
java.util.zip.GZIP{Input|Output}Streams to fail.
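
To make the constructor behaviour concrete, here is a minimal sketch using only java.util.zip from the JDK (the class and method names are made up for illustration, and none of this is code from the patch). It checks that GZIPOutputStream has already written header bytes to its sink right after construction, that GZIPInputStream fails when constructed over an empty input, and that the body between the fixed 10-byte header and the 8-byte trailer is a raw deflate stream. The 10-byte offset assumes a header with no optional fields, which is what the JDK writer produces as far as I know.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;

// Hypothetical demo class, not part of Hadoop or of the HADOOP-538 patch.
public class GzipHeaderDemo {

    // Number of bytes the sink holds right after the constructor returns,
    // i.e. before any payload has been written.
    public static int headerBytesWrittenByConstructor() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(sink);
        int written = sink.size();
        gz.close();
        return written;
    }

    // True if constructing a GZIPInputStream over an empty buffer fails,
    // because the constructor tries to read the gzip header immediately.
    public static boolean constructorFailsOnEmptyInput() throws IOException {
        try {
            new GZIPInputStream(new ByteArrayInputStream(new byte[0])).close();
            return false;
        } catch (EOFException expected) {
            return true;
        }
    }

    // Compress data into a complete gzip member.
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
            gz.write(data);
        }
        return sink.toByteArray();
    }

    // Skip the 10-byte header (assumed to have no optional fields) and
    // inflate the rest as a raw deflate stream. The 8 trailer bytes are
    // passed along too; the inflater stops at the end of the deflate data.
    public static byte[] inflateGzipBody(byte[] gz) throws DataFormatException {
        Inflater inflater = new Inflater(true); // true = raw deflate, no zlib wrapper
        inflater.setInput(gz, 10, gz.length - 10);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                break; // truncated or unexpected input; stop instead of spinning
            }
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("header bytes written by constructor: "
                + headerBytesWrittenByConstructor());
        System.out.println("constructor fails on empty input: "
                + constructorFailsOnEmptyInput());
        byte[] data = "gzip is deflate plus headers".getBytes(StandardCharsets.UTF_8);
        System.out.println("body round-trips as raw deflate: "
                + new String(inflateGzipBody(gzip(data)), StandardCharsets.UTF_8));
    }
}
```

The second check is the SequenceFile problem in miniature: a stream wrapped around a buffer that has not been filled yet cannot even be constructed.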

The upshot of this patch is that newer zlib (zlib-1.2.*) can deal with the gzip headers directly
(java.util.zip is zlib-1.1.*), which means we can use gzip in SequenceFile. The downside is that
people will need the native hadoop code to get this benefit. If people feel strongly that we
need this functionality without native hadoop code (IMHO it's not critical, since gzip is
zlib + headers, i.e. exactly the same compression etc.), then I guess we can track it via a
separate jira issue... Would people object to me enabling GzipCodec to work with SequenceFile
only when the native code is present, for now? If the native code isn't present I can print
out a warning very early and exit...
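
A rough sketch of the "warn very early and exit" idea, purely as an illustration: the class name, the method name, and the default library name "hadoop" are all assumptions here, not what the patch actually does. It relies only on System.loadLibrary, which throws UnsatisfiedLinkError when the library cannot be found.

```java
// Hypothetical early check for the native library; names are illustrative only.
public class NativeZlibCheck {

    // Returns true if the named native library could be loaded into this JVM.
    public static boolean tryLoadNative(String libraryName) {
        try {
            System.loadLibrary(libraryName);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "hadoop" is an assumed library name, used here only as a default.
        String lib = args.length > 0 ? args[0] : "hadoop";
        if (!tryLoadNative(lib)) {
            System.err.println("WARN: native library '" + lib
                    + "' not found; GzipCodec cannot be used with SequenceFile");
            System.exit(1);
        }
        System.out.println("native library '" + lib + "' loaded");
    }
}
```

Doing the check once at startup means a job fails before any data is written, instead of partway through a SequenceFile.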

Thoughts?

> Implement a nio's 'direct buffer' based wrapper over zlib to improve performance of java.util.zip.{De|In}flater
as a 'custom codec'
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-538
>                 URL: http://issues.apache.org/jira/browse/HADOOP-538
>             Project: Hadoop
>          Issue Type: Improvement
>    Affects Versions: 0.6.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.8.0
>
>         Attachments: HADOOP-538.patch, HADOOP-538_20061005.tgz, HADOOP-538_20061011.tgz,
HADOOP-538_20061026.tgz, HADOOP-538_benchmarks.tgz
>
>
> There has been more than one instance where java.util.zip's {De|In}flater classes perform
unreliably; a simple wrapper over zlib-1.2.3 (latest stable) using java.nio.ByteBuffer (i.e.
direct buffers) should go a long way in alleviating these woes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira