hadoop-common-dev mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-5879) GzipCodec should read compression level etc from configuration
Date Wed, 17 Jun 2009 01:50:07 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Chris Douglas updated HADOOP-5879:

    Status: Open  (was: Patch Available)

So one possible way is to have CodecPool special-case the Gzip codec and either
1) keep a map holding gzip codecs with different settings, or
2) treat the setting as a global setting and, when the setting changes, clear all gzip
codecs cached in CodecPool.

Do the changes for CodecPool sound reasonable/acceptable?

I'm not sure the "clean" semantics have clear triggers (or they're not clear to me). I'd suggest
an analog to {{end}} in the {{(Dec|C)ompressor}} interface that reinitializes a (de)compressor,
then use those interfaces in the {{CodecPool}}. This would be a better fix for HADOOP-5281,
but it requires updates to other implementors of {{Compressor}}. Something like {{reinit}}
that destroys (with {{end}}) and recreates (with {{init}}) the underlying stream. Overloading
{{CodecPool::getCompressor}} to take a {{Configuration}} and... well, following the implications
through the rest of the Codec classes makes it easy to find where compressors are recycled.
Calling {{reinit}} with parameters matching the current ones should be a no-op, and calling
{{CodecPool::getCompressor}} without any arguments should use the default parameters.
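
As a rough illustration of the {{reinit}} idea, here is a minimal sketch in plain Java. It is not the Hadoop code: {{ReinitableCompressor}}, {{DeflaterCompressor}} and {{SimpleCodecPool}} are hypothetical names, {{java.util.zip.Deflater}} stands in for the native zlib stream, and two ints (level, strategy) stand in for whatever would be read from a {{Configuration}}.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.zip.Deflater;

// Hypothetical, simplified stand-in for the Compressor interface;
// only the lifecycle methods discussed above are modeled.
interface ReinitableCompressor {
    void reinit(int level, int strategy); // end() then init() with new params
    void end();
    int level();
    int strategy();
}

final class DeflaterCompressor implements ReinitableCompressor {
    private Deflater deflater;
    private int level;
    private int strategy;
    private int initCount; // how many times the underlying stream was created

    DeflaterCompressor(int level, int strategy) {
        init(level, strategy);
    }

    // Private, so the constructor cannot dispatch to a subclass override.
    private void init(int level, int strategy) {
        this.level = level;
        this.strategy = strategy;
        this.deflater = new Deflater(level);
        this.deflater.setStrategy(strategy);
        initCount++;
    }

    @Override
    public void reinit(int level, int strategy) {
        if (level == this.level && strategy == this.strategy) {
            return; // parameters match the current ones: no-op
        }
        end();                 // destroy the underlying stream
        init(level, strategy); // recreate it with the new parameters
    }

    @Override
    public void end() { deflater.end(); }

    @Override
    public int level() { return level; }

    @Override
    public int strategy() { return strategy; }

    int initCount() { return initCount; }
}

// Hypothetical pool: the no-argument getCompressor uses default parameters,
// the overload reinitializes a recycled compressor to the requested ones.
final class SimpleCodecPool {
    private final Deque<ReinitableCompressor> idle = new ArrayDeque<>();

    ReinitableCompressor getCompressor() {
        return getCompressor(Deflater.DEFAULT_COMPRESSION, Deflater.DEFAULT_STRATEGY);
    }

    ReinitableCompressor getCompressor(int level, int strategy) {
        ReinitableCompressor c = idle.poll();
        if (c == null) {
            return new DeflaterCompressor(level, strategy);
        }
        c.reinit(level, strategy);
        return c;
    }

    void returnCompressor(ReinitableCompressor c) {
        idle.push(c);
    }
}

final class ReinitDemo {
    public static void main(String[] args) {
        SimpleCodecPool pool = new SimpleCodecPool();
        ReinitableCompressor c = pool.getCompressor();
        pool.returnCompressor(c);
        ReinitableCompressor fast =
            pool.getCompressor(Deflater.BEST_SPEED, Deflater.DEFAULT_STRATEGY);
        System.out.println("same instance reused: " + (c == fast));
        System.out.println("level after reinit: " + fast.level());
        fast.end();
    }
}
```

With this shape the pool needs no per-setting map and no global "clean" trigger: a recycled compressor is rebuilt only when the requested parameters differ from the ones it already has.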

Since this is a fair amount of work, if you wanted to narrow the issue to be global settings
for GzipCodec, then an approach like that in the current patch is probably sufficient for
many applications.

Quick asides on the current patch: {{ZlibCompressor::construct}} should be final; if it were
overridden in a subclass, the base constructor would invoke the subclass override on a partially
constructed object. Also, since the parameters are specific to GzipCodec, they should not have
generic names like "io.compress.level".
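
To show why a non-final {{construct}} called from a constructor is risky, here is a small self-contained sketch. The class names ({{BaseCompressor}}, {{TunedCompressor}}) are hypothetical and are not taken from the patch; only the constructor-calls-overridable-method pattern is the point.

```java
class BaseCompressor {
    // Base-class field initializers run before the base constructor body.
    int levelSeenByConstruct = -1;

    BaseCompressor() {
        construct(); // virtual call: runs the subclass override if one exists
    }

    // Not final, so a subclass may override it (the hazard being illustrated).
    void construct() {
    }
}

class TunedCompressor extends BaseCompressor {
    // Assigned only after BaseCompressor() has returned.
    private int level = 9;

    @Override
    void construct() {
        // Runs during BaseCompressor(), so level still holds its default of 0.
        levelSeenByConstruct = level;
    }

    int level() {
        return level;
    }
}

final class ConstructDemo {
    public static void main(String[] args) {
        TunedCompressor t = new TunedCompressor();
        System.out.println("level seen by construct(): " + t.levelSeenByConstruct);
        System.out.println("level after construction: " + t.level());
    }
}
```

Here the override observes 0 rather than 9, because the subclass field initializer has not run yet. Declaring {{construct}} final (or private) rules this out.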

> GzipCodec should read compression level etc from configuration
> --------------------------------------------------------------
>                 Key: HADOOP-5879
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5879
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: io
>            Reporter: Zheng Shao
>         Attachments: hadoop-5879-5-21.patch
> GzipCodec currently uses the default compression level. We should allow overriding the default value from Configuration.
> {code}
>   static final class GzipZlibCompressor extends ZlibCompressor {
>     public GzipZlibCompressor() {
>       super(ZlibCompressor.CompressionLevel.DEFAULT_COMPRESSION,
>           ZlibCompressor.CompressionStrategy.DEFAULT_STRATEGY,
>           ZlibCompressor.CompressionHeader.GZIP_FORMAT, 64*1024);
>     }
>   }
> {code}

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
