hadoop-common-dev mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4162) CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.
Date Fri, 12 Sep 2008 02:53:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630469#action_12630469 ]

Chris Douglas commented on HADOOP-4162:

bq. The way we use Hadoop compression in TFile is to treat each compression block as a separate
compression stream (each block's writes conclude with compressor.finish()). It makes no assumptions
about the internals of the compression algorithm. The tests show that both LZOP and LZO work fine.
LZOP works because the streams are generated by LzopCodec, which disables all the block checksums
(assuming its target will be HDFS, which keeps its own checksums). In that case, the LzopDecompressor
is a passthrough to LzoDecompressor. If someone were to pick up an LzopDecompressor and use
it on a stream with block checksums, it would fail if that decompressor were then reused to open
a TFile. Until LzopDecompressors can be reused without errors (i.e. initHeaderFlags clears
the checksum flags before setting them for the next stream), I'm -1 on making them reusable
through CodecPool.
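The reuse hazard above can be illustrated with a self-contained sketch. Note that ChecksumFlag, SketchLzopDecompressor, and both initHeaderFlags variants are hypothetical stand-ins for illustration, not Hadoop's actual classes:

```java
import java.util.EnumSet;

// Hypothetical stand-ins for the stream-header state an LzopDecompressor
// carries; real lzop headers declare which block checksums are present.
enum ChecksumFlag { ADLER32_BLOCK, CRC32_BLOCK }

class SketchLzopDecompressor {
    private final EnumSet<ChecksumFlag> flags = EnumSet.noneOf(ChecksumFlag.class);

    // Buggy pattern: flags accumulated from a previous stream survive reuse.
    void initHeaderFlagsBuggy(EnumSet<ChecksumFlag> streamFlags) {
        flags.addAll(streamFlags);
    }

    // Safe pattern: clear stale state before applying the next stream's flags.
    void initHeaderFlagsFixed(EnumSet<ChecksumFlag> streamFlags) {
        flags.clear();
        flags.addAll(streamFlags);
    }

    boolean expectsBlockChecksums() { return !flags.isEmpty(); }
}

public class Main {
    public static void main(String[] args) {
        // First used on a checksummed stream, then reused on a checksum-free
        // TFile stream: the stale flag makes it expect checksums that aren't there.
        SketchLzopDecompressor d = new SketchLzopDecompressor();
        d.initHeaderFlagsBuggy(EnumSet.of(ChecksumFlag.ADLER32_BLOCK));
        d.initHeaderFlagsBuggy(EnumSet.noneOf(ChecksumFlag.class));
        System.out.println(d.expectsBlockChecksums()); // true

        // Same sequence with per-stream reset: reuse is safe.
        SketchLzopDecompressor d2 = new SketchLzopDecompressor();
        d2.initHeaderFlagsFixed(EnumSet.of(ChecksumFlag.ADLER32_BLOCK));
        d2.initHeaderFlagsFixed(EnumSet.noneOf(ChecksumFlag.class));
        System.out.println(d2.expectsBlockChecksums()); // false
    }
}
```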

bq. it seems that the purpose of LzopDecompressor is to read lzop-compressed data. So I changed
TFile to use LZO instead of LZOP internally.
That sounds exactly right. Unless one needs to interoperate with the lzop C tool, LzoCodec should
always be preferred.

> CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor.
> -----------------------------------------------------------------------------
>                 Key: HADOOP-4162
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4162
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.18.0
>            Reporter: Hong Tang
>            Assignee: Arun C Murthy
>             Fix For: 0.19.0
>         Attachments: HADOOP-4162_0_20080911.patch
> CodecPool.getDecompressor(LzopCodec) always creates a brand-new decompressor. I investigated
> the code; the reason seems to be the following:
> LzopCodec inherits from LzoCodec. The getDecompressorType() method is supposed to return
> the concrete Decompressor class type that the specific Codec class creates. In this case, LzopCodec
> creates LzopDecompressors and should return LzopDecompressor.class. But instead, it uses the
> getDecompressorType() method defined in the parent and returns LzoDecompressor.class.
> This prevents CodecPool from properly recycling the decompressors.
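The lookup mismatch described in the issue can be sketched in plain Java. The class names mirror the issue, but MiniCodecPool and the codec classes below are simplified stand-ins, not Hadoop's actual implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Stand-ins for the decompressor hierarchy named in the issue.
class LzoDecompressor {}
class LzopDecompressor extends LzoDecompressor {}

class LzoCodec {
    // Parent advertises its own decompressor type.
    Class<? extends LzoDecompressor> getDecompressorType() {
        return LzoDecompressor.class;
    }
    LzoDecompressor createDecompressor() { return new LzoDecompressor(); }
}

// Buggy: creates LzopDecompressors but inherits getDecompressorType().
class BuggyLzopCodec extends LzoCodec {
    @Override
    LzoDecompressor createDecompressor() { return new LzopDecompressor(); }
}

// Fixed: overrides getDecompressorType() to match what it actually creates.
class FixedLzopCodec extends BuggyLzopCodec {
    @Override
    Class<? extends LzoDecompressor> getDecompressorType() {
        return LzopDecompressor.class;
    }
}

// Minimal pool keyed by class, analogous to how CodecPool recycles instances.
class MiniCodecPool {
    private final Map<Class<?>, Deque<LzoDecompressor>> pool = new HashMap<>();

    LzoDecompressor getDecompressor(LzoCodec codec) {
        // Looks up recycled instances under the codec's advertised type.
        Deque<LzoDecompressor> q = pool.get(codec.getDecompressorType());
        if (q != null && !q.isEmpty()) return q.pop(); // recycled
        return codec.createDecompressor();             // brand new
    }

    void returnDecompressor(LzoDecompressor d) {
        // Stores returned instances under their concrete class.
        pool.computeIfAbsent(d.getClass(), k -> new ArrayDeque<>()).push(d);
    }
}

public class Main {
    public static void main(String[] args) {
        MiniCodecPool pool = new MiniCodecPool();

        // Buggy codec: stored under LzopDecompressor.class, looked up under
        // LzoDecompressor.class, so the pooled instance is never found.
        LzoCodec buggy = new BuggyLzopCodec();
        LzoDecompressor d1 = pool.getDecompressor(buggy);
        pool.returnDecompressor(d1);
        System.out.println(pool.getDecompressor(buggy) == d1); // false

        // Fixed codec: lookup key matches the storage key, so it recycles.
        LzoCodec fixed = new FixedLzopCodec();
        LzoDecompressor d2 = pool.getDecompressor(fixed);
        pool.returnDecompressor(d2);
        System.out.println(pool.getDecompressor(fixed) == d2); // true
    }
}
```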

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
