hadoop-common-issues mailing list archives

From "Jerry Chen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-9996) Improve TFile format to support any compression codecs
Date Thu, 26 Sep 2013 06:29:04 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-9996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry Chen updated HADOOP-9996:

    Attachment: HADOOP-9996.patch

Attach patch for reference.
> Improve TFile format to support any compression codecs
> ------------------------------------------------------
>                 Key: HADOOP-9996
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9996
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 3.0.0
>            Reporter: Jerry Chen
>              Labels: Rhino
>         Attachments: HADOOP-9996.patch
>   Original Estimate: 72h
>  Remaining Estimate: 72h
> TFile is a container of key-value pairs. It supports block-level compression using a
compression codec. One limitation of the current implementation, however, is that it supports only
a few fixed compression codecs: LZO, GZ, or no compression. Newer compression
codecs such as Snappy cannot be used because of this limitation.
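To illustrate the limitation described above, here is a minimal Java sketch of a lookup that only recognizes a fixed set of stored names. The enum `FixedCompression` and its methods `metaName()` and `byName()` are hypothetical and are not TFile's actual API; only the stored names "lzo", "gz", and "none" come from the issue text.

```java
/**
 * Hypothetical sketch of a fixed-name codec lookup. The enum and method
 * names are illustrative only, not TFile's real classes.
 */
enum FixedCompression {
    LZO("lzo"),
    GZ("gz"),
    NONE("none");

    // Name written into the file metadata for this algorithm.
    private final String metaName;

    FixedCompression(String metaName) {
        this.metaName = metaName;
    }

    String metaName() {
        return metaName;
    }

    /** Maps a name read from file metadata back to one of the fixed algorithms. */
    static FixedCompression byName(String stored) {
        for (FixedCompression c : values()) {
            if (c.metaName.equals(stored)) {
                return c;
            }
        }
        // Any codec outside the fixed set (e.g. Snappy) cannot be expressed.
        throw new IllegalArgumentException("Unsupported compression algorithm name: " + stored);
    }
}
```

Any name outside the enum, such as "snappy", is rejected, so supporting a new codec would require changing the set of names the format understands.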
> We propose to extend the existing TFile compression feature to support any compression
codec. Because TFile already uses named compression codecs and stores the name in the file
metadata (for example, “lzo” is stored when LZO compression is used), this naming cannot change
without breaking backward compatibility. To support any compression codec, we add a special
name, “codec”, followed by the real codec class name. For example, “codec: org.apache.hadoop.io.compress.SnappyCodec”
is stored in the metadata when SnappyCodec is used as the compression codec. The existing fixed
names “lzo”, “gz”, and “none” can still be used to specify the
TFile compression codec.
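The proposed naming scheme can be sketched as follows. This is a minimal, hypothetical Java illustration, not the code in HADOOP-9996.patch: the class `CodecNameScheme` and its methods `encode()` and `codecClassOf()` are assumed names, and the decoder assumes the prefix is `codec:` with optional whitespace before the class name, matching the “codec: org.apache.hadoop.io.compress.SnappyCodec” example.

```java
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical sketch of the proposed scheme: a stored name is either one of
 * the legacy fixed names or "codec:" followed by a codec class name.
 * Names here are illustrative, not the patch's actual code.
 */
final class CodecNameScheme {
    /** Legacy fixed names, kept for backward compatibility. */
    static final Set<String> FIXED_NAMES = Set.of("lzo", "gz", "none");

    /** Special prefix that introduces a fully qualified codec class name (assumed form). */
    static final String CODEC_PREFIX = "codec:";

    private CodecNameScheme() {}

    /** Builds the metadata string to store for an arbitrary codec class. */
    static String encode(String codecClassName) {
        return CODEC_PREFIX + " " + codecClassName;
    }

    /**
     * Returns the codec class name encoded in a stored name, or an empty
     * Optional when the name is one of the legacy fixed names.
     */
    static Optional<String> codecClassOf(String stored) {
        if (FIXED_NAMES.contains(stored)) {
            return Optional.empty();
        }
        if (stored.startsWith(CODEC_PREFIX)) {
            // Tolerate both "codec:X" and "codec: X".
            String className = stored.substring(CODEC_PREFIX.length()).trim();
            if (!className.isEmpty()) {
                return Optional.of(className);
            }
        }
        throw new IllegalArgumentException("Unrecognized compression name: " + stored);
    }
}
```

Checking the legacy names first means files written with “lzo”, “gz”, or “none” resolve exactly as before. In Hadoop, a returned class name could then be loaded with `Class.forName` and instantiated, for example via `org.apache.hadoop.util.ReflectionUtils.newInstance`; that step is left out here so the sketch runs without Hadoop on the classpath.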

This message is automatically generated by JIRA.
