hadoop-common-issues mailing list archives

From "Scott Carey (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6837) Support for LZMA compression
Date Thu, 24 Jun 2010 16:36:54 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882214#action_12882214 ]

Scott Carey commented on HADOOP-6837:

Isn't there a newer variant of LZMA (file extension .xz) that uses LZMA2 and is block based (and
therefore splittable)?  We should definitely make sure that is the variant we support.
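
A minimal sketch of the container difference, using Python's standard lzma module rather than
anything from the attached patch.  The helper compress_in_chunks and the 32 KiB chunk size are
hypothetical, chosen only for illustration.  Each chunk is written as its own complete .xz
stream, which only approximates the blocks inside a single .xz file, but it shows the property
that matters for splitting: every piece can be decoded without the others.

```python
import lzma

XZ_MAGIC = b"\xfd7zXZ\x00"  # six-byte magic at the start of every .xz stream


def compress_in_chunks(data: bytes, chunk_size: int) -> list:
    """Compress each chunk of `data` as an independent .xz stream (hypothetical helper)."""
    return [
        lzma.compress(data[i:i + chunk_size], format=lzma.FORMAT_XZ)
        for i in range(0, len(data), chunk_size)
    ]


payload = b"hadoop record\n" * 10_000
chunk_size = 32 * 1024  # illustrative value, not a recommendation
streams = compress_in_chunks(payload, chunk_size)

# Any single stream can be decoded on its own, without its neighbours.
first_chunk = lzma.decompress(streams[0])

# Concatenated .xz streams still decode as one file.
whole = lzma.decompress(b"".join(streams))

# The older .lzma ("alone") container uses LZMA1 and has no such magic header.
legacy = lzma.compress(payload, format=lzma.FORMAT_ALONE)
legacy_roundtrip = lzma.decompress(legacy)
```

Smaller chunks give more split points but a worse compression ratio, since each stream starts
with an empty dictionary.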

LZMA is slower than gzip, but compresses better than both bzip2 and gzip.  It is also optimized
for fast decompression -- it decompresses significantly faster than bzip2 (but not as fast
as gzip).

This link is useful for understanding the performance / compression ratio differences across
the various compression levels provided for each:


LZO, FastLZ, LZF, and the like are all faster than the above three but compress at a lower
ratio.  With LZMA support (hopefully .xz files, not the older 7zip format) there is little reason
to use bzip2 anymore -- LZMA level 2 compresses as fast as bzip2 level 1, but has a compression
ratio as high as bzip2 level 9.  LZMA consistently decompresses 2 to 7 times as fast as bzip2,
though at only about half the decompression speed of gzip.
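
A rough way to check claims like these on your own data, sketched with Python's standard gzip,
bz2, and lzma modules.  The measure and compare helpers, the chosen levels, and the synthetic
log-like sample are all assumptions for illustration.  Timings depend heavily on the input and
the machine, so treat the numbers as indicative only.

```python
import bz2
import gzip
import lzma
import time


def measure(name, compress, decompress, data):
    """Time one compress/decompress round trip and report the compression ratio."""
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    unpacked = decompress(packed)
    t2 = time.perf_counter()
    if unpacked != data:
        raise ValueError(f"{name}: round trip did not reproduce the input")
    return {
        "codec": name,
        "ratio": len(data) / len(packed),
        "compress_s": t1 - t0,
        "decompress_s": t2 - t1,
    }


def compare(data):
    """Run the same input through gzip, bzip2 (levels 1 and 9), and xz (preset 2)."""
    codecs = [
        ("gzip-6", lambda d: gzip.compress(d, compresslevel=6), gzip.decompress),
        ("bzip2-1", lambda d: bz2.compress(d, 1), bz2.decompress),
        ("bzip2-9", lambda d: bz2.compress(d, 9), bz2.decompress),
        ("xz-2", lambda d: lzma.compress(d, preset=2), lzma.decompress),
    ]
    return [measure(name, c, d, data) for name, c, d in codecs]


# Synthetic, log-like sample; replace with a representative file for real measurements.
sample = "".join(
    f"2010-06-24 16:36:{i % 60:02d} INFO task_{i} finished\n" for i in range(20_000)
).encode()

results = compare(sample)
for row in results:
    print(
        f"{row['codec']:8s} ratio={row['ratio']:.2f} "
        f"compress={row['compress_s']:.4f}s decompress={row['decompress_s']:.4f}s"
    )
```

For a meaningful comparison, run it on a sample of the data you actually intend to archive and
repeat each measurement several times.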

LZMA is the ideal archival storage format.

> Support for LZMA compression
> ----------------------------
>                 Key: HADOOP-6837
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6837
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io
>            Reporter: Nicholas Carlini
>            Assignee: Nicholas Carlini
>         Attachments: HADOOP-6837-lzma-java-20100623.patch
> Add support for LZMA (http://www.7-zip.org/sdk.html) compression, which generally achieves
> higher compression ratios than both gzip and bzip2.

