hadoop-common-issues mailing list archives

From "Tatu Saloranta (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6389) Add support for LZF compression
Date Fri, 04 Dec 2009 06:47:20 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785784#action_12785784 ]

Tatu Saloranta commented on HADOOP-6389:
----------------------------------------

Ok: I am now working with the Voldemort team to get a good LZF codec adaptation (we need
byte[]->byte[], no need for streams in this case; we also prefer using the standard LZF framing
so that the C version stays compatible), and the code is available at [http://github.com/ijuma/h2-lzf].
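
For reference, here is a minimal sketch of the byte[]->byte[] shape I mean, assuming
compress/expand entry points like those in H2's CompressLZF class, from which h2-lzf derives;
the exact signatures in the fork may differ, so treat the names below as illustrative:

{code:java}
import org.h2.compress.CompressLZF;

// Hypothetical byte[] -> byte[] wrapper. Assumes compress/expand
// entry points like those in H2's CompressLZF class; the actual
// signatures in the h2-lzf fork may differ.
public final class LzfBlock {

    public static byte[] compress(byte[] input) {
        CompressLZF lzf = new CompressLZF();
        // Incompressible data can expand slightly, so over-allocate
        // and trim to the actual compressed length afterwards.
        byte[] buf = new byte[input.length + (input.length >> 4) + 64];
        int len = lzf.compress(input, input.length, buf, 0);
        byte[] out = new byte[len];
        System.arraycopy(buf, 0, out, 0, len);
        return out;
    }

    public static byte[] expand(byte[] compressed, int originalLength) {
        byte[] out = new byte[originalLength];
        new CompressLZF().expand(compressed, 0, compressed.length,
                out, 0, originalLength);
        return out;
    }
}
{code}

Note that expand() has to be told the original length up front; that is exactly the piece of
information the standard framing carries in each chunk header.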

I can now have a look at what interface Hadoop uses for codecs, to see what the best way would
be to get the same or modified code hooked up.
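
For the hookup itself, the codec-side surface Hadoop expects is the
org.apache.hadoop.io.compress.CompressionCodec interface; a skeleton might look like the
following (LzfCompressor/LzfDecompressor are hypothetical adapters over the byte[]->byte[]
code, not classes that exist yet):

{code:java}
import java.io.*;
import org.apache.hadoop.io.compress.*;

// Skeleton only: LzfCompressor and LzfDecompressor are hypothetical
// classes that would adapt the byte[] -> byte[] LZF code to Hadoop's
// Compressor/Decompressor contracts.
public class LzfCodec implements CompressionCodec {

    public CompressionOutputStream createOutputStream(OutputStream out)
            throws IOException {
        return createOutputStream(out, createCompressor());
    }

    public CompressionOutputStream createOutputStream(OutputStream out,
            Compressor compressor) throws IOException {
        // BlockCompressorStream handles block-at-a-time framing for
        // codecs that compress whole buffers rather than streams.
        return new BlockCompressorStream(out, compressor, 64 * 1024, 4);
    }

    public Class<? extends Compressor> getCompressorType() {
        return LzfCompressor.class;
    }

    public Compressor createCompressor() {
        return new LzfCompressor();
    }

    public CompressionInputStream createInputStream(InputStream in)
            throws IOException {
        return createInputStream(in, createDecompressor());
    }

    public CompressionInputStream createInputStream(InputStream in,
            Decompressor decompressor) throws IOException {
        return new BlockDecompressorStream(in, decompressor, 64 * 1024);
    }

    public Class<? extends Decompressor> getDecompressorType() {
        return LzfDecompressor.class;
    }

    public Decompressor createDecompressor() {
        return new LzfDecompressor();
    }

    public String getDefaultExtension() {
        return ".lzf";
    }
}
{code}

One caveat: BlockCompressorStream/BlockDecompressorStream write Hadoop's own length-prefixed
block framing, so keeping compatibility with the standard LZF framing (and hence the C version)
would likely mean writing custom stream classes instead of reusing these.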

Also: one interesting thing about LZF is that its framing is not only very simple, but probably
nice for splitting/merging larger files. There is no separate per-file header; instead, a file
is just a sequence of chunks with minimalistic headers. This means that you can append chunks
by simple concatenation, split a file in the reverse direction, even shuffle chunks if need be.
And skipping through chunks can be done using the headers alone, without decompressing the
actual contents. That sounds quite nice for Hadoop's use case in general... but I don't know
how much support is needed from the codec to let the framework make good use of this.
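
To make the header-skipping concrete, here is a sketch that walks chunks without decompressing
anything. The layout it assumes is from my reading of the liblzf sources ("ZV" magic, a type
byte, 2-byte big-endian lengths), so treat the field sizes as an assumption:

{code:java}
import java.io.*;

// Walks the chunks of an LZF-framed file, reading headers only and
// never decompressing payloads. Assumed layout: "ZV" magic, a type
// byte (0 = uncompressed, 1 = compressed), a 2-byte big-endian
// payload length, and, for compressed chunks, a further 2-byte
// uncompressed length.
public final class LzfChunkWalker {

    public static void listChunks(File file) throws IOException {
        DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)));
        try {
            long offset = 0;
            int z;
            while ((z = in.read()) >= 0) {
                int v = in.read();
                int type = in.read();
                if (z != 'Z' || v != 'V' || type < 0 || type > 1) {
                    throw new IOException("Malformed chunk at offset " + offset);
                }
                int payloadLen = in.readUnsignedShort();
                int headerLen = 5;
                int uncompressedLen = payloadLen;
                if (type == 1) {
                    uncompressedLen = in.readUnsignedShort();
                    headerLen = 7;
                }
                System.out.println("chunk @ " + offset
                        + ": payload=" + payloadLen
                        + ", uncompressed=" + uncompressedLen);
                if (in.skipBytes(payloadLen) != payloadLen) {
                    throw new EOFException("Truncated chunk at offset " + offset);
                }
                offset += headerLen + payloadLen;
            }
        } finally {
            in.close();
        }
    }
}
{code}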



> Add support for LZF compression
> -------------------------------
>
>                 Key: HADOOP-6389
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6389
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: io
>            Reporter: Tatu Saloranta
>
> (note: related to [HADOOP-4874])
> As per Doug's earlier comments, LZF does indeed look like a good compressor candidate: fast
compression/decompression with a good-enough compression rate.
> From my testing it seems at least twice as fast at compression, and somewhat faster at
decompression, than gzip.
> Code from [http://h2database.googlecode.com/svn/trunk/h2/src/main/org/h2/compress/] is
applicable, and I have tested it with JSON data.
> I hope to have more time to spend on this in the near future, but if someone else gets to
this first, that would be good too.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

