hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2115) Transparent compression in HDFS
Date Wed, 29 Jun 2011 17:09:28 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057351#comment-13057351 ]

Todd Lipcon commented on HDFS-2115:

I'm thinking something like the following:
- DFSClient can optionally specify a compression codec when writing a file. If specified,
each "packet" in the write pipeline will be compressed with that codec.
- DataNode uses a special header in the block meta file to indicate that the block is compressed
with the given codec.
- To facilitate random access, an index file is kept (either separately or as part of the block
meta file) which contains pairs of (uncompressed offset, compressed offset). This allows a binary
search to locate the compression block that contains a given uncompressed offset.
- The DFSClient reader is modified to support decompression on the client side.
- Some handshaking will be necessary in case the sets of codecs available on the client and
server differ.
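
To make the index idea concrete, here is a rough sketch of the lookup a reader might do before seeking. Everything in it is an assumption for illustration: the class name CompressedBlockIndex, its methods, and the layout as two parallel sorted arrays are hypothetical and do not correspond to any existing HDFS class or on-disk format.

```java
import java.util.Arrays;

/**
 * Hypothetical in-memory form of the proposed index: one
 * (uncompressed offset, compressed offset) pair per compression block,
 * sorted by uncompressed offset. Not an existing HDFS class.
 */
final class CompressedBlockIndex {
    // Start of each compression block in the logical (uncompressed) stream, ascending.
    private final long[] uncompressedOffsets;
    // Start of the same compression block in the stored (compressed) data.
    private final long[] compressedOffsets;

    CompressedBlockIndex(long[] uncompressedOffsets, long[] compressedOffsets) {
        if (uncompressedOffsets.length != compressedOffsets.length) {
            throw new IllegalArgumentException("index arrays must have equal length");
        }
        this.uncompressedOffsets = uncompressedOffsets.clone();
        this.compressedOffsets = compressedOffsets.clone();
    }

    /** Index position of the compression block containing the given uncompressed offset. */
    public int chunkFor(long uncompressedOffset) {
        if (uncompressedOffsets.length == 0 || uncompressedOffset < uncompressedOffsets[0]) {
            throw new IllegalArgumentException("offset not covered by index: " + uncompressedOffset);
        }
        int pos = Arrays.binarySearch(uncompressedOffsets, uncompressedOffset);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the containing
        // block is the entry just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Where to seek in the compressed data before starting to decompress. */
    public long compressedStart(long uncompressedOffset) {
        return compressedOffsets[chunkFor(uncompressedOffset)];
    }

    /** How many decompressed bytes to discard to reach the requested offset. */
    public long skipWithinChunk(long uncompressedOffset) {
        return uncompressedOffset - uncompressedOffsets[chunkFor(uncompressedOffset)];
    }

    public static void main(String[] args) {
        // Made-up index: three compression blocks of 64 KB uncompressed data each.
        CompressedBlockIndex index = new CompressedBlockIndex(
                new long[] {0L, 65536L, 131072L},
                new long[] {0L, 20000L, 41000L});
        long target = 100000L;
        System.out.println("seek to compressed offset " + index.compressedStart(target)
                + ", then skip " + index.skipWithinChunk(target) + " decompressed bytes");
    }
}
```

The reader would seek to the compressed offset, decompress from there, and discard the leading bytes up to the requested position. The cost of a random read is then bounded by one compression block rather than the whole HDFS block.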

Any thoughts on this? Not sure when I'd have time to work on it, but worth starting some brainstorming.

> Transparent compression in HDFS
> -------------------------------
>                 Key: HDFS-2115
>                 URL: https://issues.apache.org/jira/browse/HDFS-2115
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node, hdfs client
>            Reporter: Todd Lipcon
> In practice, we find that a lot of users store text data in HDFS without using any compression
codec. Improving usability of compressible formats like Avro/RCFile helps with this, but we
could also help many users by providing an option to transparently compress data as it is
