hadoop-hdfs-issues mailing list archives

From "Hari Sekhon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2115) Transparent compression in HDFS
Date Tue, 10 Mar 2015 18:11:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355349#comment-14355349 ]

Hari Sekhon commented on HDFS-2115:

MapR-FS provides transparent compression at the filesystem level - it's a very good idea.

It could be done on a per-directory basis (like MapR), with exclusions for specific subdirectories,
files and file extensions, such as a .ignore_compress file in the directory.
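
To illustrate the kind of exclusion rule meant here, below is a minimal sketch in Python. It is not an HDFS or MapR-FS API: the function name should_compress, the pattern syntax (shell-style globs, a trailing "/" for subdirectories, "#" for comments) and the contents of the .ignore_compress file are all assumptions made up for this example.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def should_compress(path, ignore_patterns):
    """Return True if `path` (relative to the compressed directory)
    should be transparently compressed.

    `ignore_patterns` stands in for the lines of a hypothetical
    .ignore_compress file; the syntax is an assumption of this sketch.
    """
    rel = PurePosixPath(path)
    for pattern in ignore_patterns:
        pattern = pattern.strip()
        # Skip blank lines and comments.
        if not pattern or pattern.startswith("#"):
            continue
        if pattern.endswith("/"):
            # A trailing slash excludes everything under that subdirectory.
            if pattern.rstrip("/") in rel.parts[:-1]:
                return False
        elif fnmatchcase(rel.name, pattern):
            # Otherwise match the file name, e.g. "*.gz" or "manifest.txt".
            return False
    return True


# Hypothetical contents of a .ignore_compress file.
patterns = ["# already compressed", "*.gz", "raw/", "manifest.txt"]
```

With these patterns, should_compress("logs/app.log", patterns) is True, while "logs/app.log.gz", "raw/2015/a.txt" and "manifest.txt" are all excluded.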

Keeping files in plain text format makes it easier to use different tools on them without
worrying about codec or container format support etc, but currently one can pay an 8x storage
penalty for keeping uncompressed text.

This would solve some real problems for us right now if we had it. It's also annoying that
many tools are always shown reading text files, yet text is so costly on storage without
transparent compression. We are actually stuck with a large historical archive of compressed
files that we can't work with (no zip inputformat) and can't leave uncompressed either, because
the storage waste would exceed our cluster capacity. Having to reprocess them all
to convert to a different compression format, and then hope all future tools can handle that
format, is far worse than just having transparent compression.

The increasing proliferation of tools and products on Hadoop exacerbates this issue, as we
can never be sure that the next tool will support format X. Everything supports text. Please
add transparent compression to make working with text better.


Hari Sekhon

> Transparent compression in HDFS
> -------------------------------
>                 Key: HDFS-2115
>                 URL: https://issues.apache.org/jira/browse/HDFS-2115
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, hdfs-client
>            Reporter: Todd Lipcon
> In practice, we find that a lot of users store text data in HDFS without using any compression
codec. Improving usability of compressible formats like Avro/RCFile helps with this, but we
could also help many users by providing an option to transparently compress data as it is

This message was sent by Atlassian JIRA
