hadoop-hdfs-dev mailing list archives

From: Robert James <srobertja...@gmail.com>
Subject: HDFS Block compression
Date: Mon, 04 Jul 2016 14:16:02 GMT
A lot of work in Hadoop concerns splittable compression.  Could this
be solved by offering compression at the HDFS block level (i.e. per
64 MB block), just like many OS filesystems do?
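For context, today the application has to opt into a splittable codec
itself (e.g. bzip2) when it writes the data; the filesystem gives no
help.  A minimal sketch of that status quo, assuming an ordinary
MapReduce job (the class names are the stock Hadoop ones, the job
wiring is only illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SplittableCompressionToday {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "splittable-compression-example");
    job.setJarByClass(SplittableCompressionToday.class);

    // The application itself must pick a codec that can be split (bzip2 can,
    // plain gzip cannot); nothing below the application layer helps here.
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // mapper/reducer setup elided
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}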

http://stackoverflow.com/questions/6511255/why-cant-hadoop-split-up-a-large-text-file-and-then-compress-the-splits-using-g?rq=1
discusses this and suggests the issue is separation of concerns.
However, if the compression were done at the *HDFS block* level (with
perhaps a single flag indicating as much), it would be totally
transparent to readers and writers.  This is exactly how NTFS
compression works, for example; apps need no knowledge of the
compression.  HDFS, since it doesn't allow random writes but only
streaming access, is a perfect candidate for this.
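To make the "transparent to readers" point concrete: a plain HDFS
client read looks like the sketch below, and it would look exactly the
same whether or not the blocks underneath happened to be compressed.
(The path is made up; the calls are the standard FileSystem API.)

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TransparentReader {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // If the DataNodes compressed/decompressed blocks internally, this code
    // would not change at all -- the stream still yields the original bytes.
    try (FSDataInputStream in = fs.open(new Path("/data/big-input.txt"));
         BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
      String line;
      while ((line = reader.readLine()) != null) {
        // process the line
      }
    }
  }
}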

Thoughts?

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org

