hadoop-mapreduce-user mailing list archives

From Robert Dyer <psyb...@gmail.com>
Subject Uncompressed size of Sequence files
Date Sat, 23 Nov 2013 21:14:03 GMT
Is there an easy way to get the uncompressed size of a sequence file that
is block compressed?  I am using the Snappy compressor.

I realize I can obviously just decompress them to temporary files to get
the size, but I would assume there is an easier way.  Perhaps an existing
tool that my search did not turn up?

If not, I will have to run an MR job to load each compressed block and read the
Snappy header to get the size.  I need to do this for a large number of
files so I'd prefer a simple CLI tool (sort of like 'hadoop fs -du').
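
For reference, the "Snappy header" idea can be sketched in a few lines. The raw Snappy format begins each compressed buffer with the uncompressed length, stored as a little-endian varint. The sketch below decodes only that preamble. The function name snappy_uncompressed_length is hypothetical, and the sketch assumes the caller has already isolated a raw Snappy buffer; it does not parse the SequenceFile block layout, and any framing that Hadoop's codec stream adds around each chunk is not handled here.

```python
def snappy_uncompressed_length(buf: bytes) -> int:
    """Decode the varint preamble of a raw Snappy buffer.

    Assumption: `buf` starts at the first byte of a raw Snappy chunk.
    Hypothetical helper for illustration, not part of any Hadoop API.
    """
    result = 0
    shift = 0
    # A 32-bit length needs at most 5 varint bytes.
    for byte in buf[:5]:
        # The low 7 bits carry data, least significant group first.
        result |= (byte & 0x7F) << shift
        # A clear high bit marks the last byte of the varint.
        if byte & 0x80 == 0:
            if result > 0xFFFFFFFF:
                raise ValueError("length does not fit in 32 bits")
            return result
        shift += 7
    raise ValueError("truncated or malformed length preamble")


if __name__ == "__main__":
    # 0xFE 0xFF 0x7F is the encoding of 2097150 (0x1FFFFE).
    print(snappy_uncompressed_length(bytes([0xFE, 0xFF, 0x7F])))
```

Getting a per-file total this way would still require walking every block of the sequence file and summing the decoded lengths, so it only avoids the decompression itself, not the read.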

- Robert
