hadoop-mapreduce-user mailing list archives

From Zac Shepherd <zsheph...@about.com>
Subject Re: bz2 decompress in place
Date Thu, 22 Aug 2013 12:37:52 GMT
Just because I always appreciate it when someone posts the answer to 
their own question:

We have some Java that does
	BZip2Codec bz2 = new BZip2Codec();
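	// out is an OutputStream for the file being written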
	CompressionOutputStream cout = bz2.createOutputStream(out);
for compression.

We just wrote another version that does
	BZip2Codec bz2 = new BZip2Codec();
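	// in is an InputStream over the .bz2 data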
	CompressionInputStream cin = bz2.createInputStream(in);
for decompression.

Not rocket science, but the decompression side of BZip2Codec is 
poorly documented, so I thought I'd send it along.
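
For completeness, here's a minimal end-to-end sketch of the 
decompression path, assuming the .bz2 file lives on HDFS and the 
input/output paths come in as arguments (the class name and buffer 
size here are arbitrary):

	import org.apache.hadoop.conf.Configuration;
	import org.apache.hadoop.fs.FileSystem;
	import org.apache.hadoop.fs.Path;
	import org.apache.hadoop.io.IOUtils;
	import org.apache.hadoop.io.compress.BZip2Codec;
	import org.apache.hadoop.io.compress.CompressionInputStream;

	public class Bunzip2InPlace {
		public static void main(String[] args) throws Exception {
			Configuration conf = new Configuration();
			Path in = new Path(args[0]);   // e.g. /data/big.bz2
			Path out = new Path(args[1]);  // e.g. /data/big.txt
			FileSystem fs = in.getFileSystem(conf);

			BZip2Codec bz2 = new BZip2Codec();
			// wrap the raw HDFS stream so reads come back decompressed
			CompressionInputStream cin = bz2.createInputStream(fs.open(in));
			// stream straight back into the cluster; true closes both streams
			IOUtils.copyBytes(cin, fs.create(out), 64 * 1024, true);
		}
	}

Everything still funnels through the one client doing the copy, so 
it's not fast, but it avoids pulling the file down, bunzipping it 
locally, and pushing it back up.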

On 08/21/2013 02:00 PM, Zac Shepherd wrote:
> Hello,
>
> I'm using an ancient version of Hadoop (0.20.2+228) and trying to run an
> m/r job over a bz2-compressed file (18 GB).  Since splitting support
> wasn't added until 0.21.0, a single mapper is getting allocated and will
> take far too long to complete.  Is there a way that I can decompress the
> file in place, or am I going to have to copy it down, decompress it
> locally, and then copy it back up to the cluster?
>
> Thanks for any help,
> Zac Shepherd

