hadoop-common-user mailing list archives

From Georgi Ivanov <iva...@vesseltracker.com>
Subject Bzip2 files as an input to MR job
Date Mon, 22 Sep 2014 14:40:54 GMT
Hi guys,
I would like to compress the files on HDFS to save some storage.

As far as I can see, bzip2 is the only splittable format among the compression codecs Hadoop ships with (and it is slow).
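Why bzip2 can be split at all: every compressed bzip2 block starts with the 48-bit marker 0x314159265359, and blocks are bit-aligned rather than byte-aligned. A reader handed an arbitrary byte range can therefore scan bit by bit for the next marker and begin decoding there. The sketch below only locates those markers; it does not decompress anything and is not Hadoop's implementation. The class and method names (`Bzip2BlockScan`, `findBlockStarts`) are hypothetical. A production reader also has to cope with the marker occurring by chance inside compressed data.

```java
import java.util.ArrayList;
import java.util.List;

public class Bzip2BlockScan {
    // 48-bit marker at the start of every compressed bzip2 block (BCD digits of pi).
    static final long BLOCK_MAGIC = 0x314159265359L;
    // Keeps only the most recent 48 bits in the sliding window.
    static final long MASK_48 = (1L << 48) - 1;

    /** Returns the bit offsets (from the start of data) at which BLOCK_MAGIC begins. */
    static List<Long> findBlockStarts(byte[] data) {
        List<Long> offsets = new ArrayList<>();
        long window = 0;
        long bitsSeen = 0;
        for (byte b : data) {
            // Feed bits most-significant first, which is the order bzip2 writes them.
            for (int i = 7; i >= 0; i--) {
                window = ((window << 1) | ((b >> i) & 1)) & MASK_48;
                bitsSeen++;
                if (bitsSeen >= 48 && window == BLOCK_MAGIC) {
                    offsets.add(bitsSeen - 48);
                }
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Stream header "BZh9" followed by a byte-aligned block marker.
        byte[] aligned = {'B', 'Z', 'h', '9', 0x31, 0x41, 0x59, 0x26, 0x53, 0x59};
        System.out.println(findBlockStarts(aligned)); // prints [32]

        // The same marker preceded by 3 padding bits, i.e. not byte-aligned.
        long v = BLOCK_MAGIC << 5;
        byte[] shifted = new byte[7];
        for (int i = 0; i < 7; i++) {
            shifted[i] = (byte) (v >>> (8 * (6 - i)));
        }
        System.out.println(findBlockStarts(shifted)); // prints [3]
    }
}
```

The shifted case shows the point of scanning at bit rather than byte granularity: a block can begin in the middle of a byte, so a split reader cannot simply look at byte boundaries.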

The actual files are Avro.

So in my driver class I have:


I have a number of jobs processing Avro files, so I would like to
keep code changes to a minimum.

Is it possible to compress these Avro files with bzip2 and keep the code
of the MR jobs the same (or change it only a little)?
If it is, please give me some hints, as so far I haven't been able to find
any good resources on the Internet.

