hadoop-mapreduce-user mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: can we split a big gzipped file on HDFS ?
Date Thu, 23 Jun 2011 21:34:57 GMT

If the file is plain gzip, then no.  gzip cannot be split, because you cannot seek to an
arbitrary point in the file, move forward to a sync point, and start reading data from
there.  If it is a sequence file with gzip compression, then yes, because the sequence
file format compresses the data in chunks rather than the entire file at once.
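
To make the difference concrete, here is a minimal Python sketch using only the standard
library. It is not Hadoop code and does not reproduce the real SequenceFile layout: the
helper names (can_decode_from, compress_in_blocks, read_block) and the external index are
assumptions made for illustration, whereas a real sequence file finds block boundaries
through sync markers in the stream. It only shows that a single gzip stream cannot be
decoded from the middle, while independently compressed blocks can.

```python
import gzip
import zlib

# Hypothetical sample data: 10,000 small text records.
records = [f"record {i}\n".encode() for i in range(10_000)]

# Case 1: the whole payload compressed as a single gzip stream.
whole = gzip.compress(b"".join(records))


def can_decode_from(buf, offset):
    """Return True if gzip decoding succeeds when started at `offset`."""
    try:
        gzip.decompress(buf[offset:])
        return True
    except (OSError, EOFError, zlib.error):
        # No gzip header here, and the decoder state built up from the
        # preceding bytes is missing, so decoding cannot start.
        return False


# Case 2: each group of records compressed on its own (illustrative only).
def compress_in_blocks(recs, records_per_block):
    """Compress records in independent blocks; return (blob, index)."""
    blob = bytearray()
    index = []  # one (offset, length) pair per block
    for start in range(0, len(recs), records_per_block):
        block = gzip.compress(b"".join(recs[start:start + records_per_block]))
        index.append((len(blob), len(block)))
        blob += block
    return bytes(blob), index


def read_block(blob, index, n):
    """Decompress block `n` without reading any of the other blocks."""
    offset, length = index[n]
    return gzip.decompress(blob[offset:offset + length])


if __name__ == "__main__":
    print("whole stream, from start :", can_decode_from(whole, 0))
    print("whole stream, from middle:", can_decode_from(whole, len(whole) // 2))
    blob, index = compress_in_blocks(records, 1000)
    print("first record of block 3  :", read_block(blob, index, 3)[:12])
```

Because each block in the second case decodes independently, separate workers could each
take a range of blocks, which is the property that makes a block-compressed container
splittable.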

--Bobby Evans

On 6/23/11 1:21 AM, "Mapred Learn" <mapred.learn@gmail.com> wrote:

If I have a big gzipped text file (~60 GB) in HDFS, can I split it into smaller chunks
(~1 GB) so that I can run a MapReduce job on those files and finish faster than running
the job on one big file?

