hadoop-mapreduce-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: Mapper runs only on one machine
Date Tue, 16 Nov 2010 19:42:51 GMT

On Wed, Nov 17, 2010 at 12:29 AM,  <praveen.peddi@nokia.com> wrote:
> Thanks for the suggestion. This is an important piece of information many
> people will miss since compressed format is a more logical way of passing
> the data. Not sure if this is documented on Hadoop but I could not find it.

The problem is with the gzip algorithm itself. gzip cannot start
decompressing from an arbitrary point in a file (it's not block
compressed, if you compare it to lzo), so a gzipped input file cannot
be split and has to be read by a single mapper.
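
To see the limitation outside Hadoop, here is a minimal sketch in plain
Python (standard library only, no Hadoop involved). The helper name
can_decompress_from and the sample payload are made up for illustration.
It compresses some records with gzip and then tries to start decoding
from the beginning and from the middle of the compressed bytes:

```python
import gzip
import zlib


def can_decompress_from(data: bytes, offset: int) -> bool:
    """Return True if gzip decoding succeeds when started at `offset`.

    Hypothetical helper for illustration only.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    try:
        decoder.decompress(data[offset:])
        return True
    except zlib.error:
        # No gzip header at this offset, so the decoder has nothing
        # to synchronise on and gives up.
        return False


# Made-up sample input: ten thousand short text records.
payload = b"".join(b"record %d\n" % i for i in range(10000))
compressed = gzip.compress(payload)

from_start = can_decompress_from(compressed, 0)
from_middle = can_decompress_from(compressed, len(compressed) // 2)

print("decode from start: ", from_start)
print("decode from middle:", from_middle)
```

Decoding works from offset 0 but should fail from the middle, which is
roughly the position a second mapper would be in if the file were split.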

Some work was done to enable splitting of gzip files too, much like
how lzo splitting is done via indexing, but it has not been
active for a while now. See MAPREDUCE-491 and HADOOP-6153 for the

Harsh J
