hadoop-common-user mailing list archives

From John Heidemann <jo...@isi.edu>
Subject Re: compressed/encrypted file
Date Thu, 05 Jun 2008 16:14:56 GMT
On Wed, 04 Jun 2008 15:52:55 PDT, Arun C Murthy wrote: 
>Haijun,
>
>On Jun 4, 2008, at 3:45 PM, Haijun Cao wrote:
>
>>
>> Mile, Thanks.
>>
>> "If your inputs to maps are compressed, then you don't get any automatic
>> assignment of mappers to your data: each gzipped file gets assigned a
>> mapper." <--- this is the case I am talking about.
>>
>
>With the current compression codecs available in Hadoop (zlib/gzip/lzo)
>it is not possible to split up a compressed file and then process it in
>parallel. However, once we get bzip2 to work we could split up the files
>as you are describing...


We are actually working on a bzip2 codec, ideally with split support,
so hopefully something will be available by the end of summer.

   -John Heidemann
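
Illustration (not part of the original message): a minimal Python sketch
of why a gzip file ends up with a single mapper. It assumes nothing about
Hadoop itself and uses only the Python standard library; the helper name
can_decode_from is hypothetical. A gzip decoder needs the stream header
and all preceding data, so a reader handed the second half of the file
has no valid place to start.

```python
import gzip
import zlib


def can_decode_from(data: bytes, offset: int) -> bool:
    """Return True if a gzip decoder can start cleanly at `offset`.

    Hypothetical helper for illustration; it mimics a worker that is
    handed a byte range beginning at `offset`.
    """
    decoder = zlib.decompressobj(31)  # wbits=31: expect a gzip header
    try:
        decoder.decompress(data[offset:])
        return True
    except zlib.error:
        # No gzip header at this position, so decoding cannot begin.
        return False


# Build a sample "input file" of many small records and gzip it.
payload = b"".join(b"record %d\n" % i for i in range(10000))
compressed = gzip.compress(payload)

# A reader starting at byte 0 sees the header and can decode.
print(can_decode_from(compressed, 0))

# A reader starting mid-file has no header and no prior context.
print(can_decode_from(compressed, len(compressed) // 2))
```

bzip2, by contrast, compresses data in independent blocks, which is what
makes a splittable codec plausible there.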
