hadoop-common-user mailing list archives

From "Parand Darugar" <daru...@yahoo-inc.com>
Subject Re: compressed/encrypted file
Date Thu, 05 Jun 2008 00:04:27 GMT


----- Original Message -----
From: milesosb@gmail.com <milesosb@gmail.com>
To: core-user@hadoop.apache.org <core-user@hadoop.apache.org>
Sent: Wed Jun 04 15:06:42 2008
Subject: Re: compressed/encrypted file

You can compress/decompress at several points (see the sketch after this list):

--prior to mapping

--after mapping

--after reducing

(I've been experimenting with all these options; we have been crawling blogs
every day since Feb and storing compressed sets of posts on DFS.)
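
For concreteness, a minimal sketch of how the "after mapping" and "after
reducing" points translate into job configuration with the old
org.apache.hadoop.mapred JobConf API; the GzipCodec choice, class name, and
output path below are illustrative assumptions, not something prescribed in
this thread:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class CompressionPoints {
        // Build a JobConf with compression enabled at two of the points above.
        public static JobConf configure() {
            JobConf conf = new JobConf(CompressionPoints.class);

            // "after mapping": compress the intermediate map output that is
            // shuffled to the reducers.
            conf.setCompressMapOutput(true);
            conf.setMapOutputCompressorClass(GzipCodec.class);

            // "after reducing": compress the final job output written to DFS.
            FileOutputFormat.setCompressOutput(conf, true);
            FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);
            // Illustrative output path.
            FileOutputFormat.setOutputPath(conf, new Path("/data/posts-compressed"));

            return conf;
        }
    }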

If your inputs to maps are compressed, you don't get automatic splitting of
the data across mappers: gzip is not a splittable format, so each gzipped
file is assigned to a single mapper.

But otherwise, it is all pretty transparent.
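
As for the "prior to mapping" point, reading gzipped input needs no extra
configuration beyond pointing the job at the files; a minimal sketch, again
with the old JobConf API and a made-up input path:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;

    public class GzippedInput {
        // "prior to mapping": just point the job at the .gz files; the record
        // reader recognizes the extension and decompresses on the fly.
        public static void setInput(JobConf conf) {
            conf.setInputFormat(TextInputFormat.class);
            // Illustrative path to a directory of gzipped post files.
            FileInputFormat.setInputPaths(conf, new Path("/crawl/posts"));
            // Caveat from above: gzip is not splittable, so each .gz file
            // under this path is processed by exactly one map task.
        }
    }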

Miles

2008/6/4 Haijun Cao <haijun@kindsight.net>:

>
> If a file is compressed and encrypted, then is it still possible to split
> it and run mappers in parallel?
>
> Do people compress their files stored in hadoop? If yes, how do you go
> about processing them in parallel?
>
> Thanks
> Haijun
>


