hadoop-common-user mailing list archives

From "Haijun Cao" <hai...@kindsight.net>
Subject RE: compressed/encrypted file
Date Wed, 04 Jun 2008 22:45:17 GMT

Miles, thanks.

"If your inputs to maps are compressed, then you don't get any automatic
assignment of mappers to your data:  each gzipped file gets assigned a
mapper." <--- this is the case I am talking about.
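A minimal sketch of why this case arises, assuming plain gzip (not a block-oriented container format): a reader dropped into the middle of a .gz file finds no gzip header and cannot decode from there, whereas a reader dropped into the middle of a plain text file can skip to the next newline and carry on. The helper names can_gunzip_from and records_from are hypothetical, made up for this illustration; they are not Hadoop APIs.

```python
import gzip
import zlib

def can_gunzip_from(blob, offset):
    """Return True if the bytes starting at `offset` decode as a gzip stream."""
    try:
        gzip.decompress(blob[offset:])
        return True
    except (OSError, EOFError, zlib.error):
        # No valid gzip header (or corrupt data) at this position.
        return False

def records_from(raw, offset):
    """Plain text: skip the partial line at `offset`, return the whole lines after it."""
    newline = raw.find(b"\n", offset)
    if newline == -1:
        return []
    return raw[newline + 1:].splitlines()

# Illustrative data only: 1000 short text records.
raw = b"".join(b"record %d\n" % i for i in range(1000))
blob = gzip.compress(raw)

print(can_gunzip_from(blob, 0))               # whole file from the start
print(can_gunzip_from(blob, len(blob) // 2))  # a "split" starting mid-file
print(len(records_from(raw, len(raw) // 2)))  # plain text resumes at a line boundary
```

Reading from offset 0 works, while reading from the midpoint of the compressed bytes fails, so the whole file has to go to a single reader. The same reasoning would apply to a file encrypted as a single stream.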

Haijun


-----Original Message-----
From: milesosb@gmail.com [mailto:milesosb@gmail.com] On Behalf Of Miles Osborne
Sent: Wednesday, June 04, 2008 3:07 PM
To: core-user@hadoop.apache.org
Subject: Re: compressed/encrypted file

You can compress / decompress at many points:

--prior to mapping

--after mapping

--after reducing

(I've been experimenting with all these options; we have been crawling
blogs every day since February, and we store compressed sets of posts
on DFS.)

If your inputs to maps are compressed, then you don't get any automatic
assignment of mappers to your data:  each gzipped file gets assigned a
mapper.

But otherwise, it is all pretty transparent.
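As a rough illustration of the one-mapper-per-gzipped-file behaviour described above, here is a simplified estimate of how many map tasks a set of input files would get. This is only a sketch under stated assumptions: a 64 MB split size, files identified as gzip by a ".gz" suffix, and no rounding slack at the end of a file. The function name estimate_map_tasks and the file names are hypothetical, and the real split calculation in Hadoop has more detail than this.

```python
import math

# Assumed split size for illustration: 64 MB.
SPLIT_SIZE = 64 * 1024 * 1024

def estimate_map_tasks(files, split_size=SPLIT_SIZE):
    """Estimate map tasks for a list of (name, size_in_bytes) pairs.

    Gzipped files cannot be split, so each one counts as a single task
    regardless of size; other files are divided into split_size chunks.
    """
    total = 0
    for name, size in files:
        if name.endswith(".gz"):
            total += 1
        else:
            total += max(1, math.ceil(size / split_size))
    return total

one_gib = 1024 * 1024 * 1024
print(estimate_map_tasks([("posts.txt", one_gib)]))     # uncompressed: many splits
print(estimate_map_tasks([("posts.txt.gz", one_gib)]))  # gzipped: one task
```

Under these assumptions, one way to keep parallelism with gzip is to store many moderately sized .gz files rather than one large one, since each file still gets its own mapper.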

Miles

2008/6/4 Haijun Cao <haijun@kindsight.net>:

>
> If a file is compressed and encrypted, then is it still possible to
> split it and run mappers in parallel?
>
> Do people compress their files stored in hadoop? If yes, how do you go
> about processing them in parallel?
>
> Thanks
> Haijun
>



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
