hadoop-common-user mailing list archives

From "Miles Osborne" <mi...@inf.ed.ac.uk>
Subject Re: compressed/encrypted file
Date Wed, 04 Jun 2008 22:06:42 GMT
You can compress / decompress at many points:

--prior to mapping

--after mapping

--after reducing

(I've been experimenting with all these options; we have been crawling blogs
every day since Feb, and we store compressed sets of posts on DFS)

If your inputs to maps are compressed, then the files are not split
automatically across mappers:  each gzipped file gets assigned a
single mapper.

But otherwise, it is all pretty transparent.
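
To illustrate why a gzipped input ends up with one mapper, here is a minimal
sketch in plain Python (not Hadoop code; the helper name `can_decode_from` and
the sample data are made up for illustration). It shows that a gzip stream
decodes fine from the start of the file, but not from an arbitrary offset in
the middle, which is roughly what a second mapper working on a later split
would have to do.

```python
import gzip
import zlib


def can_decode_from(data: bytes, offset: int) -> bool:
    """Return True if gzip decoding succeeds when started at `offset`.

    Hypothetical helper: it stands in for a mapper that is handed a split
    beginning at `offset` rather than at the start of the file.
    """
    try:
        gzip.decompress(data[offset:])
        return True
    except (OSError, EOFError, zlib.error):
        # No valid gzip header or deflate state at this position.
        return False


# Made-up sample data standing in for a compressed set of posts.
posts = "\n".join(f"post {i}: some blog text" for i in range(10000)).encode()
compressed = gzip.compress(posts)

# Reading from the beginning works; starting halfway through does not.
whole_ok = can_decode_from(compressed, 0)
middle_ok = can_decode_from(compressed, len(compressed) // 2)

print("decode from start:", whole_ok)
print("decode from middle:", middle_ok)
```

So if you want parallelism with gzipped inputs, one option is to store many
smaller gzipped files rather than one large one, since each file still gets
its own mapper.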


2008/6/4 Haijun Cao <haijun@kindsight.net>:

> If a file is compressed and encrypted, then is it still possible to split
> it and run mappers in parallel?
> Do people compress their files stored in hadoop? If yes, how do you go
> about processing them in parallel?
> Thanks
> Haijun

The University of Edinburgh is a charitable body, registered in Scotland,
with registration number SC005336.
