hadoop-mapreduce-user mailing list archives

From Félix López <jaaaelpum...@gmail.com>
Subject What's the best way to compress a folder in hadoop?
Date Fri, 29 Jun 2012 07:36:31 GMT
The folder contains text files and subfolders that also contain text files.
The text is not key/value; it's just plain text. Something like this:
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dumm...

I'm thinking about 3 options:

First. To use Hadoop Streaming, as proposed by Jeff Wu here:
http://stackoverflow.com/questions/7153087/hadoop-compress-file-in-hdfs

Second. To use a custom map/reduce job, with the IdentityMapper as the map
and a custom reducer that creates the zip file. I'm not sure the reducer
will have information about the parent folders, though; that may need a
custom mapper. Something similar to
https://github.com/flopezluis/testing-hadoop/blob/master/src/pruebas/Reduce.java
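To make the parent-folder concern concrete, here is a minimal sketch of
the zip-writing step only. It assumes the local filesystem stands in for
HDFS and uses plain java.util.zip rather than any Hadoop API; the names
FolderZipSketch, zipFolder and entryNames are hypothetical and are not
taken from the linked repository. The point is that each zip entry name
carries the path relative to the root folder, so in a real job that
relative path would have to reach the reducer somehow (for example as
the key emitted by a custom mapper).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration: local files stand in for HDFS files.
public class FolderZipSketch {

    // Writes every regular file under root into one zip stream.
    // The entry name is the path relative to root, so parent folders survive.
    static void zipFolder(Path root, OutputStream out) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                // Zip entry names use '/' regardless of the platform separator.
                String name = root.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
    }

    // Reads the entry names back, to check that folder information was kept.
    static List<String> entryNames(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                names.add(e.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny folder tree: one file at the top, one in a subfolder.
        Path root = Files.createTempDirectory("folder");
        Files.createDirectories(root.resolve("sub"));
        Files.write(root.resolve("a.txt"),
                "Lorem Ipsum".getBytes(StandardCharsets.UTF_8));
        Files.write(root.resolve("sub").resolve("b.txt"),
                "dummy text".getBytes(StandardCharsets.UTF_8));

        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        zipFolder(root, buf);
        System.out.println(entryNames(buf.toByteArray()));
    }
}
```

This runs in a single process, so it says nothing about how the work
would be distributed; it only shows what the reducer would need to know
to rebuild the folder layout inside the zip.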

Third. To create a new HDFS command that zips in Hadoop, but I'm not sure
whether Hadoop would distribute the execution; if it doesn't, it may take
a long time and be very CPU-intensive.

Any ideas?

Thanks
