hadoop-mapreduce-user mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: successive mappers
Date Fri, 15 Apr 2011 20:20:14 GMT
I,

Take a look at the multiple output format classes. MultipleTextOutputFormat is a good example:

http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/lib/MultipleTextOutputFormat.html

You should be able to create a custom output format class that matches
your needs.  Although, if all you are doing is map processing then why are you outputting
intermediate results instead of processing them all in a single mapper?  It should be a lot
faster if you don't need the intermediate results.
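
A minimal sketch of the single-mapper idea, in plain Java with no Hadoop dependencies. The condition `satisfiesC`, the `transform` step, and the `maxPasses` guard are all hypothetical stand-ins, since the thread does not say what condition C or the transformation actually are. Instead of writing non-matching records to an intermediate directory and re-reading them, the loop keeps transforming a record in memory until the condition holds.

```java
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

class SinglePassSketch {
    /**
     * Applies transform repeatedly until satisfiesC accepts the value.
     * Returns empty if the condition is still not met after maxPasses checks
     * beyond the first (a guard against looping forever; an assumption).
     */
    public static <T> Optional<T> untilSatisfied(T value,
                                                 Predicate<T> satisfiesC,
                                                 UnaryOperator<T> transform,
                                                 int maxPasses) {
        T current = value;
        for (int pass = 0; pass <= maxPasses; pass++) {
            if (satisfiesC.test(current)) {
                return Optional.of(current);
            }
            current = transform.apply(current);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical example: condition C is "value >= 100", transform doubles the value.
        Optional<Integer> result = untilSatisfied(7, v -> v >= 100, v -> v * 2, 10);
        System.out.println(result); // prints Optional[112]
    }
}
```

Inside a real mapper, the body of `map` would call something like `untilSatisfied` once per record and emit only the final pair, so no intermediate directory is needed.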

--Bobby Evans

On 4/15/11 2:05 PM, "Injun Joe" <ll_oz_ll@yahoo.com.hk> wrote:

Hi,
I am coding a map-reduce program which involves several map-reduce steps. The work that my
program does is only in the mapper, so I was thinking of having no reduce steps, only successive
mappers. The logic for the mappers at iterations 0 and 1 can be written like this:

1. Take input.
2. Map 0:
   Determine whether a key-value pair satisfies condition C.
    - If it does, output the key-value pair to a file in directory E.
    - If it does not, transform the key-value pair and output it to directory D.
3. Map 1:
   - Change input directory to directory D
   - Perform same steps as map 0.
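
The steps above can be sketched as a per-record routing decision. This is plain Java with no Hadoop dependencies; `satisfiesC`, `transform`, and the helper `outputName` are hypothetical placeholders, and only the directory names E and D come from the description. In a Hadoop job, the same decision could presumably live in an overridden `generateFileNameForKeyValue(key, value, name)` of a `MultipleTextOutputFormat` subclass, which chooses the output file name for each record; that wiring is an assumption and is not shown here.

```java
class MapStepSketch {
    /** A key-value pair together with the subdirectory it should be written to. */
    static final class Routed {
        final String directory;
        final String key;
        final String value;

        Routed(String directory, String key, String value) {
            this.directory = directory;
            this.key = key;
            this.value = value;
        }
    }

    // Hypothetical condition C: the value has at most three characters.
    static boolean satisfiesC(String key, String value) {
        return value.length() <= 3;
    }

    // Hypothetical transformation: drop the first character of the value.
    static String transform(String value) {
        return value.substring(1);
    }

    /** One map pass: finished pairs go to E, transformed pairs go to D. */
    public static Routed mapStep(String key, String value) {
        if (satisfiesC(key, value)) {
            return new Routed("E", key, value);
        }
        return new Routed("D", key, transform(value));
    }

    /** Builds a file name relative to the job output directory, e.g. "E/part-00000". */
    public static String outputName(Routed routed, String leafName) {
        return routed.directory + "/" + leafName;
    }

    public static void main(String[] args) {
        Routed done = mapStep("k1", "abc");
        Routed again = mapStep("k2", "abcdef");
        System.out.println(outputName(done, "part-00000") + " " + done.value);   // E/part-00000 abc
        System.out.println(outputName(again, "part-00000") + " " + again.value); // D/part-00000 bcdef
    }
}
```

On the next iteration, the pairs written under D become the input, while E accumulates the pairs that already satisfy the condition.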

So, the problem is that I have not been able to find a way to output key-value pairs to different
directories. All I have been able to do is specify the map output directory with TextOutputFormat.setOutputPath.

Any help would be appreciated.

Thanks a lot
I


