hadoop-mapreduce-user mailing list archives

From Injun Joe <ll_oz...@yahoo.com.hk>
Subject Re: successive mappers
Date Fri, 15 Apr 2011 20:41:03 GMT
The problem with doing all of them in a single mapper is that some map instances 
may not require further processing. So if I try to do everything in a single 
mapper instance, I will have a lot of CPUs lying idle while others take the 

From: Robert Evans <evans@yahoo-inc.com>
To: "mapreduce-user@hadoop.apache.org" <mapreduce-user@hadoop.apache.org>
Sent: Fri, April 15, 2011 4:20:14 PM
Subject: Re: successive mappers


Take a look at the Multiple output format classes


Is a good example.  You should be able to create a custom output format class 
that matches your needs.  Although, if all you are doing is map processing then 
why are you outputting intermediate results instead of processing them all in a 
single mapper?  It should be a lot faster if you don’t need the intermediate 

--Bobby Evans

On 4/15/11 2:05 PM, "Injun Joe" <ll_oz_ll@yahoo.com.hk> wrote:

>I am writing a map-reduce program that involves several map-reduce steps. All 
>of the work my program does is in the mapper, so I was thinking of having no 
>reduce steps, only successive mappers. The logic for the mappers at iterations 
>0 and 1 can be written like this:
>1. Take input.
>2. Map 0:
>   Determine if a key-value pair satisfies condition C.
>    - If it satisfies the condition, output the key-value pair to a file in 
>directory E.
>    - If it does not, transform the key-value pair and output the result 
>to directory D.
>3. Map 1:
>   - Change input directory to directory D
>   - Perform same steps as map 0.
>So, the problem is that I have not been able to find a way to output key-value 
>pairs to different directories. All I have been able to specify is the map 
>output directory, using TextOutputFormat.setOutputPath.
>Any help would be appreciated.
>Thanks a lot
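
The iteration scheme described in the quoted message can be sketched in plain Java, without any Hadoop dependency, to make the control flow concrete. This is only an illustration under stated assumptions: the class SuccessiveMaps, the methods mapPass, runPasses and demo, the sample records, and the choice of "value >= 100" for condition C and "double the value" for the transform are all hypothetical and do not come from the thread. In-memory lists stand in for directories E and D. In a real job each pass would presumably be a separate map-only job whose input path is the previous pass's directory D.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical, Hadoop-free model of the "successive mappers" idea.
class SuccessiveMaps {

    // One map-only pass. Records that satisfy condition C are appended to dirE
    // (finished output); all others are transformed and returned as the new
    // contents of directory D (input for the next pass).
    static List<Map.Entry<String, Integer>> mapPass(
            List<Map.Entry<String, Integer>> input,
            Predicate<Map.Entry<String, Integer>> conditionC,
            UnaryOperator<Map.Entry<String, Integer>> transform,
            List<Map.Entry<String, Integer>> dirE) {
        List<Map.Entry<String, Integer>> dirD = new ArrayList<>();
        for (Map.Entry<String, Integer> record : input) {
            if (conditionC.test(record)) {
                dirE.add(record);
            } else {
                dirD.add(transform.apply(record));
            }
        }
        return dirD;
    }

    // Driver loop: feed directory D back in as input until it is empty or
    // maxPasses is reached (a guard against a transform that never satisfies C).
    static List<Map.Entry<String, Integer>> runPasses(
            List<Map.Entry<String, Integer>> input,
            Predicate<Map.Entry<String, Integer>> conditionC,
            UnaryOperator<Map.Entry<String, Integer>> transform,
            int maxPasses) {
        List<Map.Entry<String, Integer>> dirE = new ArrayList<>();
        List<Map.Entry<String, Integer>> current = input;
        for (int pass = 0; pass < maxPasses && !current.isEmpty(); pass++) {
            current = mapPass(current, conditionC, transform, dirE);
        }
        return dirE;
    }

    // Sample run with made-up data: condition C is "value >= 100",
    // the transform doubles the value.
    static List<Map.Entry<String, Integer>> demo() {
        List<Map.Entry<String, Integer>> input = new ArrayList<>();
        input.add(new SimpleEntry<String, Integer>("a", 100));
        input.add(new SimpleEntry<String, Integer>("b", 30));
        input.add(new SimpleEntry<String, Integer>("c", 60));
        return runPasses(
                input,
                r -> r.getValue() >= 100,
                r -> new SimpleEntry<String, Integer>(r.getKey(), r.getValue() * 2),
                10);
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> r : demo()) {
            System.out.println(r.getKey() + "\t" + r.getValue());
        }
    }
}
```

The sketch also shows the trade-off discussed in the replies: each pass only touches the records still in D, so later passes shrink, whereas looping inside a single map call keeps one task busy until its slowest record finishes.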