hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Combine previous Map Results
Date Sat, 26 Apr 2008 00:01:05 GMT


You can only have one map function.

But that function can decide which sort of thing to do based on which input
it is given.  That allows input of type A to be processed with map function
f_a and input of type B to be processed with map function f_b.
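For example, here is a minimal sketch (against the old org.apache.hadoop.mapred
API) of a mapper that switches its logic based on which file it is reading.  The
class name, the "/intermediate/" path convention and the tab-separated record
format are only assumptions for illustration; the real hook is the
map.input.file property that Joydeep mentions in the quoted reply below:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class MultiSourceMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      private boolean oldIntermediateData;

      public void configure(JobConf job) {
        // map.input.file holds the path of the split this map task is reading
        String inputFile = job.get("map.input.file", "");
        // assumed directory layout: old intermediate data lives under /intermediate/
        oldIntermediateData = inputFile.contains("/intermediate/");
      }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        if (oldIntermediateData) {
          // f_a: identity map -- pass existing key<TAB>value records straight through
          String[] parts = value.toString().split("\t", 2);
          output.collect(new Text(parts[0]),
                         new Text(parts.length > 1 ? parts[1] : ""));
        } else {
          // f_b: real map logic for the new data goes here,
          // emitting keys of the same type as the identity branch
        }
      }
    }

Both input directories can then be added with FileInputFormat.addInputPath, and
a single reduce phase will see the merged, sorted keys from both sources.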




On 4/25/08 4:43 PM, "Dina Said" <dinasaid@gmail.com> wrote:

> Thanks Joydeep for your reply.
> 
> But is it possible to have two or more Map tasks and a single
> reduce task?
> I want the reduce task to work on all the intermediate keys produced
> by these Map tasks.
> 
> I am sorry, I am new to Map-Reduce, but from my first reading it looks
> like we can define only one Map task.
> 
> Thanks
> Dina
> 
> 
> Joydeep Sen Sarma wrote:
>> If one weren't thinking about performance, then the second map-reduce job
>> would have to process both data sets (the intermediate data and the new
>> data). For the existing intermediate data you want to do an identity map,
>> and for the new data, whatever map logic you have. You can write a mapper
>> that decides the map logic based on the input file name (look for the
>> jobconf variable map.input.file in Java, or the environment variable
>> map_input_file in hadoop streaming).
>> 
>> If one were thinking about performance, then one would argue that
>> re-sorting the existing intermediate data (as would happen in the simple
>> solution) is pointless, since it is already sorted by the desired key. If
>> this is a concern, the only thing that's available right now (afaik) is a
>> feature described in HADOOP-2085: you would map-reduce the new data set
>> only, and then join the old and new data using the map-side joins
>> described in that jira (this would require a third map-reduce job).
>> 
>> 
>> (One could argue that an option to skip map-side sorting at a per-file
>> level would be perfect: one would skip the map-side sort of the old data
>> and only sort the new data, and the reducer would merge the two.)
>> 
>> 
>> -----Original Message-----
>> From: Dina Said [mailto:dinasaid@gmail.com]
>> Sent: Sat 4/19/2008 1:55 PM
>> To: core-user@hadoop.apache.org
>> Subject: Combine previous Map Results
>>  
>> Dear all
>> 
>> Suppose that I have files that contain intermediate key values, and I
>> want to combine these intermediate key values with those from a new
>> MapReduce job. I want this MapReduce job, during its reduce stage, to
>> combine the intermediate key values it generates with the intermediate
>> key values I already have.
>> 
>> Any ideas?
>> 
>> Dina
>> 
>> 
>>   
> 
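For completeness, here is a rough sketch of how the HADOOP-2085 map-side join
that Joydeep mentions above might be wired up with the old
org.apache.hadoop.mapred API. The class name and paths are placeholders, the
choice of SequenceFileInputFormat is an assumption, and the exact
CompositeInputFormat details should be checked against the jira and javadoc:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SequenceFileInputFormat;
    import org.apache.hadoop.mapred.join.CompositeInputFormat;

    public class JoinJobSetup {
      public static JobConf configureJoin(JobConf job) {
        // Both inputs must already be sorted and identically partitioned on the key.
        job.setInputFormat(CompositeInputFormat.class);
        job.set("mapred.join.expr", CompositeInputFormat.compose(
            "inner", SequenceFileInputFormat.class,
            new Path("/data/old-intermediate"),   // hypothetical paths
            new Path("/data/new-intermediate")));
        // Each map call then sees the join key and a TupleWritable carrying the
        // matching values from the two sources.
        return job;
      }
    }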

