hadoop-common-user mailing list archives

From Dina Said <dinas...@gmail.com>
Subject Re: Combine previous Map Results
Date Fri, 25 Apr 2008 23:43:29 GMT
Thanks Joydeep for your reply.

But is there a possibility to have two or more Map tasks and a single
reduce task?
I want the reduce task to work on all the intermediate keys produced
from these Map tasks.

I am sorry, I am new to Map-Reduce, but from my first reading it seems
that we can define only one Map task.

Thanks
Dina


Joydeep Sen Sarma wrote:
> If performance is not a concern, the second map-reduce job would have to
> process both data sets (the intermediate data and the new data). For the
> existing intermediate data you want an identity map, and for the new data
> whatever map logic you have. You can write a mapper that chooses the map
> logic based on the input file name (look for the jobconf variable
> map.input.file in Java, or the environment variable map_input_file in
> Hadoop streaming).
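
The file-name dispatch described above could look roughly like this as a Hadoop streaming mapper in Python. This is only a sketch under assumptions: the "intermediate" path marker, the tab-separated record layout, and the word-count logic for new data are all hypothetical placeholders. Only the map_input_file environment variable comes from the message itself.

```python
import os
import sys


def is_intermediate(path, marker="intermediate"):
    """Return True if the input file holds already-mapped key/value pairs.

    The marker is a hypothetical naming convention, not a Hadoop feature.
    """
    return marker in path


def map_line(line, input_file):
    """Yield (key, value) pairs for one input line, chosen by source file."""
    line = line.rstrip("\n")
    if not line:
        return
    if is_intermediate(input_file):
        # Identity map: the line is assumed to already be "key<TAB>value".
        key, _, value = line.partition("\t")
        yield key, value
    else:
        # Placeholder map logic for the new data (word count).
        for word in line.split():
            yield word, "1"


def main():
    # Hadoop streaming exposes the jobconf variable map.input.file
    # to the mapper process as the environment variable map_input_file.
    input_file = os.environ.get("map_input_file", "")
    for line in sys.stdin:
        for key, value in map_line(line, input_file):
            sys.stdout.write(f"{key}\t{value}\n")

# When used as a streaming mapper, the script would end with a call to main().
```

Both kinds of input then reach the same reducer as ordinary key/value pairs, so the reduce logic does not need to know which file a record came from.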
>
> If performance is a concern, one could argue that re-sorting the existing
> intermediate data (as would happen in the simple solution) is pointless,
> since it is already sorted by the desired key. In that case, the only thing
> available right now (afaik) is the feature described in HADOOP-2085. You
> would map-reduce the new data set only, and then join the old and new data
> using the map-side joins described in that JIRA; this would require a third
> map-reduce job.
>
>
> (One could argue that an option to skip map-side sorting on a per-file
> basis would be perfect: one would skip the map-side sort of the old data,
> sort only the new data, and the reducer would merge the two.)
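
The idea of sorting only the new data and letting the reduce side merge it with the already-sorted old data can be illustrated outside Hadoop with plain Python. This is a toy sketch of the merge step, not something Hadoop offers as an option; the function name merge_and_reduce and the sample records are made up for illustration.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_and_reduce(old_sorted, new_sorted, reduce_fn):
    """Merge two key-sorted (key, value) runs and reduce each key's values."""
    # heapq.merge streams through both inputs without re-sorting them;
    # it relies on each input already being sorted by key.
    merged = heapq.merge(old_sorted, new_sorted, key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        yield key, reduce_fn(value for _, value in group)


# Existing intermediate data, already sorted by key (no sort needed).
old = [("apple", 3), ("cherry", 1)]
# Freshly mapped data: this is the only part that gets sorted.
new = sorted([("banana", 1), ("apple", 1)])

print(list(merge_and_reduce(old, new, sum)))
```

The cost of the sort is then proportional to the size of the new data only, which is the saving the paragraph above is after.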
>
>
> -----Original Message-----
> From: Dina Said [mailto:dinasaid@gmail.com]
> Sent: Sat 4/19/2008 1:55 PM
> To: core-user@hadoop.apache.org
> Subject: Combine previous Map Results
>  
> Dear all
>
> Suppose that I have files that contain intermediate key/value pairs, and I
> want to combine them with the output of a new MapReduce job. I want this
> job to combine, during the reduce stage, the intermediate key/value pairs
> it generates with the ones I already have.
>
> Any ideas?
>
> Dina
>
>
>   

