hadoop-mapreduce-user mailing list archives

From Alex Loddengaard <a...@cloudera.com>
Subject Re: partitioning dataset
Date Mon, 05 Jul 2010 18:16:02 GMT
Hi there,

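Unfortunately you can't directly control which mapper gets which data.  The
InputSplit -> map task assignment is made by the framework, so from your
job's point of view it is effectively arbitrary.  You could, however, do the
join in the reduce, by emitting your join key as the intermediate (map
output) key, so that records with the same key from both datasets arrive at
the same reduce call.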
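
To make the idea concrete, here is a minimal sketch of a reduce-side join,
simulated in plain Python rather than written against the Hadoop API.  All of
the names (map_tagged, shuffle, reduce_join, and the "A"/"B" source tags) are
made up for illustration, and it assumes an inner join on small in-memory
datasets of (key, value) pairs:

```python
from collections import defaultdict

def map_tagged(source, records):
    """Map phase: emit (join_key, (source_tag, value)) for every record."""
    for key, value in records:
        yield key, (source, value)

def shuffle(pairs):
    """Group map output by key, standing in for the framework's shuffle."""
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups

def reduce_join(key, tagged_values):
    """Reduce phase: pair every "A" value with every "B" value for one key."""
    left = [v for tag, v in tagged_values if tag == "A"]
    right = [v for tag, v in tagged_values if tag == "B"]
    return [(key, l, r) for l in left for r in right]

def reduce_side_join(dataset_a, dataset_b):
    """Run map, shuffle, and reduce over both datasets (inner join)."""
    pairs = list(map_tagged("A", dataset_a)) + list(map_tagged("B", dataset_b))
    joined = []
    for key, tagged_values in sorted(shuffle(pairs).items()):
        joined.extend(reduce_join(key, tagged_values))
    return joined

# Hypothetical sample data: (join_key, value) pairs from two sources.
users = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(1, "book"), (3, "pen"), (3, "ink"), (4, "lamp")]
result = reduce_side_join(users, orders)
print(result)
```

In a real job the map output key would play the role of the join key here,
and the partitioner would decide which reducer receives each key, so a custom
partitioner can still be used on the reduce side.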

Does that make sense?


On Sat, Jul 3, 2010 at 9:28 AM, Denim Live <denim.live@yahoo.com> wrote:

> Hello everyone,
> I have written my custom partitioner for partitioning datasets. I want to
> partition two datasets using the same partitioner and then in the next
> mapreduce job, I want each mapper to handle the same partition from the two
> sources and perform some operation on it, such as a join. How can I ensure
> that one mapper gets the splits that correspond to the same partition from
> both sources?
> Any help would be highly appreciated.
> Alex
