hadoop-general mailing list archives

From Anthony Urso <anthony.u...@gmail.com>
Subject Re: How to send KV pair to a reduce task on a particular machine?
Date Tue, 30 Mar 2010 00:14:41 GMT
The dirty way to do this is to have the reducer throw an exception if
it receives a key that was not intended for the node it is running on.
It will be rescheduled on another node, and eventually it will land
on the correct one.
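
A minimal sketch of that check, kept free of Hadoop classes so it runs on
its own. The class name NodeGuard, the method requireHost, and the
partition-to-host map are all hypothetical, made up for illustration. In a
real job you would call something like this at the start of the reducer,
passing the task's partition number (as far as I know the task
configuration exposes it as mapred.task.partition) and the local hostname
from InetAddress.getLocalHost().getHostName().

```java
import java.util.HashMap;
import java.util.Map;

public class NodeGuard {
    /**
     * Throws if the given partition is pinned to a host other than localHost.
     * Partitions with no entry in expectedHosts may run anywhere.
     * The mapping itself is an assumption: something you maintain yourself.
     */
    public static void requireHost(int partition, String localHost,
                                   Map<Integer, String> expectedHosts) {
        String expected = expectedHosts.get(partition);
        if (expected != null && !expected.equalsIgnoreCase(localHost)) {
            // Failing the attempt is what makes the framework reschedule it.
            throw new IllegalStateException("Partition " + partition
                    + " belongs on " + expected
                    + " but this attempt is running on " + localHost);
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> expectedHosts = new HashMap<>();
        expectedHosts.put(0, "machine1"); // partition1 -> machine1
        expectedHosts.put(1, "machine2"); // partition2 -> machine2

        // Right node: returns normally.
        requireHost(0, "machine1", expectedHosts);

        // Wrong node: the attempt fails and would be retried elsewhere.
        try {
            requireHost(1, "machine1", expectedHosts);
        } catch (IllegalStateException e) {
            System.out.println("attempt failed: " + e.getMessage());
        }
    }
}
```

Doing the check before any input is consumed keeps the wasted work per
failed attempt small.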

Depending on the total number of nodes and reducers in the job, you
may have to raise the maximum number of attempts allowed per reduce
task (I believe the property is mapred.reduce.max.attempts, default 4)
so that the whole job does not fail while the reducers are bouncing
around from node to node.
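
To get a rough feel for how high the attempt limit needs to go, here is a
back-of-the-envelope sketch. It assumes each attempt lands on a node chosen
uniformly at random and independently of earlier attempts, which the real
scheduler does not promise, so treat the numbers as a ballpark only. The
class and method names are hypothetical.

```java
public class AttemptEstimate {
    /**
     * Probability that at least one of `attempts` placements hits the single
     * correct node out of `nodes`, under the uniform-random assumption.
     */
    public static double successProbability(int nodes, int attempts) {
        return 1.0 - Math.pow(1.0 - 1.0 / nodes, attempts);
    }

    /**
     * Smallest number of attempts whose success probability reaches `target`
     * (0 < target < 1), under the same assumption.
     */
    public static int attemptsFor(int nodes, double target) {
        if (nodes <= 1) {
            return 1; // only one node, so the first attempt is always right
        }
        double perAttemptMiss = 1.0 - 1.0 / nodes;
        return (int) Math.ceil(Math.log(1.0 - target) / Math.log(perAttemptMiss));
    }

    public static void main(String[] args) {
        int nodes = 10;
        System.out.println("P(success within 4 attempts) = "
                + successProbability(nodes, 4));
        System.out.println("attempts needed for 99% = "
                + attemptsFor(nodes, 0.99));
    }
}
```

Under these assumptions a 10-node cluster gives only about a one-in-three
chance of hitting the right node within 4 attempts, which is why the
attempt limit has to grow with the cluster size.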


On Fri, Mar 5, 2010 at 8:50 PM, Yanfeng Zhang <zhangyf14@gmail.com> wrote:
> Hi, all
> The KV pairs (kv1, kv2, kv3, kv4) output by the mapper are partitioned into
> R parts (e.g. R=2) by a partitioner. For example, kv1 and kv2 are in
> partition1, while kv3 and kv4 are in partition2. The reducers get KV
> pairs from these two partitions: reducer1 gets KV pairs from partition1 and
> reducer2 gets KV pairs from partition2.
> I want machine1 to get the KV pairs from partition1 and machine2 to get the
> KV pairs from partition2. But reducer1 is not always on machine1, and
> reducer2 is not always on machine2. Is there any way to make sure kv1 and
> kv2 are sent to machine1 and kv3 and kv4 are sent to machine2?
> Thank you in advance!
> Sincerely,
> --
> Yanfeng Zhang
