hadoop-general mailing list archives

From Aaron Kimball <aa...@cloudera.com>
Subject Re: How to send KV pair to a reduce task on a particular machine?
Date Mon, 22 Mar 2010 03:06:07 GMT
Yanfeng,

The sort of behavior you want is intentionally omitted from MapReduce's
capabilities. Reduce partitions are kept as abstract notions and your
MapReduce program cannot bind partitions to particular physical nodes. This
is for fault-tolerance purposes. If machine1 crashes, then partition1 can
still be rescheduled onto machine3 and the computation can continue.
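
To make the distinction concrete, here is a minimal sketch in plain Java (no
Hadoop dependency, so it runs on its own). It assumes the keys are the strings
"kv1" through "kv4" from the question, and the class and method names
(PartitionSketch, hashPartition, customPartition) are made up for
illustration. hashPartition mirrors the arithmetic of Hadoop's default
HashPartitioner; customPartition is a hypothetical rule of the kind you would
put in your own Partitioner's getPartition(). Partition indices are 0-based
here, so index 0 corresponds to "partition1" in the question. In both cases
the only thing the code can return is a partition index; there is no place to
name a machine.

```java
public class PartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner:
    // mask off the sign bit, then take the remainder.
    static int hashPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical custom rule: kv1 and kv2 go to index 0, everything
    // else to index 1. The modulo keeps the result valid if the job
    // is run with fewer reduce tasks.
    static int customPartition(String key, int numReduceTasks) {
        int wanted = (key.equals("kv1") || key.equals("kv2")) ? 0 : 1;
        return wanted % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 2; // R=2, as in the question
        String[] keys = {"kv1", "kv2", "kv3", "kv4"};
        for (String key : keys) {
            // The result is an index in [0, numReduceTasks), not a host.
            System.out.println(key + " -> partition index "
                    + customPartition(key, numReduceTasks));
        }
    }
}
```

So a custom partitioner lets you control which keys end up together in the
same partition, but which node runs the reduce task for that partition is
still decided by the framework.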

Sorry that's frustrating for your use case!
- Aaron

On Fri, Mar 5, 2010 at 8:50 PM, Yanfeng Zhang <zhangyf14@gmail.com> wrote:

> Hi, all
>
> The KV pairs (kv1, kv2, kv3, kv4) output by the mapper are partitioned into
> R parts (e.g. R=2) by a partitioner. For example, kv1 and kv2 are in
> partition1, while kv3 and kv4 are in partition2. The reducers then get KV
> pairs from these two partitions: reducer1 gets KV pairs from partition1 and
> reducer2 gets KV pairs from partition2.
>
> I want to let machine1 get KV pairs from partition1 and machine2 get KV
> pairs from partition2. But reducer1 is not always on machine1, reducer2 is
> not always on machine2. Is there any way to make sure kv1 and kv2 are sent
> to machine1 and kv3, kv4 are sent to machine2?
>
> Thank you in advance!
>
> Sincerely,
> --
> Yanfeng Zhang
>
