hadoop-common-dev mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: the part of the intermediate output fed to a reducer
Date Sat, 23 Mar 2013 19:56:43 GMT

On Sun, Mar 24, 2013 at 12:00 AM, preethi ganeshan
<preethiganeshan92@gmail.com> wrote:
> Hey all,
> I am working on a project that schedules data-local reduce tasks.

Great, are you planning to contribute it upstream too? See
https://issues.apache.org/jira/browse/MAPREDUCE-199. I'm also hoping
you're working on trunk and not the maintenance branch branch-1, which
is very outdated compared to where MR is today.

> However, I wanted to know if there is a way, using MapTask.java, to keep track of the
> inputs and the size of the input to every reducer. In other words, what code do
> I add to get the size of the intermediate output that is fed to a reduce
> task before the reduce task begins?

Change the thinking here a bit: a map does not feed a reduce (i.e. it's
not a push). A reduce consumes a map's output after the map completes (the
map task JVM may terminate for all it cares). Upon a map's completion,
its counters are available centrally (i.e. at the ApplicationMaster),
which the reduce task can poll for sizes (it may already be doing
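
To make the per-reducer accounting concrete, here is a minimal, self-contained sketch of the idea. It is not Hadoop code: the class PartitionSizes and its methods are hypothetical names invented for illustration, and the only detail borrowed from Hadoop is the formula used by the default HashPartitioner, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. It assumes plain string keys and values and counts raw UTF-8 bytes, so it ignores serialization overhead, compression, and combiners, all of which change the real sizes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Hadoop.
public class PartitionSizes {

    // Same formula as Hadoop's default HashPartitioner.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Tally the raw key+value bytes of one map's output per reduce partition.
    // Assumption: each record is a {key, value} pair of strings.
    static long[] bytesPerReducer(String[][] records, int numReducers) {
        long[] sizes = new long[numReducers];
        for (String[] record : records) {
            int partition = partitionFor(record[0], numReducers);
            sizes[partition] += record[0].getBytes(StandardCharsets.UTF_8).length
                              + record[1].getBytes(StandardCharsets.UTF_8).length;
        }
        return sizes;
    }

    public static void main(String[] args) {
        String[][] mapOutput = {
            {"apple", "1"}, {"banana", "1"}, {"apple", "1"}, {"cherry", "1"}
        };
        // Index i holds the bytes this map would hand to reducer i.
        System.out.println(Arrays.toString(bytesPerReducer(mapOutput, 3)));
    }
}
```

Summing such per-map arrays over all completed maps would give the total each reducer is going to pull, which is the quantity a data-local reduce scheduler would presumably want to compare across nodes.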

Harsh J
