hadoop-common-dev mailing list archives

From Gopal Vijayaraghavan <gop...@apache.org>
Subject Re: Combiner Execution
Date Tue, 22 Oct 2013 16:02:30 GMT

I'll answer your questions in reverse.

> According to http://developer.yahoo.com/hadoop/tutorial/module4.html the output is already
> combined over all Mappers in a node. But we cannot find how this is happening. Can someone
> point us to where this combiner is executed?

You'll find the Combiner runner buried somewhere inside MapTask.java;
hunt for the combinerRunner in there.

The Combiner only combines the output of a single map task (after
sorting). This kicks in only if there are spills in that one map task.
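
To make the per-task scope concrete, here is a simplified model in Java of one map task's spill path. It is a sketch under stated assumptions, not Hadoop's MapTask code: the names SpillSimulation, mapAndSpill and sortAndCombine are hypothetical, the in-memory buffer is counted in records rather than bytes, and the combiner is a word-count sum. Each time the buffer fills, its records are sorted by key and combined before being written out as a spill; the later merge of spill files is not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of ONE map task's spill path; not Hadoop's MapTask code.
final class SpillSimulation {

    /** Returns one map per spill: keys in sorted order, each with its combined (summed) count. */
    static List<Map<String, Integer>> mapAndSpill(List<String> words, int bufferRecords) {
        List<Map<String, Integer>> spills = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        for (String word : words) {
            buffer.add(word); // map() conceptually emits (word, 1)
            if (buffer.size() == bufferRecords) {
                spills.add(sortAndCombine(buffer)); // buffer full: sort, combine, spill
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {
            spills.add(sortAndCombine(buffer)); // final flush when the task finishes
        }
        return spills;
    }

    /** Sorts the buffered records by key, then sums the values per key (the "combiner"). */
    private static Map<String, Integer> sortAndCombine(List<String> buffer) {
        Map<String, Integer> combined = new TreeMap<>(); // TreeMap keeps keys sorted
        for (String word : buffer) {
            combined.merge(word, 1, Integer::sum);
        }
        return combined;
    }
}
```

With input a, b, a, c, a and a 3-record buffer, this yields two spills, {a=2, b=1} and {a=1, c=1}: key a is combined inside each spill, and nothing in this model looks at any other map task's output.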

It does not perform any cross-task actions, and the MR framework (as it
is today) doesn't leave enough room for scheduling a cross-task activity
(i.e., MR is strictly bipartite).
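
A minimal sketch of what the bipartite structure means for shuffle volume, assuming a single reducer and word-count style sums; BipartiteShuffle, shuffledRecords and reduce are hypothetical names, not Hadoop classes. Each map task's already-combined output is shuffled on its own, so the reducer is the first place where values from different tasks meet, even when those tasks ran on the same node.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the bipartite map -> reduce edge: each map task's combined
// output goes to the reducer on its own; nothing merges outputs across map tasks.
final class BipartiteShuffle {

    /** Records sent over the shuffle: one per distinct key per map task. */
    static int shuffledRecords(List<Map<String, Integer>> perTaskOutputs) {
        int total = 0;
        for (Map<String, Integer> output : perTaskOutputs) {
            total += output.size();
        }
        return total;
    }

    /** The reducer is the first stage that sees values from more than one map task. */
    static Map<String, Integer> reduce(List<Map<String, Integer>> perTaskOutputs) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map<String, Integer> output : perTaskOutputs) {
            output.forEach((key, value) -> result.merge(key, value, Integer::sum));
        }
        return result;
    }
}
```

Two map tasks on one node emitting {the=5, cat=2} and {the=3, dog=1} shuffle 4 records, and the reducer produces {cat=2, dog=1, the=8}. A per-node combiner would have cut that to 3 records before the shuffle.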

> For a class project my group and I are looking to experiment with combining the output
from Mappers on the same node or in the same rack. We found the idea at http://wiki.apache.org/hadoop/HadoopResearchProjects.

Your general idea is roughly chalked out in Apache Tez (per-host/per-rack
multi-level combiner trees), which is designed to be more flexible with
its plumbing.
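
A rough illustration of a two-level (per-host, then per-rack) combiner tree. This is NOT the Tez API: CombinerTree, TaskOutput and combinePerRack are hypothetical names, and the sketch assumes the combiner is associative and commutative (a sum), which is what allows partial results to be regrouped at any level of the tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-host / per-rack combiner tree; NOT the Tez API. It only illustrates
// combining map-task outputs level by level before they would be shuffled.
final class CombinerTree {

    /** Combined output of one map task, tagged with where it ran (hypothetical type). */
    static final class TaskOutput {
        final String rack;
        final String host;
        final Map<String, Integer> counts;

        TaskOutput(String rack, String host, Map<String, Integer> counts) {
            this.rack = rack;
            this.host = host;
            this.counts = counts;
        }
    }

    /** Merges partial outputs with an associative, commutative combiner (a sum). */
    static Map<String, Integer> combine(List<Map<String, Integer>> parts) {
        Map<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> part : parts) {
            part.forEach((k, v) -> merged.merge(k, v, Integer::sum));
        }
        return merged;
    }

    /** Level 1: combine the tasks on each host. Level 2: combine the host results per rack. */
    static Map<String, Map<String, Integer>> combinePerRack(List<TaskOutput> tasks) {
        Map<String, Map<String, List<Map<String, Integer>>>> byRackAndHost = new TreeMap<>();
        for (TaskOutput t : tasks) {
            byRackAndHost
                .computeIfAbsent(t.rack, r -> new TreeMap<>())
                .computeIfAbsent(t.host, h -> new ArrayList<>())
                .add(t.counts);
        }
        Map<String, Map<String, Integer>> perRack = new TreeMap<>();
        byRackAndHost.forEach((rack, hosts) -> {
            List<Map<String, Integer>> hostResults = new ArrayList<>();
            hosts.forEach((host, parts) -> hostResults.add(combine(parts))); // per-host level
            perRack.put(rack, combine(hostResults));                        // per-rack level
        });
        return perRack;
    }
}
```

For example, two tasks on hostA ({the=5, cat=2} and {the=3, dog=1}) plus one on hostB ({the=1}), all in rack1, collapse to a single rack1 result of {cat=2, dog=1, the=9}.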
