hadoop-common-user mailing list archives

From Amogh Vasekar <am...@yahoo-inc.com>
Subject Re: Map Output
Date Fri, 17 Sep 2010 09:48:04 GMT
Hi,

>>As far as I know, the map output is written to the local disk then shipped to reducer
via network. Is this correct?
Yes. Each reducer picks up its own partition from the map output once the map task completes.
However, it's a little more complicated (and very interesting) on the map side. In short, the
output of the mappers is not written directly to disk, but to an in-memory buffer, which is
flushed (spilled) to disk whenever a fill threshold is reached.
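
To make the buffer-then-flush idea concrete, here is a rough sketch in plain Java. It is not Hadoop's MapOutputBuffer: the class name, the record-count capacity, and the in-memory lists standing in for spill files are all made up for illustration. The real code works on a byte buffer and is a good deal more involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative only: a toy stand-in for a map-side output buffer. */
final class SpillBufferSketch {
    private final int spillThreshold;    // number of buffered records that triggers a spill
    private final List<String> buffer = new ArrayList<>();
    // Each inner list stands in for one sorted spill file on local disk (assumption).
    private final List<List<String>> spills = new ArrayList<>();

    /**
     * @param capacityRecords hypothetical buffer capacity, counted in records
     * @param spillPercent    fill level (0-100) at which the buffer is flushed
     */
    SpillBufferSketch(int capacityRecords, int spillPercent) {
        this.spillThreshold = Math.max(1, capacityRecords * spillPercent / 100);
    }

    /** Called once per map output record. */
    void collect(String key, String value) {
        buffer.add(key + "\t" + value);
        if (buffer.size() >= spillThreshold) {
            spill();
        }
    }

    /** Sort the buffered records and "write" them out as one spill. */
    private void spill() {
        if (buffer.isEmpty()) {
            return;
        }
        List<String> sorted = new ArrayList<>(buffer);
        Collections.sort(sorted);
        spills.add(sorted);
        buffer.clear();
    }

    /** End of the map task: flush whatever is still in the buffer. */
    void close() {
        spill();
    }

    int spillCount() {
        return spills.size();
    }
}
```

With a capacity of 10 records and a 80 percent threshold, collecting 20 records and then calling close() produces three spills (8, 8 and 4 records), so the map output can hit the local disk more than once per task.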


>>Is it read from and written to the disk multiple times or only once when the map
task ends?
Once the map task completes, it reports success to the JobTracker via its TaskTracker. That
success is propagated to the reducers, which then pick up their respective partitions. So it's
a one-time read per mapper/reducer pair. The write on the reducer side is not one-time, and
also depends on buffers etc., which brings us to your next question.
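
Before moving on, a rough sketch of the "each reducer picks up its own partition" step, in plain Java. The class and method names are hypothetical and there is no networking here; the nested lists just stand in for the per-reducer partitions of each completed map output. The hash-and-modulo rule is the one Hadoop's default HashPartitioner applies to the key's hashCode.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: how map output is split per reducer and then picked up. */
final class PartitionFetchSketch {

    /** Hash partitioning: mask the sign bit, then take the remainder. */
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Split one mapper's output keys into one list per reducer. */
    static List<List<String>> partition(List<String> keys, int numReducers) {
        List<List<String>> partitions = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : keys) {
            partitions.get(partitionFor(key, numReducers)).add(key);
        }
        return partitions;
    }

    /**
     * What a given reducer would pick up: its own partition from every
     * completed map output, read once per mapper/reducer pair.
     */
    static List<String> fetchFor(int reducer, List<List<List<String>>> mapOutputs) {
        List<String> fetched = new ArrayList<>();
        for (List<List<String>> oneMapOutput : mapOutputs) {
            fetched.addAll(oneMapOutput.get(reducer));
        }
        return fetched;
    }
}
```

With M completed map tasks and R reducers this amounts to M x R reads in total, one per pair, which is the one-time read described above.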


>>Which parts of the Hadoop code should I see to understand how it is written to the
local disk and how it is shipped?
http://developer.yahoo.com/hadoop/tutorial/module4.html#closer

In the code, MapTask.java (MapOutputBuffer and what follows it) and ReduceTask.java (the run
method) should provide good pointers. All the supporting classes for merging, spilling etc. are
in the same directory.

Hope this helps,
Amogh


On 9/14/10 9:01 PM, "Yağız Kargın" <xerxes862@gmail.com> wrote:

Hi All,

I have some questions about the map output.

As far as I know, the map output is written to the local disk then
shipped to reducer via network. Is this correct? Is it read from and
written to the disk multiple times or only once when the map task
ends? Which parts of the Hadoop code should I see to understand how it
is written to the local disk and how it is shipped?

Best,
--Yagiz
