hadoop-common-user mailing list archives

From 梁李印 <liyin.lian...@aliyun-inc.com>
Subject Re: MapReduce shuffle question
Date Fri, 03 Aug 2012 14:33:48 GMT
When a map task finishes, its output is always flushed to local disk and
merged into a single file.
The benefit is that if a reducer fails, the map task does not need to be
re-run.
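
To illustrate the point above, here is a minimal toy sketch in plain Java. It
is not Hadoop code: the class MapOutputSketch, its methods, the mapRuns
counter, and the file name map-output-sketch.txt are all hypothetical names
made up for this example. It only models the idea that the map side writes
one file when it finishes, and that a retried reducer reads that file again
instead of triggering another map run.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model of map output handling. Hypothetical names; not the Hadoop API. */
class MapOutputSketch {
    /** Counts how many times the toy map task has run. */
    static int mapRuns = 0;

    /** Buffers records in memory, sorts them, then always writes one file to disk. */
    static Path runMapTask(List<String> words, Path dir) {
        mapRuns++;
        // In-memory buffer of (word, 1) records, sorted by key.
        List<String> buffer = new ArrayList<>();
        for (String w : words) {
            buffer.add(w + "\t1");
        }
        Collections.sort(buffer);
        // Even though nothing spilled along the way, the final output goes to disk.
        Path out = dir.resolve("map-output-sketch.txt");
        try {
            Files.write(out, buffer, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    /** A toy "reducer fetch": reads the persisted map output back from disk. */
    static List<String> fetch(Path mapOutput) {
        try {
            return Files.readAllLines(mapOutput, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("map-output-sketch");
        Path out = runMapTask(Arrays.asList("b", "a", "b"), dir);
        List<String> first = fetch(out); // first reduce attempt (imagine it fails)
        List<String> retry = fetch(out); // the retry re-reads the same file
        System.out.println("records=" + retry.size() + " mapRuns=" + mapRuns);
    }
}
```

In this sketch the second fetch succeeds without calling runMapTask again,
which is the benefit described above. Real Hadoop serves the file to
reducers over HTTP rather than through a local read.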

Liyin Liang

From: Satheesh Kumar [mailto:nkseam@gmail.com]
Sent: August 3, 2012, 21:23
To: common-user@hadoop.apache.org
Subject: MapReduce shuffle question

Team, can someone please clarify the following question?

In the map phase, the map output is written to the local disk, and in the
shuffle phase, the map output partitions are transferred to the reduce nodes
over HTTP. My question: assuming there are no spills (the data set is small
enough to fit in memory), will the map output be transferred directly from
memory to the reduce nodes over HTTP, without a disk access to write it? Or
is the map output always flushed to disk before being transferred to the
reduce nodes?

Appreciate the help.

