From: Liyin Liang
Date: Fri, 3 Aug 2012 22:33:48 +0800
Subject: RE: MapReduce shuffle question

When a map task is done, its output is
always flushed to the local disk and merged into a single file. The benefit is that if a reducer fails, the map task does not need to be re-run: the restarted reducer can fetch the same output again from the map node's disk.

Liyin Liang

-----Original Message-----
From: Satheesh Kumar [mailto:nkseam@gmail.com]
Sent: August 3, 2012 21:23
To: common-user@hadoop.apache.org
Subject: MapReduce shuffle question

Team,

Can someone please clarify the following question?

In the map phase, the map output is written to the local disk, and in the shuffle phase, the map output partitions are transferred to the reduce nodes over HTTP.

So my question is: assuming there are no spills (the data set is small enough for this), will the map output be transferred directly from memory to the reduce nodes over HTTP, without any disk access to write it? Or is the map output always flushed to disk before it is transferred to the reduce nodes?

I'd appreciate the help.

Thanks,
Satheesh
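To make the reply concrete, here is a toy model in Java. It is not Hadoop's implementation: the class, method, and record names (MapOutputSketch, writeMapOutput, fetchPartition, Segment) are hypothetical. It assumes, as the reply describes, that a finished map task writes all of its partitions into one local file, and it adds a simple index of byte offsets so each reducer reads only its own partition. A local file read stands in for Hadoop's HTTP fetch.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy model (not Hadoop code): a finished "map task" writes all partitions
 * into ONE local file plus an in-memory index of byte offsets. A "reducer"
 * reads only its partition's byte range; after a failure it can simply read
 * again, because the file is still on disk and the map need not re-run.
 */
public class MapOutputSketch {

    /** Byte range of one partition inside the merged map-output file. */
    record Segment(long offset, int length) {}

    /** Writes each partition's records back to back, in partition order, and returns the index. */
    static Map<Integer, Segment> writeMapOutput(Path file, Map<Integer, List<String>> partitions)
            throws IOException {
        Map<Integer, Segment> index = new TreeMap<>();
        try (RandomAccessFile out = new RandomAccessFile(file.toFile(), "rw")) {
            out.setLength(0); // start from an empty file
            for (Map.Entry<Integer, List<String>> e : new TreeMap<>(partitions).entrySet()) {
                long start = out.getFilePointer();
                byte[] bytes = String.join("\n", e.getValue()).getBytes(StandardCharsets.UTF_8);
                out.write(bytes);
                index.put(e.getKey(), new Segment(start, bytes.length));
            }
        }
        return index;
    }

    /** Stand-in for a reducer's fetch: returns one partition's records, or "" if absent. */
    static String fetchPartition(Path file, Map<Integer, Segment> index, int partition)
            throws IOException {
        Segment seg = index.get(partition);
        if (seg == null) {
            return "";
        }
        byte[] buf = new byte[seg.length()];
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            in.seek(seg.offset());
            in.readFully(buf);
        }
        return new String(buf, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("map-0-out", ".bin");
        Map<Integer, List<String>> partitions = Map.of(
                0, List.of("apple\t1", "avocado\t1"),
                1, List.of("banana\t1"));
        Map<Integer, Segment> index = writeMapOutput(file, partitions);

        String first = fetchPartition(file, index, 0);
        // Simulate a reducer failure: the retry reads the same file; the map is not re-run.
        String retry = fetchPartition(file, index, 0);
        System.out.println(first.equals(retry) + " " + fetchPartition(file, index, 1));
        Files.delete(file);
    }
}
```

Because the merged output persists on disk, the retry in `main` returns the same bytes without recomputing anything, which is the benefit the reply points out. Hadoop keeps a comparable per-partition index on disk next to the output file; this sketch keeps it in memory for brevity.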