From: Amit Mittal
To: user@hadoop.apache.org
Date: Mon, 27 Jan 2014 17:47:26 +0530
Subject: Does all reducer take input from all NodeManager/Tasktrackers of Map tasks

Hi,

Do all reducers take input from all NodeManagers/TaskTrackers that ran map tasks?

*Reference:* "Hadoop: The Definitive Guide", 3rd Ed., by Tom White, page 210 (Ch 6: How MapReduce Works > Shuffle & Sort > The reducer side)

There is a note there; here is the text from the book:

How do reducers know which machines to fetch map output from?
...
Therefore, for a given job, the jobtracker (or application master) knows the mapping between map outputs and hosts. A thread in the reducer periodically asks the master for map output hosts until it has retrieved them all.
...

*Question 1:* I believe the TaskTracker, and then the JobTracker/AppMaster, receives the updates through a call to Task.statusUpdate(TaskUmbilicalProtocol obj). That is how the JobTracker/AM knows the location of the map's output file, the host details, etc. But how does it know which partitions or keys that output contains? In other words, how does the JobTracker learn about data partitions/keys from the heartbeat? That information would be needed to decide whether a given mapper's output has to be pulled or not.

*Question 2:* In short, is it correct that not every reducer takes output from every mapper, and that each reducer connects only to the mappers holding output for the keys partitioned to that particular reducer?
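To make the question concrete, here is a minimal sketch of the situation I have in mind. It assumes hash-based partitioning (a key's non-negative hash code modulo the number of reducers); the class and method names are hypothetical and only for illustration, not Hadoop APIs.

```java
import java.util.Set;
import java.util.TreeSet;

public class PartitionSketch {
    // Assumption: a key goes to partition (non-negative hash) % number of reducers.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Collects the partitions that receive at least one key from a single mapper's output.
    static Set<Integer> partitionsIn(String[] mapOutputKeys, int numReducers) {
        Set<Integer> partitions = new TreeSet<>();
        for (String key : mapOutputKeys) {
            partitions.add(partitionFor(key, numReducers));
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Hypothetical keys emitted by one mapper, with 4 reducers configured.
        String[] keys = {"a", "b", "c"};
        System.out.println(partitionsIn(keys, 4)); // prints [1, 2, 3]
    }
}
```

Here the mapper produces nothing for partition 0, so my question is whether reducer 0 still contacts this mapper's host, and if not, how the master would know to skip it.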

Thanks,
Amit Mittal