From: Arun C Murthy
To: common-user@hadoop.apache.org
Subject: Re: MR job scheduler
Date: Thu, 20 Aug 2009 23:23:33 -0700

On Aug 20, 2009, at 9:20 PM, bharath vissapragada wrote:

> OK, I'll be a bit more specific.
>
> Suppose the map outputs 100 different keys.
>
> Consider a key "K" whose corresponding values may be on N different
> datanodes. Consider a datanode "D" which has the maximum number of
> values. So instead of moving the values on "D" to other systems, it is
> useful to bring in the values from the other datanodes to "D" to
> minimize the data movement and also the delay. The same applies to all
> the other keys. How does the scheduler take care of this?

Map-Reduce doesn't 'bring' values from N datanodes to the map. A map gets a single block of data to work with, and N-1 other maps get the other N-1 blocks; thus multiple maps might each see the key K with different values. Eventually the output of the maps, i.e. K and its values, ends up at one of the reduces (chosen by the Partitioner).

Please read some of the widely available map-reduce literature for more details.

Arun
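
To illustrate the Partitioner step mentioned above, here is a minimal Java sketch, not Hadoop code. The class `PartitionSketch` and its methods are hypothetical names made up for this example; only the arithmetic in `partitionFor` mirrors what Hadoop's default `HashPartitioner` does. It shows that the reduce for a key depends only on the key and the number of reduces, so values for K emitted by different maps all land at the same reduce, wherever those maps ran.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: names here are hypothetical, not Hadoop APIs.
final class PartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner:
    // clear the sign bit of the hash, then take the remainder
    // by the number of reduces.
    static int partitionFor(String key, int numReduces) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduces;
    }

    // Groups (key, value) pairs emitted by any number of maps
    // by the reduce that will receive them.
    static Map<Integer, List<String>> shuffle(List<String[]> mapOutputs, int numReduces) {
        Map<Integer, List<String>> byReduce = new TreeMap<>();
        for (String[] kv : mapOutputs) {
            int reduce = partitionFor(kv[0], numReduces);
            byReduce.computeIfAbsent(reduce, r -> new ArrayList<>())
                    .add(kv[0] + "=" + kv[1]);
        }
        return byReduce;
    }

    public static void main(String[] args) {
        List<String[]> outputs = new ArrayList<>();
        outputs.add(new String[] {"K", "v1"}); // emitted by one map
        outputs.add(new String[] {"J", "w1"}); // emitted by the same map
        outputs.add(new String[] {"K", "v2"}); // emitted by a second map
        outputs.add(new String[] {"K", "v3"}); // emitted by a third map
        // Every "K=..." entry is listed under the same reduce index.
        System.out.println(shuffle(outputs, 4));
    }
}
```

Note that the choice of reduce here takes no account of which datanode holds the most values for K, which is consistent with the answer above.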