From: Sandy Ryza <sandy.ryza@cloudera.com>
To: user@hadoop.apache.org
Date: Wed, 15 May 2013 13:49:41 -0700
Subject: Re: Map Tasks do not obey data locality principle

Hi Nikhil,

Which scheduler are you using?

-Sandy

On Tue, May 14, 2013 at 3:55 AM, Agarwal, Nikhil <Nikhil.Agarwal@netapp.com> wrote:

> Hi,
>
> I have a 3-node cluster, with the JobTracker running on one machine and
> TaskTrackers on the other two (say, slave1 and slave2). Instead of using
> HDFS, I have written my own FileSystem implementation. Since, unlike HDFS,
> my filesystem cannot provide a shared view to the JobTracker and
> TaskTrackers, I mounted the root container of slave2 on a directory in
> slave1 (NFS mount). This lets me submit an MR job to the JobTracker with
> an input path such as my_scheme://slave1_IP:Port/dir1.
>
> The MR job runs successfully, but data locality is not honored. That is,
> if files A, B, C are kept on slave1 and D, E, F on slave2, then by data
> locality the map tasks for A, B, C should be assigned to the TaskTracker
> running on slave1 and those for D, E, F to the one on slave2. Instead,
> map tasks are scheduled on either TaskTracker at random. If the map task
> for file A is assigned to the TaskTracker on slave2, then file A has to
> be fetched over the network by slave2.
>
> How do I avoid this from happening?
>
> Thanks,
> Nikhil
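Background on where locality comes from in MRv1: when the job client computes input splits, FileInputFormat.getSplits() asks the input FileSystem for block locations via getFileBlockLocations(), and the JobTracker then tries to run each map task on a TaskTracker whose hostname matches one of those locations. The base FileSystem class typically reports a single location on "localhost" for any file, so a custom FileSystem that does not override this method gives the scheduler nothing to work with and every split looks equally remote. Below is a minimal sketch of such an override; MySchemeFileSystem and hostForPath() are hypothetical names, and it assumes the filesystem can tell which slave actually stores a given path.

import java.io.IOException;

import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: report the node that really stores a file so the scheduler
// can place its map task there. Class and helper names are illustrative.
public abstract class MySchemeFileSystem extends FileSystem {

  @Override
  public BlockLocation[] getFileBlockLocations(FileStatus file, long start, long len)
      throws IOException {
    if (file == null) {
      return null;
    }
    // Assumed helper: resolve which slave (e.g. slave1 or slave2) holds this path.
    String host = hostForPath(file.getPath());
    String[] names = new String[] { host + ":50010" }; // "host:port" entries
    String[] hosts = new String[] { host };            // hostnames used for locality matching
    // Present the whole file as one "block" located on that host.
    return new BlockLocation[] { new BlockLocation(names, hosts, 0, file.getLen()) };
  }

  // Assumption: the filesystem knows, or can look up, which node stores a path.
  protected abstract String hostForPath(Path path);
}

For this to help, the hostnames returned here have to match the hostnames the TaskTrackers register with the JobTracker; otherwise the scheduler still treats every task as off-node. How hard the scheduler then tries to honor those hints differs between the FIFO, Fair, and Capacity schedulers (the Fair Scheduler, for instance, can use delay scheduling to wait briefly for a node-local slot), which is presumably why the scheduler in use matters here.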