From: "Agarwal, Nikhil"
To: "user@hadoop.apache.org"
Date: Thu, 16 May 2013 06:08:43 +0000
Subject: RE: Map Tasks do not obey data locality principle........

No, it does not. I have kept the granularity at the file level rather than the block level. I do not think that should affect the mapping of tasks, should it?

Regards,
Nikhil

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com]
Sent: Thursday, May 16, 2013 2:31 AM
To:
Subject: Re: Map Tasks do not obey data locality principle........

Also, does your custom FS report block locations in exactly the same format as HDFS does?

On Tue, May 14, 2013 at 4:25 PM, Agarwal, Nikhil wrote:
> Hi,
>
> I have a 3-node cluster, with the JobTracker running on one machine and
> TaskTrackers on the other two (say, slave1 and slave2). Instead of using
> HDFS, I have written my own FileSystem implementation. Since, unlike
> HDFS, I am unable to provide a shared filesystem view to the JobTracker
> and the TaskTrackers, I mounted the root container of slave2 on a
> directory in slave1 (NFS mount). By doing this I am able to submit an MR
> job to the JobTracker with an input path such as
> my_scheme://slave1_IP:Port/dir1, etc. The MR job runs successfully, but
> data locality is not ensured: if files A, B, C are kept on slave1 and
> D, E, F on slave2, then according to data locality the map tasks for
> A, B, C should be submitted to the TaskTracker running on slave1 and
> those for D, E, F to the one on slave2.
> Instead, it randomly schedules the map tasks on any of the TaskTrackers.
> If the map task for file A is submitted to the TaskTracker running on
> slave2, then file A is being fetched over the network by slave2.
>
> How do I prevent this from happening?
>
> Thanks,
> Nikhil

--
Harsh J
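Harsh's question likely points at the cause. For a non-HDFS FileSystem, the locality hints that FileInputFormat attaches to each split come from the BlockLocation[] returned by the FileSystem's getFileBlockLocations(FileStatus, long, long), and a task can only be placed node-locally when those host strings match the node a TaskTracker runs on. A FileSystem subclass that does not override that method inherits a default that, in Hadoop 1.x, reports a single location on "localhost". The sketch below is a toy model in plain Java, NOT Hadoop's JobTracker code, that illustrates only this matching step. The class LocalityDemo, its helper methods, and the IP addresses 10.0.0.1/10.0.0.2 are hypothetical. Each file is one split (like the file-level granularity described above), and trackers ask for work in turn, preferring a split whose reported hosts contain their own name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of locality-aware map task placement (hypothetical, NOT Hadoop's
// JobTracker code): locality only happens when the host strings reported for
// a split match the names the TaskTrackers go by.
public class LocalityDemo {

    // One file-level input split plus the hosts the FileSystem reported for it
    // (in real Hadoop these would come from BlockLocation.getHosts()).
    static final class Split {
        final String file;
        final List<String> hosts;

        Split(String file, List<String> hosts) {
            this.file = file;
            this.hosts = hosts;
        }
    }

    // Trackers ask for work in turn; each takes a split whose reported hosts
    // contain its own name (exact string match), else the first pending split.
    static Map<String, String> schedule(List<Split> splits, List<String> trackers) {
        List<Split> pending = new ArrayList<>(splits);
        Map<String, String> fileToTracker = new LinkedHashMap<>();
        int turn = 0;
        while (!pending.isEmpty()) {
            String tracker = trackers.get(turn++ % trackers.size());
            Split chosen = null;
            for (Split s : pending) {
                if (s.hosts.contains(tracker)) { // node-local candidate
                    chosen = s;
                    break;
                }
            }
            if (chosen == null) {
                chosen = pending.get(0); // nothing local: take any split (remote read)
            }
            pending.remove(chosen);
            fileToTracker.put(chosen.file, tracker);
        }
        return fileToTracker;
    }

    // Counts splits placed on a tracker whose name matched the reported hosts.
    static int countLocal(List<Split> splits, Map<String, String> fileToTracker) {
        int local = 0;
        for (Split s : splits) {
            if (s.hosts.contains(fileToTracker.get(s.file))) {
                local++;
            }
        }
        return local;
    }

    // Files A, B, C live on slave1 and D, E, F on slave2, but each split carries
    // whatever host string the FileSystem chose to report for that node.
    static List<Split> makeSplits(String reportedForSlave1, String reportedForSlave2) {
        List<Split> out = new ArrayList<>();
        for (String f : List.of("A", "B", "C")) {
            out.add(new Split(f, List.of(reportedForSlave1)));
        }
        for (String f : List.of("D", "E", "F")) {
            out.add(new Split(f, List.of(reportedForSlave2)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> trackers = List.of("slave1", "slave2");
        String[][] cases = {
            {"hostnames", "slave1", "slave2"},        // same names the trackers use
            {"IP addresses", "10.0.0.1", "10.0.0.2"}, // hypothetical IPs
            {"localhost", "localhost", "localhost"}   // assumed un-overridden default
        };
        for (String[] c : cases) {
            List<Split> splits = makeSplits(c[1], c[2]);
            int local = countLocal(splits, schedule(splits, trackers));
            System.out.println(c[0] + ": " + local + "/6 node-local");
        }
    }
}
```

Running main prints 6/6 node-local for the hostname case and 0/6 for the other two. In this model a single file-level location is not a problem by itself; what matters is that the reported hosts name the node that actually holds the file, in the same form the TaskTrackers use.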