hadoop-mapreduce-user mailing list archives

From "Agarwal, Nikhil" <Nikhil.Agar...@netapp.com>
Subject RE: Map Tasks do not obey data locality principle........
Date Thu, 16 May 2013 06:08:43 GMT
No, it does not. I have kept the granularity at the file level rather than the block level. I do not think that should affect the mapping of tasks?

Regards,
Nikhil 
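
For context, "granularity at the file level" presumably means the custom FileSystem's getFileBlockLocations() reports each file as a single block. A minimal sketch of what that could look like, assuming the caller already knows which slave physically stores the file (the host parameter, the class name, and the port are illustrative placeholders, not details from the thread):

import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;

public class FileLevelLocations {
    // With file-level granularity the whole file is reported as one
    // "block". The hosts array must name the slave that actually
    // stores the file, in the same form the TaskTrackers use.
    static BlockLocation[] locationsFor(FileStatus file, String host) {
        return new BlockLocation[] {
            // names = "host:port" pairs, hosts = bare hostnames;
            // offset 0 and length = file size span the entire file.
            // The port here is an arbitrary placeholder.
            new BlockLocation(new String[] { host + ":50010" },
                              new String[] { host },
                              0L, file.getLen())
        };
    }
}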

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Thursday, May 16, 2013 2:31 AM
To: <user@hadoop.apache.org>
Subject: Re: Map Tasks do not obey data locality principle........

Also, does your custom FS report block locations in the exact same format as HDFS does?
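
A quick way to check this, as a rough sketch (assuming the custom scheme is registered in the client's Configuration; the class name and command-line argument are illustrative): print what the FS returns for one of the input files and compare the hostnames against what the TaskTrackers report to the JobTracker.

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckLocations {
    public static void main(String[] args) throws Exception {
        // args[0] is a file under the custom scheme,
        // e.g. my_scheme://slave1_IP:Port/dir1/fileA
        Path p = new Path(args[0]);
        FileSystem fs = p.getFileSystem(new Configuration());
        FileStatus st = fs.getFileStatus(p);
        // The hostnames printed here must match the TaskTrackers'
        // hostnames exactly; otherwise the JobTracker sees every
        // split as non-local.
        for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
            System.out.println(Arrays.toString(loc.getHosts()));
        }
    }
}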

On Tue, May 14, 2013 at 4:25 PM, Agarwal, Nikhil <Nikhil.Agarwal@netapp.com> wrote:
> Hi,
>
> I have a 3-node cluster, with the JobTracker running on one machine and
> TaskTrackers on the other two (say, slave1 and slave2). Instead of using
> HDFS, I have written my own FileSystem implementation. Since, unlike
> HDFS, I cannot provide a shared filesystem view to the JobTracker and
> TaskTrackers, I mounted the root container of slave2 on a directory in
> slave1 (NFS mount). By doing this I am able to submit an MR job to the
> JobTracker, with input paths such as my_scheme://slave1_IP:Port/dir1,
> etc. The MR job runs successfully, but data locality is not ensured:
> if files A, B, C are kept on slave1 and D, E, F on slave2, then
> according to data locality, the map tasks for A, B, C should be
> scheduled on the TaskTracker running on slave1 and those for D, E, F
> on the TaskTracker running on slave2. Instead, the framework randomly
> schedules the map tasks to either of the TaskTrackers. If the map task
> for file A is submitted to the TaskTracker running on slave2, it
> implies that file A is being fetched over the network by slave2.
>
> How do I prevent this from happening?
>
> Thanks,
> Nikhil
>



--
Harsh J
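
For reference, locality enters the picture through the input splits: FileInputFormat copies the hosts from getFileBlockLocations() into each split, and the JobTracker prefers a TaskTracker whose hostname appears in a split's location list. A small sketch using the old mapred API from the JobTracker era (class name and input argument are illustrative) that dumps what the scheduler actually sees:

import java.util.Arrays;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class DumpSplits {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();
        // args[0] is the job's input directory under the custom scheme.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        TextInputFormat format = new TextInputFormat();
        format.configure(conf);
        for (InputSplit split : format.getSplits(conf, 1)) {
            // getLocations() is the host list the JobTracker consults
            // when trying to place the map task for this split.
            System.out.println(split + " -> "
                    + Arrays.toString(split.getLocations()));
        }
    }
}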
