hadoop-mapreduce-user mailing list archives

From Sandy Ryza <sandy.r...@cloudera.com>
Subject Re: Map Tasks do not obey data locality principle........
Date Wed, 15 May 2013 20:49:41 GMT
Hi Nikhil,

Which scheduler are you using?

-Sandy


On Tue, May 14, 2013 at 3:55 AM, Agarwal, Nikhil
<Nikhil.Agarwal@netapp.com> wrote:

> Hi,
>
> I have a 3-node cluster, with the JobTracker running on one machine and
> TaskTrackers on the other two (say, slave1 and slave2). Instead of using
> HDFS, I have written my own FileSystem implementation. Unlike HDFS, it
> cannot give the JobTracker and TaskTrackers a shared filesystem view, so I
> mounted the root container of slave2 on a directory in slave1 (NFS mount).
> With this setup I can submit an MR job to the JobTracker with an input path
> such as my_scheme://slave1_IP:Port/dir1. The job runs successfully, but
> data locality is not ensured. For example, if files A, B, C are kept on
> slave1 and D, E, F on slave2, then for data locality the map tasks for
> A, B, C should go to the TaskTracker running on slave1 and those for
> D, E, F to the one on slave2. Instead, the map tasks are scheduled
> apparently at random on either TaskTracker. If the map task for file A is
> assigned to the TaskTracker on slave2, then slave2 has to fetch file A
> over the network.
>
> How do I avoid this?
>
> Thanks,
> Nikhil
>
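
For context on the question quoted above: in Hadoop, FileInputFormat builds
its split locations from the hosts returned by
FileSystem.getFileBlockLocations(), and the base FileSystem implementation
reports "localhost" for every file. A custom FileSystem that does not
override that method may therefore give the scheduler nothing to match
against TaskTracker hostnames. The toy model below only illustrates that
effect. It is not Hadoop code, and assign_map_tasks, the file names, and the
host names are hypothetical.

```python
def assign_map_tasks(split_hosts, trackers):
    """Toy locality-aware assignment (illustrative only, not Hadoop's scheduler).

    split_hosts maps a split name to the host hints reported for it.
    trackers is the list of tracker hostnames.
    A split goes to the least-loaded tracker whose hostname appears in its
    hints; if no hint matches any tracker, any tracker may be chosen.
    """
    load = {t: 0 for t in trackers}
    assignment = {}
    for split in sorted(split_hosts):
        hosts = split_hosts[split]
        local = [t for t in trackers if t in hosts]
        candidates = local or trackers  # fall back to all trackers
        chosen = min(candidates, key=lambda t: load[t])
        load[chosen] += 1
        assignment[split] = chosen
    return assignment


trackers = ["slave1", "slave2"]

# Hints that name the real hosts: locality can be honoured.
with_hints = assign_map_tasks(
    {"A": ["slave1"], "B": ["slave1"], "C": ["slave1"],
     "D": ["slave2"], "E": ["slave2"], "F": ["slave2"]},
    trackers,
)

# Hints that only say "localhost": nothing matches, so placement
# degenerates to plain load balancing.
without_hints = assign_map_tasks(
    {name: ["localhost"] for name in "ABCDEF"},
    trackers,
)

print(with_hints)
print(without_hints)
```

In this model the second call spreads the six splits evenly over both
trackers regardless of where the files live, which resembles the behaviour
described in the quoted message. Whether that is the actual cause here
depends on the scheduler and on what the custom FileSystem returns.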
