hadoop-hdfs-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Map Tasks do not obey data locality principle........
Date Wed, 15 May 2013 21:01:02 GMT
Also, does your custom FS report block locations in exactly the same
format as HDFS does?
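
To illustrate why the format matters, here is a minimal, Hadoop-free
sketch. BlockLoc is only a stand-in for org.apache.hadoop.fs.BlockLocation,
and LocalitySketch, locationsFor and isNodeLocal are hypothetical names made
up for this example; the real scheduler logic is more involved. As far as I
recall, the base FileSystem.getFileBlockLocations() reports "localhost" for
every file unless the custom FS overrides it, which would leave the
JobTracker with nothing to match against slave1 or slave2.

```java
// Simplified stand-in for org.apache.hadoop.fs.BlockLocation (illustration only).
final class BlockLoc {
    final String[] names;  // "host:port" of each node holding the block
    final String[] hosts;  // bare hostnames of those nodes
    final long offset;     // start of the block within the file
    final long length;     // number of bytes in the block

    BlockLoc(String[] names, String[] hosts, long offset, long length) {
        this.names = names;
        this.hosts = hosts;
        this.offset = offset;
        this.length = length;
    }
}

final class LocalitySketch {
    // Hypothetical helper: the kind of answer a custom FileSystem could give
    // for a file stored entirely on one node. ownerHost is assumed to be the
    // same name the TaskTracker on that node registers under.
    static BlockLoc[] locationsFor(String ownerHost, int port, long fileLen) {
        if (fileLen <= 0) {
            return new BlockLoc[0];
        }
        String[] names = { ownerHost + ":" + port };
        String[] hosts = { ownerHost };
        return new BlockLoc[] { new BlockLoc(names, hosts, 0L, fileLen) };
    }

    // Simplified model of the locality check: a map task counts as node-local
    // only if the tracker's hostname appears among the reported hosts.
    static boolean isNodeLocal(BlockLoc loc, String trackerHost) {
        for (String host : loc.hosts) {
            if (host.equals(trackerHost)) {
                return true;
            }
        }
        return false;
    }
}
```

In this model, a location reported as "slave1" is node-local only for the
tracker named "slave1", while a location reported as "localhost" matches
neither slave. A mismatch between IP addresses and hostnames would
presumably have the same effect.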

On Tue, May 14, 2013 at 4:25 PM, Agarwal, Nikhil
<Nikhil.Agarwal@netapp.com> wrote:
> Hi,
> I have a 3-node cluster, with the JobTracker running on one machine and
> TaskTrackers on the other two (say, slave1 and slave2). Instead of using
> HDFS, I have written my own FileSystem implementation. Unlike HDFS, it
> cannot provide a shared filesystem view to the JobTracker and the
> TaskTrackers, so I mounted the root container of slave2 on a directory in
> slave1 (NFS mount). With this setup I am able to submit an MR job to the
> JobTracker, with an input path such as my_scheme://slave1_IP:Port/dir1.
> The job runs successfully, but data locality is not ensured. That is, if
> files A, B, C are kept on slave1 and D, E, F on slave2, then according to
> data locality the map tasks for A, B, C should be submitted to the
> TaskTracker running on slave1 and those for D, E, F to the one on slave2.
> Instead, the map tasks are scheduled seemingly at random on either
> TaskTracker. If the map task for file A is submitted to the TaskTracker
> running on slave2, then file A is being fetched over the network by slave2.
> How do I avoid this?
> Thanks,
> Nikhil

Harsh J
