hadoop-common-dev mailing list archives

From Hemanth Yamijala <yhema...@yahoo-inc.com>
Subject Re: HoD and locality of TaskTrackers to data (on DataNodes)
Date Mon, 24 Mar 2008 03:56:59 GMT
> Hi,
> I have a question about using HoD and the locality of the assigned
> TaskTrackers to the data.
> Suppose I have a long-running HDFS installation with
> TaskTrackers/JobTracker nodes dynamically allocated by HoD, and I
> uploaded my data to HDFS prior to running my job/allocating nodes
> using "dfs -put". Then, I allocate some nodes and run my job on that
> data using HoD. Would the nodes allocated by HoD take into account the
> HDFS nodes on which my data resides (e.g. by looking at which
> DataNodes hold blocks that belong to the current user)? If the nodes
> are just arbitrarily allocated, doesn't that break Hadoop's design
> principle of having processing take place near the data?
> And if HoD doesn't currently take block location into account when
> allocating nodes, are there future plans for that to be incorporated?
Excellent point! HoD does not currently take this into account. We are 
working on ways to accomplish this using configuration outside HoD 
(i.e. in Torque, plus some Hadoop features coming in 0.17, such as 
HADOOP-1985). I will update this list (and possibly the documentation) 
on how this can be set up once we have more concrete results.
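For illustration only: HoD does not do this, but the idea raised in the question (allocating nodes where the user's blocks already live) might look roughly like the hypothetical Python sketch below. It assumes a map from block ID to the DataNode hosts holding a replica has already been gathered, for example from `hadoop fsck <path> -files -blocks -locations` output. The function name `pick_local_nodes`, the input format, and the greedy set-cover heuristic are all assumptions made here. None of them is part of HoD, Torque, or Hadoop.

```python
from collections import Counter


def pick_local_nodes(block_locations: dict[str, list[str]], n: int) -> list[str]:
    """Hypothetical helper: greedily choose up to n hosts that together hold
    replicas of as many input blocks as possible (a set-cover heuristic).

    block_locations maps a block ID to the hosts holding a replica of it.
    """
    uncovered = set(block_locations)
    chosen: list[str] = []
    while uncovered and len(chosen) < n:
        # Count how many still-uncovered blocks each host could serve locally.
        counts: Counter = Counter()
        for block in uncovered:
            for host in set(block_locations[block]):
                counts[host] += 1
        if not counts:
            break
        # Highest coverage first; ties broken by host name for determinism.
        best = min(counts, key=lambda h: (-counts[h], h))
        chosen.append(best)
        # Blocks with a replica on the chosen host are now node-local.
        uncovered = {b for b in uncovered if best not in block_locations[b]}
    return chosen


if __name__ == "__main__":
    blocks = {
        "blk_1": ["dn1", "dn2", "dn3"],
        "blk_2": ["dn2", "dn4", "dn5"],
        "blk_3": ["dn3", "dn4", "dn6"],
        "blk_4": ["dn1", "dn5", "dn6"],
    }
    print(pick_local_nodes(blocks, 3))  # ['dn1', 'dn4']
```

A scheduler such as Torque would then have to be asked for those specific hosts, which is exactly the configuration-outside-HoD part that is still being worked out.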

