hadoop-mapreduce-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Fabio <anyte...@gmail.com>
Subject How is locality really implemented?
Date Sun, 11 Jan 2015 06:03:23 GMT
Hi everyone,
I am trying to understand how locality is actually implemented in 
Hadoop, specifically in the Capacity Scheduler.
I see that an application has to specify the "location" for a request, 
that can be a node, a rack, or ANY (*).
For this reason I want to ask if an application that wants a resource 
request to be considered at higher levels of locality, has to submit 
more than just one request for the same resource (e.g.: if I want my 
request to be considered also at rack and off-switch level, I will issue 
a request specifying the node, one specifying the rack, and one with 
just *, all of them with the relax locality set to true).
This seems to me as a necessity since in the Capacity scheduler code the 
rack local requests (and similarly for the off-switch ones) are obtained 
in a way like this:

application.getResourceRequest(priority, *node.getRackName()*)

while if I submitted a single request just for a specific node, even 
allowing relaxed locality, the scheduler would not be able to process it 
at rack level, since it specifically looks for requests made for the 
current rack (and at switch level too).
Is this correct?
If it is, what's the point of the relax locality parameter? I don't see 
a possible situation when I would have any request with relax locality 
set to false (in particular for rack and off-switch levels). Why would 
an application issue a rack-level request with relax locality set to false?

Thanks in advance

Fabio

Mime
View raw message