hadoop-yarn-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1412) Allocating Containers on a particular Node in Yarn
Date Thu, 21 Nov 2013 00:36:35 GMT

    [ https://issues.apache.org/jira/browse/YARN-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828340#comment-13828340 ]

Bikas Saha commented on YARN-1412:
----------------------------------

What is the value of the following configuration? 
yarn.scheduler.capacity.node-locality-delay

It looks like you are being hit by a bug that shows up when there is only a small number of container requests.

In LeafQueue.assignContainersOnNode(), it looks like the scheduler falls back to an off-switch
assignment even if rackLocalityDelay has not been met. The off-switch delay check is basically:
an off-switch assignment is allowed once (#different-locations/#nodes-in-cluster)*#containers
< #node-heartbeats-without-assignment. In your case, with 20 nodes in all, (2/20)*1 == 0.1.
So the moment we skip 1 node (while waiting for the locality delay), we end up assigning an
off-switch container to the request.
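To make the arithmetic concrete, here is a minimal Java sketch of the off-switch check exactly as stated above. The class and method names (OffSwitchDelay, localityWaitFactor, offSwitchAllowed) are hypothetical and this is not the actual LeafQueue code; it only encodes the formula from this comment and evaluates it for the 20-node, 2-location, 1-container case.

```java
/**
 * Hypothetical model of the off-switch delay check described above:
 * off-switch is allowed once
 *   (#locations / #clusterNodes) * #containers < #missedOpportunities.
 * Not the real CapacityScheduler code; names are illustrative only.
 */
final class OffSwitchDelay {

    /** Ratio of requested locations (hosts + racks) to cluster nodes. */
    static float localityWaitFactor(int locations, int clusterNodes) {
        return (float) locations / clusterNodes;
    }

    /** True once enough node heartbeats were skipped to allow off-switch. */
    static boolean offSwitchAllowed(int locations, int clusterNodes,
                                    int containers, long missedOpportunities) {
        float threshold = containers * localityWaitFactor(locations, clusterNodes);
        return threshold < missedOpportunities;
    }

    public static void main(String[] args) {
        // Reporter's case: h1 + /default-rack = 2 locations, 20 nodes, 1 container.
        float threshold = 1 * localityWaitFactor(2, 20);
        System.out.println("threshold = " + threshold);
        // A single skipped node heartbeat already exceeds the threshold.
        System.out.println("off-switch after 1 miss: "
                + offSwitchAllowed(2, 20, 1, 1));
    }
}
```

With a threshold this far below 1, the request can leave h1 on the very first heartbeat from any other node.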

Try the following: set the node locality delay mentioned at the beginning to the number of
nodes in the cluster. Then, instead of asking for 1 container at priority 0, ask for 20 containers,
each for a specific node, with rack=null and relaxLocality=true. The off-switch locality delay
above then becomes (20/20)*20 == 20 missed assignments.
If you then see correct assignments, the theory above about the bug is confirmed.
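A small sketch of the suggested experiment, again using only the formula from this comment: it counts how many skipped node heartbeats pass before off-switch becomes allowed in each setup. The class name LocalityExperiment and the simple counting loop are illustrative assumptions, not scheduler code. Because the comparison is strict, the 20-container setup waits until more than 20 heartbeats have been missed, while the original 1-container setup goes off-switch after a single miss.

```java
/**
 * Hypothetical comparison of the two setups discussed above, using the
 * formula (#locations / #nodes) * #containers < #missed. Illustrative only.
 */
final class LocalityExperiment {

    /** Missed-opportunity threshold for a request. */
    static double threshold(int locations, int clusterNodes, int containers) {
        return ((double) locations / clusterNodes) * containers;
    }

    /** First miss count strictly greater than the threshold. */
    static long firstOffSwitchMiss(int locations, int clusterNodes, int containers) {
        double t = threshold(locations, clusterNodes, containers);
        long missed = 0;
        while (!(t < missed)) {
            missed++; // one more node heartbeat skipped while waiting for locality
        }
        return missed;
    }

    public static void main(String[] args) {
        int nodes = 20;
        System.out.println("1 container, 2 locations: off-switch at miss "
                + firstOffSwitchMiss(2, nodes, 1));
        System.out.println("20 containers, one per node: off-switch at miss "
                + firstOffSwitchMiss(20, nodes, 20));
    }
}
```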

By the way, what you are trying to do (node=specific, rack=null and relaxLocality=true) is the default
behavior of the existing schedulers. By default they always try to relax locality first to the rack
and then to off-switch, so you don't need to code for it explicitly.
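To illustrate why the rack and off-switch levels need no explicit code, here is a simplified model of how a node-level request with relaxLocality=true can be expanded into the locality levels the scheduler relaxes through: node, then rack, then ANY (written "*"). That the client infers the rack from the requested node is an assumption of this sketch, and the topology map and the LocalityLevels class are hypothetical. Under that assumption a request for h1 yields the same two locations (h1 and /default-rack) whether or not the rack is passed, matching the 2 used in the delay calculation above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model (not AMRMClient code) of the locality levels a
 * node-level request with relaxLocality=true relaxes through.
 */
final class LocalityLevels {

    static final String ANY = "*"; // off-switch level

    /**
     * @param nodes      requested hosts
     * @param racks      explicitly requested racks, may be null
     * @param nodeToRack hypothetical host-to-rack topology lookup
     */
    static List<String> resourceNames(String[] nodes, String[] racks,
                                      Map<String, String> nodeToRack) {
        Set<String> names = new LinkedHashSet<>(); // keeps order, drops duplicates
        for (String node : nodes) {
            names.add(node);
        }
        if (racks != null) {
            for (String rack : racks) {
                names.add(rack);
            }
        }
        // Assumed rack inference: each requested node also implies its rack.
        for (String node : nodes) {
            String rack = nodeToRack.get(node);
            if (rack != null) {
                names.add(rack);
            }
        }
        names.add(ANY);
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        Map<String, String> topology = Map.of("h1", "/default-rack");
        System.out.println(resourceNames(new String[] {"h1"}, null, topology));
        System.out.println(resourceNames(new String[] {"h1"},
                new String[] {"/default-rack"}, topology));
    }
}
```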

> Allocating Containers on a particular Node in Yarn
> --------------------------------------------------
>
>                 Key: YARN-1412
>                 URL: https://issues.apache.org/jira/browse/YARN-1412
>             Project: Hadoop YARN
>          Issue Type: Bug
>         Environment: centos, Hadoop 2.2.0
>            Reporter: gaurav gupta
>
> Summary of the problem: 
>  If I pass the node on which I want the container and leave relax locality at its default (true),
I don't get back the container on the specified node, even if resources are available on that node.
It doesn't matter whether I set the rack or not.
> Here is the code snippet that I am using:
>     // memory, priority and LOG are defined elsewhere; in a real AM the client must also be
>     // init()ed, start()ed and registered before allocate() is called.
>     AMRMClient<ContainerRequest> amRmClient = AMRMClient.createAMRMClient();
>     String host = "h1";
>     Resource capability = Records.newRecord(Resource.class);
>     capability.setMemory(memory);
>     String[] nodes = new String[] {host};
>     // in order to request a host, we also have to request the rack
>     String[] racks = new String[] {"/default-rack"};
>     List<ContainerRequest> containerRequests = new ArrayList<ContainerRequest>();
>     List<ContainerId> releasedContainers = new ArrayList<ContainerId>();
>     containerRequests.add(new ContainerRequest(capability, nodes, racks, Priority.newInstance(priority)));
>     if (!containerRequests.isEmpty()) {
>       LOG.info("Asking RM for containers: " + containerRequests);
>       for (ContainerRequest cr : containerRequests) {
>         LOG.info("Requested container: {}", cr.toString());
>         amRmClient.addContainerRequest(cr);
>       }
>     }
>     for (ContainerId containerId : releasedContainers) {
>       LOG.info("Released container, id={}", containerId.getId());
>       amRmClient.releaseAssignedContainer(containerId);
>     }
>     return amRmClient.allocate(0);



--
This message was sent by Atlassian JIRA
(v6.1#6144)
