hadoop-yarn-issues mailing list archives

From "Sunil G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3940) Application moveToQueue should check NodeLabel permission
Date Thu, 25 Aug 2016 03:22:21 GMT

    [ https://issues.apache.org/jira/browse/YARN-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436200#comment-15436200 ]

Sunil G commented on YARN-3940:

Sorry for pitching in late.

I'd like to discuss a scenario here. I know [~leftnoteasy] and [~bibinchundatt] were trying
to keep the implementation simple until clear use cases are available.
But one scenario came up while I was checking a recent issue from Jason, YARN-5540. Assume
an application's AM container was given a specific node label and got allocated. Now moveToQueue
is performed. Even though there are no pending resource requests for the AM container any more,
I think the current implementation will block the queue move. This is because {{requestedPartitions}}
is not cleared along with its ANY requests.
So I have two questions:
- The AM container was allocated on *label1*, and *label1* is no longer used by app1. Can app1 be moved
to a queue that does not have *label1* among its labels?
- There were some outstanding requests for a label called *label2*, but the app no longer has any
outstanding requests there. This means no containers of this app are running on
*label2*. Do we need to block this app from moving to another queue that does not have *label2*?

I think it's better to note this case somewhere for now so that we can track it later. [~bibinchundatt]
[~Naganarasimha Garla], thoughts?
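
The scenario above can be sketched in plain Java. This is a hypothetical model, not the ResourceManager's code: it assumes the app's pending ANY requests are tracked per partition in a map, and `MoveLabelCheckSketch`, `strictCheck`, `pendingOnlyCheck` and `isAccessible` are illustrative names. `strictCheck` models the behaviour described above, where a partition stays in the requested set even after its ANY count drops to zero; `pendingOnlyCheck` ignores partitions with nothing pending. Neither looks at running containers, so the sketch does not settle the two questions; it only shows where the two policies diverge.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a move-time node label check; not Hadoop's API.
public class MoveLabelCheckSketch {

    // Strict policy: every partition the app has ever requested must be
    // accessible by the target queue, even if its pending ANY count is now 0.
    public static boolean strictCheck(Map<String, Integer> pendingAnyByPartition,
                                      Set<String> targetQueueLabels) {
        for (String partition : pendingAnyByPartition.keySet()) {
            if (!isAccessible(partition, targetQueueLabels)) {
                return false;
            }
        }
        return true;
    }

    // Relaxed policy: only partitions that still have pending ANY requests
    // are checked; partitions whose requests were fully satisfied are ignored.
    public static boolean pendingOnlyCheck(Map<String, Integer> pendingAnyByPartition,
                                           Set<String> targetQueueLabels) {
        for (Map.Entry<String, Integer> e : pendingAnyByPartition.entrySet()) {
            if (e.getValue() > 0 && !isAccessible(e.getKey(), targetQueueLabels)) {
                return false;
            }
        }
        return true;
    }

    // Assumption: the default partition (empty string) is accessible by every queue.
    public static boolean isAccessible(String partition, Set<String> queueLabels) {
        return partition.isEmpty() || queueLabels.contains(partition);
    }

    public static void main(String[] args) {
        // The AM container was allocated on label1; nothing is pending any more.
        Map<String, Integer> pending = new HashMap<>();
        pending.put("label1", 0);
        Set<String> targetLabels = Set.of("label3");

        System.out.println("strict allows move:       " + strictCheck(pending, targetLabels));
        System.out.println("pending-only allows move: " + pendingOnlyCheck(pending, targetLabels));
    }
}
```

With the inputs in `main` (*label1* with nothing pending, target queue with only *label3*), the strict check rejects the move and the pending-only check allows it. Treating the default partition as accessible from every queue is also an assumption of the sketch.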

> Application moveToQueue should check NodeLabel permission 
> ----------------------------------------------------------
>                 Key: YARN-3940
>                 URL: https://issues.apache.org/jira/browse/YARN-3940
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>         Attachments: 0001-YARN-3940.patch, 0002-YARN-3940.patch, 0003-YARN-3940.patch, 0004-YARN-3940.patch, 0005-YARN-3940.patch, 0006-YARN-3940.patch, YARN-3940.0007.patch, YARN-3940.0008.patch
> Configure the capacity scheduler.
> Configure node labels and submit an application with {{queue=A Label=X}}.
> Move the application to queue {{B}}, which does not have access to label x.
> {code}
> 2015-07-20 19:46:19,626 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application attempt appattempt_1437385548409_0005_000001 released container container_e08_1437385548409_0005_01_000002 on node: host: host-10-19-92-117:64318 #containers=1 available=<memory:2560, vCores:15> used=<memory:512, vCores:1> with event: KILL
> 2015-07-20 19:46:20,970 WARN org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Invalid resource ask by application appattempt_1437385548409_0005_000001
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, queue=b1 doesn't have permission to access all labels in resource request. labelExpression of resource request=x. Queue labels=y
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:250)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:106)
>         at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:515)
>         at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>         at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2174)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2170)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2168)
> {code}
> The same exception keeps being thrown until the *heartbeat timeout*.
> Then the application state is updated to *FAILED*.
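
The failure mode in the description can be modelled in plain Java. This is a hypothetical sketch, not Hadoop's code: `LabelAccessSketch`, `validateRequest`, `canMove` and `InvalidRequestException` are illustrative names, and the exception message only imitates the one in the log. It shows why the moved application keeps failing: per the stack trace, every allocate call re-validates the request's label expression against the queue the app now belongs to, so a request for label x in queue b1 (labels: y) is rejected on each heartbeat. A move-time check using the same predicate, as the issue title proposes, would reject the move up front instead.

```java
import java.util.Set;

// Hypothetical model of allocate-time vs. move-time label validation; not Hadoop's API.
public class LabelAccessSketch {

    // Hypothetical stand-in for InvalidResourceRequestException.
    public static class InvalidRequestException extends Exception {
        public InvalidRequestException(String msg) { super(msg); }
    }

    // Allocate-time check: the request's label expression must be accessible
    // by the app's current queue (empty/null means the default partition).
    public static void validateRequest(String queue, Set<String> queueLabels,
                                       String labelExpression)
            throws InvalidRequestException {
        boolean accessible = labelExpression == null || labelExpression.isEmpty()
                || queueLabels.contains(labelExpression);
        if (!accessible) {
            throw new InvalidRequestException("queue=" + queue
                    + " doesn't have permission to access all labels in resource request."
                    + " labelExpression of resource request=" + labelExpression
                    + ". Queue labels=" + String.join(",", queueLabels));
        }
    }

    // Move-time check: reject the move if the target queue cannot access
    // every (non-default) label the app has requested.
    public static boolean canMove(Set<String> targetQueueLabels, Set<String> requestedLabels) {
        for (String label : requestedLabels) {
            if (!label.isEmpty() && !targetQueueLabels.contains(label)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // App submitted to queue A with label x, then moved to queue b1 (labels: y).
        Set<String> b1Labels = Set.of("y");
        for (int heartbeat = 1; heartbeat <= 3; heartbeat++) {
            try {
                validateRequest("b1", b1Labels, "x");
                System.out.println("heartbeat " + heartbeat + ": accepted");
            } catch (InvalidRequestException e) {
                System.out.println("heartbeat " + heartbeat + ": " + e.getMessage());
            }
        }
        System.out.println("move to b1 allowed: " + canMove(b1Labels, Set.of("x")));
    }
}
```

In this model every simulated heartbeat is rejected, which matches the repeated exception in the log, while `canMove` returns false before any heartbeat happens.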

This message was sent by Atlassian JIRA