hadoop-yarn-issues mailing list archives

From "Robert Kanter (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8202) DefaultAMSProcessor should properly check units of requested custom resource types against minimum/maximum allocation
Date Tue, 08 May 2018 17:39:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467731#comment-16467731 ]

Robert Kanter commented on YARN-8202:
-------------------------------------

Thanks for the patch.  A few comments:
# Why is {{TestResourceRequest}} necessary?  Can't we just directly use {{Resource}}?  
# I don't think we need to make the variables ALL CAPS in {{RMContainerAllocator}}.  While
these are final, they're lists, and we usually reserve ALL CAPS for "real" constants (e.g.
Strings, ints, etc.).
# In {{UnitsConversionUtil#checkUnitArgument}}, the first if statement, which checks for a
null unit, also unnecessarily checks that the unit is known; that check is then repeated in
the second if statement.  As it is now (and the original code suffered from the same issue),
an unknown unit produces an exception whose message says the unit is null.  We should remove
the unknown unit check from the first if statement.
# In {{ResourceTypesTestHelper}}, we should make the {{Pattern}} a static class variable,
so we don't have to "compile" it every time the method is called.
# In {{SchedulerUtils#checkResourceRequestAgainstAvailableResource}}, when doing the debug
logging, we should call {{LOG.isDebugEnabled()}} first so we avoid doing String concatenation
if debug logging is not turned on.
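
The last three points above (separate null and unknown-unit checks, a {{Pattern}} compiled once, and guarded debug logging) can be sketched together as follows. This is only an illustration and not the code from the patch: the class name {{ReviewSketch}}, the unit set, the regular expression and the helper names are all hypothetical, and {{java.util.logging}} stands in for the project's logger, with {{isLoggable(Level.FINE)}} playing the role of {{LOG.isDebugEnabled()}}.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; names and values are assumptions, not the patch.
final class ReviewSketch {
  // Stand-in logger: isLoggable(Level.FINE) plays the role of isDebugEnabled().
  private static final Logger LOG =
      Logger.getLogger(ReviewSketch.class.getName());

  // Assumed subset of supported units, chosen for illustration.
  private static final Set<String> KNOWN_UNITS =
      new HashSet<>(Arrays.asList("", "k", "M", "G", "T", "P"));

  // Compiled once when the class is loaded, not on every method call.
  private static final Pattern VALUE_AND_UNIT =
      Pattern.compile("^(\\d+)([A-Za-z]*)$");

  // Null and unknown units are checked separately, so each case
  // produces its own, accurate message.
  static void checkUnitArgument(String unit) {
    if (unit == null) {
      throw new IllegalArgumentException("Unit cannot be null");
    }
    if (!KNOWN_UNITS.contains(unit)) {
      throw new IllegalArgumentException(
          "Unknown unit '" + unit + "'. Known units are " + KNOWN_UNITS);
    }
  }

  // Splits a string such as "500M" into its value and unit parts.
  static String[] splitValueAndUnit(String text) {
    Matcher m = VALUE_AND_UNIT.matcher(text);
    if (!m.matches()) {
      throw new IllegalArgumentException("Cannot parse '" + text + "'");
    }
    checkUnitArgument(m.group(2));
    return new String[] {m.group(1), m.group(2)};
  }

  // The message is only concatenated when debug-level logging is enabled.
  static void logCheck(String requested, String available) {
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine("Checking requested=" + requested
          + " against available=" + available);
    }
  }
}
```

With this shape, a null unit and an unrecognised unit such as "X" fail with different messages, which is the behaviour the third point asks for.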

> DefaultAMSProcessor should properly check units of requested custom resource types against
minimum/maximum allocation
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8202
>                 URL: https://issues.apache.org/jira/browse/YARN-8202
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Blocker
>         Attachments: YARN-8202-001.patch, YARN-8202-002.patch, YARN-8202-003.patch, YARN-8202-004.patch,
YARN-8202-005.patch, YARN-8202-006.patch, YARN-8202-007.patch, YARN-8202-008.patch
>
>
>  
> When I execute a pi job with arguments: 
> {code:java}
> -Dmapreduce.map.resource.memory-mb=200 -Dmapreduce.map.resource.resource1=500M 1 1000{code}
> and I have one node with 5GB of resource1, I get the following exception every second
and the job hangs:
> {code:java}
> 2018-04-24 08:42:03,694 INFO org.apache.hadoop.ipc.Server: IPC Server handler 20 on 8030,
call Call#386 Retry#0 org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.allocate from
172.31.119.172:58138
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request,
requested resource type=[resource1] < 0 or greater than maximum allowed allocation. Requested
resource=<memory:200, vCores:1, resource1: 500M>, maximum allowed allocation=<memory:6144,
vCores:8, resource1: 5G>, please note that maximum allowed allocation is calculated by
scheduler based on maximum resource of registered NodeManagers, which might be less than configured
maximum allocation=<memory:8192, vCores:8192, resource1: 9223372036854775807G>
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:286)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:242)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:258)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:249)
>         at org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:230)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
>         at org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>         at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433)
>         at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>         at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> {code}
> *This is because org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils#validateResourceRequest
does not take resource units into account.*
>  
> However, if I start a job with arguments: 
> {code:java}
> -Dmapreduce.map.resource.memory-mb=200 -Dmapreduce.map.resource.resource1=1G 1 1000{code}
> and I still have 5GB of resource1 on one node then the job runs successfully.
>  
> I also tried a third job run: I requested 1GB of resource1 while no node had any amount of
resource1, then restarted the node with 5GB of resource1. The job ultimately completed, but
only after the node with enough resources registered with the RM, which is the desired
behaviour.
>  
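
The unit mismatch described in the issue (a request of 500M rejected against a maximum of 5G) can be illustrated with a minimal sketch. It assumes decimal multipliers for k, M and G, and every name in it ({{UnitAwareCheckSketch}}, {{toBase}}, {{fitsIgnoringUnits}}, {{fitsWithUnits}}) is hypothetical; it is not the validation code in {{SchedulerUtils}}. {{BigInteger}} is used so that very large maxima, such as the configured 9223372036854775807G in the log above, do not overflow when converted.

```java
import java.math.BigInteger;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; names and multipliers are assumptions.
final class UnitAwareCheckSketch {
  // Assumed decimal multipliers for a few units.
  private static final Map<String, BigInteger> MULTIPLIER = new LinkedHashMap<>();
  static {
    MULTIPLIER.put("", BigInteger.ONE);
    MULTIPLIER.put("k", BigInteger.TEN.pow(3));
    MULTIPLIER.put("M", BigInteger.TEN.pow(6));
    MULTIPLIER.put("G", BigInteger.TEN.pow(9));
  }

  // Converts a value with a unit into the unit-less base quantity.
  static BigInteger toBase(long value, String unit) {
    BigInteger factor = MULTIPLIER.get(unit);
    if (factor == null) {
      throw new IllegalArgumentException("Unknown unit '" + unit + "'");
    }
    return BigInteger.valueOf(value).multiply(factor);
  }

  // Compares the raw numbers only, so 500 (M) looks larger than 5 (G).
  static boolean fitsIgnoringUnits(long requested, long maximum) {
    return requested >= 0 && requested <= maximum;
  }

  // Converts both sides to the base quantity before comparing.
  static boolean fitsWithUnits(long requested, String requestedUnit,
                               long maximum, String maximumUnit) {
    return requested >= 0
        && toBase(requested, requestedUnit)
               .compareTo(toBase(maximum, maximumUnit)) <= 0;
  }

  public static void main(String[] args) {
    System.out.println(fitsIgnoringUnits(500, 5));        // prints false
    System.out.println(fitsWithUnits(500, "M", 5, "G"));  // prints true
  }
}
```

The second job in the description (1G against 5G) passes under either comparison, which is consistent with it succeeding even without unit handling.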



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

