hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-370) CapacityScheduler app submission fails when min alloc size not multiple of AM size
Date Tue, 05 Feb 2013 20:09:13 GMT

     [ https://issues.apache.org/jira/browse/YARN-370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Zhijie Shen updated YARN-370:
-----------------------------

    Attachment: YARN-370-branch-2.patch

I've tested the change, with which ContainerManagerImpl saw the updated the resource, i.e.,
2048 mem. It should fix exception.

The user define 1.5G for AM container, such that we need to update the resource of ApplicationSubmissionContext
according to the real allocated size. One remaining issue is whether other containers automatically
created by the system will be assigned the memory size which is not the multiple of the min
alloc size or not. If it will, the problem will happen on the non-AMcontainer as well.
                
> CapacityScheduler app submission fails when min alloc size not multiple of AM size
> ----------------------------------------------------------------------------------
>
>                 Key: YARN-370
>                 URL: https://issues.apache.org/jira/browse/YARN-370
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.0.3-alpha
>            Reporter: Thomas Graves
>            Assignee: Zhijie Shen
>            Priority: Blocker
>         Attachments: YARN-370-branch-2.patch
>
>
> I was running 2.0.3-SNAPSHOT with the capacity scheduler configured with minimum allocation
size 1G. The AM size was set to 1.5G. I didn't specify resource calculator so it was using
DefaultResourceCalculator.  The am launch failed with the error below:
> Application application_1359688216672_0001 failed 1 times due to Error launching appattempt_1359688216672_0001_000001.
Got exception: RemoteTrace: at LocalTrace: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl:
RemoteTrace: at LocalTrace: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl:
Unauthorized request to start container. Expected resource <memory:2048, vCores:1> but
found <memory:1536, vCores:1> at org.apache.hadoop.yarn.factories.impl.pb.YarnRemoteExceptionFactoryPBImpl.createYarnRemoteException(YarnRemoteExceptionFactoryPBImpl.java:39)
at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:47) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeRequest(ContainerManagerImpl.java:383)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainer(ContainerManagerImpl.java:400)
at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagerPBServiceImpl.startContainer(ContainerManagerPBServiceImpl.java:68)
at org.apache.hadoop.yarn.proto.ContainerManager$ContainerManagerService$2.callBlockingMethod(ContainerManager.java:83)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1735)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1731) at java.security.AccessController.doPrivileged(Native
Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1441)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1729) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57) at
org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:123)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:109)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:111)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:255)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722) . Failing the application. 
> It looks like the launchcontext for the app didn't have the resources rounded up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message