Date: Tue, 5 Feb 2013 20:09:13 +0000 (UTC)
From: "Zhijie Shen (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Updated] (YARN-370) CapacityScheduler app submission fails when min alloc size not multiple of AM size

     [ https://issues.apache.org/jira/browse/YARN-370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-370:
-----------------------------

    Attachment: YARN-370-branch-2.patch

I've tested the change. With it, ContainerManagerImpl sees the updated resource, i.e., 2048 MB of memory, which should fix the exception.

The user defined 1.5G for the AM container, so we need to update the resource in ApplicationSubmissionContext to match the size that was actually allocated.
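For illustration, the rounding involved can be sketched as below. This is only a minimal sketch and not the code in YARN-370-branch-2.patch: the class name MemoryNormalizer and the method roundUp are hypothetical, memory is assumed to be expressed in MB, and treating a request below the minimum as one minimum-sized allocation is also an assumption.

```java
// Hypothetical helper, not part of Hadoop: rounds a memory request up to
// the next multiple of the scheduler's minimum allocation size.
final class MemoryNormalizer {
    private MemoryNormalizer() {}

    /**
     * Returns the smallest multiple of minAllocMb that is >= requestedMb.
     * Requests below the minimum are assumed to get one minimum-sized slot.
     */
    static int roundUp(int requestedMb, int minAllocMb) {
        if (minAllocMb <= 0) {
            throw new IllegalArgumentException("minAllocMb must be positive");
        }
        int atLeastMin = Math.max(requestedMb, minAllocMb);
        // Integer ceiling division, then scale back to MB.
        int slots = (atLeastMin + minAllocMb - 1) / minAllocMb;
        return slots * minAllocMb;
    }

    public static void main(String[] args) {
        // The case from this issue: 1.5G AM request, 1G minimum allocation.
        System.out.println(roundUp(1536, 1024)); // prints 2048
    }
}
```

The point of the sketch is that the value recorded in the submission context would need to be the rounded result (2048) rather than the original request (1536), so that both sides of the start-container check compare the same number.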

One remaining issue is whether other containers created automatically by the system can be assigned a memory size that is not a multiple of the min alloc size. If they can, the same problem will occur for non-AM containers as well.

> CapacityScheduler app submission fails when min alloc size not multiple of AM size
> ----------------------------------------------------------------------------------
>
>                 Key: YARN-370
>                 URL: https://issues.apache.org/jira/browse/YARN-370
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.0.3-alpha
>            Reporter: Thomas Graves
>            Assignee: Zhijie Shen
>            Priority: Blocker
>         Attachments: YARN-370-branch-2.patch
>
>
> I was running 2.0.3-SNAPSHOT with the capacity scheduler configured with a minimum allocation size of 1G. The AM size was set to 1.5G. I didn't specify a resource calculator, so it was using DefaultResourceCalculator. The AM launch failed with the error below:
> Application application_1359688216672_0001 failed 1 times due to Error launching appattempt_1359688216672_0001_000001. Got exception: RemoteTrace:
>  at LocalTrace:
>  org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: RemoteTrace:
>  at LocalTrace:
>  org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Unauthorized request to start container.
> Expected resource but found
>  at org.apache.hadoop.yarn.factories.impl.pb.YarnRemoteExceptionFactoryPBImpl.createYarnRemoteException(YarnRemoteExceptionFactoryPBImpl.java:39)
>  at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:47)
>  at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeRequest(ContainerManagerImpl.java:383)
>  at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainer(ContainerManagerImpl.java:400)
>  at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagerPBServiceImpl.startContainer(ContainerManagerPBServiceImpl.java:68)
>  at org.apache.hadoop.yarn.proto.ContainerManager$ContainerManagerService$2.callBlockingMethod(ContainerManager.java:83)
>  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1735)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1731)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1441)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1729)
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
>  at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
>  at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
>  at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:123)
>  at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:109)
>  at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:111)
>  at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:255)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>  at java.lang.Thread.run(Thread.java:722)
> . Failing the application.
> It looks like the launch context for the app didn't have the resources rounded up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira