Mailing-List: contact mapreduce-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: mapreduce-issues@hadoop.apache.org
Date: Thu, 19 Jul 2012 13:44:34 +0000 (UTC)
From: "Jason Lowe (JIRA)" <jira@apache.org>
To: mapreduce-issues@hadoop.apache.org
Message-ID: <321292908.75568.1342705474852.JavaMail.jiratomcat@issues-vm>
In-Reply-To: <1763433277.74474.1342675594863.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Commented] (MAPREDUCE-4460) Refresh queue throws IO
 exception after configuring wrong queue capacity
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/MAPREDUCE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418288#comment-13418288 ] 

Jason Lowe commented on MAPREDUCE-4460:
---------------------------------------

The issue occurs because the first refreshQueues creates new queue instances and registers them with the metrics system before the exception is thrown.  The new queue instances are discarded because of the bad configuration, but they remain registered with the metrics system.  The second refreshQueues creates new queue instances again, since the earlier ones were discarded.  When these new instances try to register, the metrics system sees a duplicate registration and throws an exception, which causes the refreshQueues to fail.

We need to deregister any newly created queue instances from the metrics system when recovering from a bad configuration.
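
To illustrate the failure mode and the proposed cleanup, here is a minimal, self-contained sketch. It is a toy model, not the CapacityScheduler code: ToyMetricsRegistry, refresh, scenario and the cleanUpOnFailure flag are all hypothetical names, and the model assumes that only queues not yet live get registered during a refresh.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RefreshQueuesSketch {

    /** Hypothetical stand-in for a metrics system that rejects duplicate source names. */
    static class ToyMetricsRegistry {
        private final Set<String> sources = new HashSet<>();

        void register(String name) {
            if (!sources.add(name)) {
                throw new IllegalStateException("Metrics source " + name + " already exists!");
            }
        }

        void unregister(String name) {
            sources.remove(name);
        }
    }

    /**
     * Simulates one refresh. Queues that are not yet live register with the
     * metrics registry first; the capacity total is validated afterwards.
     * Returns true if the refresh succeeded.
     */
    static boolean refresh(ToyMetricsRegistry registry, Set<String> liveQueues,
                           String[] names, int[] capacities, boolean cleanUpOnFailure) {
        List<String> newlyRegistered = new ArrayList<>();
        try {
            int total = 0;
            for (int i = 0; i < names.length; i++) {
                if (!liveQueues.contains(names[i])) {
                    registry.register(names[i]); // new queue instance registers itself
                    newlyRegistered.add(names[i]);
                }
                total += capacities[i];
            }
            if (total != 100) {
                throw new IllegalArgumentException("Illegal capacity total: " + total);
            }
            liveQueues.addAll(newlyRegistered); // commit only after validation passes
            return true;
        } catch (RuntimeException e) {
            if (cleanUpOnFailure) {
                // Proposed fix: deregister whatever this failed refresh registered.
                for (String name : newlyRegistered) {
                    registry.unregister(name);
                }
            }
            return false;
        }
    }

    /** Runs the reported scenario: a bad refresh followed by a corrected one. */
    static boolean[] scenario(boolean cleanUpOnFailure) {
        ToyMetricsRegistry registry = new ToyMetricsRegistry();
        Set<String> live = new HashSet<>();
        refresh(registry, live, new String[] {"root.a", "root.b"},
                new int[] {50, 50}, cleanUpOnFailure);

        String[] names = {"root.a", "root.b", "root.c"};
        boolean first = refresh(registry, live, names, new int[] {50, 50, 50}, cleanUpOnFailure);
        boolean second = refresh(registry, live, names, new int[] {40, 40, 20}, cleanUpOnFailure);
        return new boolean[] {first, second};
    }

    public static void main(String[] args) {
        boolean[] leaky = scenario(false);
        boolean[] fixed = scenario(true);
        System.out.println("without cleanup: " + leaky[0] + ", " + leaky[1]); // false, false
        System.out.println("with cleanup:    " + fixed[0] + ", " + fixed[1]); // false, true
    }
}
```

Without the cleanup, the corrected configuration still fails on the second refresh because root.c is left registered by the failed first attempt. With the cleanup, the second refresh succeeds.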
                
> Refresh queue throws IO exception after configuring wrong queue capacity
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4460
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4460
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.1.0-alpha
>            Reporter: Nishan Shetty
>
> Scenario:
> 1. My setup has queues a and b (each with a capacity of, say, 50%) under the root queue
> 2. Start the process
> 3. Add one more queue 'c' under root
> 4. Configure a capacity for 'c' such that the total capacity of a, b and c is not equal to 100
> 5. Refresh the queues; this throws an exception for the wrong capacity (expected, since the total capacity is not equal to 100)
> 6. Reconfigure the queue capacities of a, b and c such that the total capacity is 100
> 7. Refresh the queues again
> Observed that it throws an IOException
> {noformat}
> java.io.IOException: Failed to re-init queues
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:216)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:174)
>         at org.apache.hadoop.yarn.server.resourcemanager.api.impl.pb.service.RMAdminProtocolPBServiceImpl.refreshQueues(RMAdminProtocolPBServiceImpl.java:62)
>         at org.apache.hadoop.yarn.proto.RMAdminProtocol$RMAdminProtocolService$2.callBlockingMethod(RMAdminProtocol.java:122)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:916)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686)
> Caused by: org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root,q1=c already exists!
>         at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
>         at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
>         at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:216)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.forQueue(QueueMetrics.java:129)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.forQueue(QueueMetrics.java:119)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:136)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:313)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:328)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:246)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:213)
>         ... 11 more
>  at LocalTrace:
>         org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Failed to re-init queues
>         at org.apache.hadoop.yarn.factories.impl.pb.YarnRemoteExceptionFactoryPBImpl.createYarnRemoteException(YarnRemoteExceptionFactoryPBImpl.java:50)
>         at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:40)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:184)
>         at org.apache.hadoop.yarn.server.resourcemanager.api.impl.pb.service.RMAdminProtocolPBServiceImpl.refreshQueues(RMAdminProtocolPBServiceImpl.java:62)
>         at org.apache.hadoop.yarn.proto.RMAdminProtocol$RMAdminProtocolService$2.callBlockingMethod(RMAdminProtocol.java:122)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:916)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686)
> Caused by: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Metrics source QueueMetrics,q0=root,q1=c already exists!
>         at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.getCause(YarnRemoteExceptionPBImpl.java:94)
>         at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.getCause(YarnRemoteExceptionPBImpl.java:32)
>         at java.lang.Throwable.printStackTrace(Throwable.java:514)
>         at org.apache.hadoop.yarn.exceptions.YarnRemoteException.printStackTrace(YarnRemoteException.java:48)
>         at org.apache.hadoop.util.StringUtils.stringifyException(StringUtils.java:69)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1715)
> {noformat}
