Return-Path: X-Original-To: apmail-hadoop-mapreduce-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-mapreduce-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 689DC91C1 for ; Thu, 19 Jul 2012 13:44:37 +0000 (UTC) Received: (qmail 98922 invoked by uid 500); 19 Jul 2012 13:44:37 -0000 Delivered-To: apmail-hadoop-mapreduce-issues-archive@hadoop.apache.org Received: (qmail 98760 invoked by uid 500); 19 Jul 2012 13:44:36 -0000 Mailing-List: contact mapreduce-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: mapreduce-issues@hadoop.apache.org Delivered-To: mailing list mapreduce-issues@hadoop.apache.org Received: (qmail 98726 invoked by uid 99); 19 Jul 2012 13:44:35 -0000 Received: from issues-vm.apache.org (HELO issues-vm) (140.211.11.160) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 19 Jul 2012 13:44:35 +0000 Received: from isssues-vm.apache.org (localhost [127.0.0.1]) by issues-vm (Postfix) with ESMTP id CF932142850 for ; Thu, 19 Jul 2012 13:44:34 +0000 (UTC) Date: Thu, 19 Jul 2012 13:44:34 +0000 (UTC) From: "Jason Lowe (JIRA)" To: mapreduce-issues@hadoop.apache.org Message-ID: <321292908.75568.1342705474852.JavaMail.jiratomcat@issues-vm> In-Reply-To: <1763433277.74474.1342675594863.JavaMail.jiratomcat@issues-vm> Subject: [jira] [Commented] (MAPREDUCE-4460) Refresh queue throws IO exception after configuring wrong queue capacity MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/MAPREDUCE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418288#comment-13418288 ] Jason Lowe commented on MAPREDUCE-4460: --------------------------------------- The issue occurs because new queue instances are created and registered with the metrics system during the first refreshQueues before the exception is thrown. The new queue instances are discarded due to the bad configuration, but they are left registered with the metrics system. When the refreshQueues occurs the second time it creates new queue instances again, since the old ones were discarded. When the new instances try to register with the metrics system, the metrics system sees a duplicate registration, throws an exception, and spoils the refreshQueues. We need to deregister any newly created queue instances from the metrics system when recovering from a bad configuration. > Refresh queue throws IO exception after configuring wrong queue capacity > ------------------------------------------------------------------------ > > Key: MAPREDUCE-4460 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4460 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 > Affects Versions: 2.1.0-alpha > Reporter: Nishan Shetty > > Scenario: > 1.My setup has a,b queues(each with capacity say 50%) under root queue > 2.Start the process > 3.Add one more queue 'c' under root > 4.Configure some capacity for 'c' such that total capacity of a,b,c is not equal to 100 > 5.Now do refresh queues, it will throw exception as wrong capacity(This is expected as capacity was not equal to 100). > 6.Now reconfigure queue capacities of a,b,c such that total capacity is 100 > 5.Now do refresh queues again > Observed that it throws IO exception > {noformat} > java.io.IOException: Failed to re-init queues > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:216) > at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:174) > at org.apache.hadoop.yarn.server.resourcemanager.api.impl.pb.service.RMAdminProtocolPBServiceImpl.refreshQueues(RMAdminProtocolPBServiceImpl.java:62) > at org.apache.hadoop.yarn.proto.RMAdminProtocol$RMAdminProtocolService$2.callBlockingMethod(RMAdminProtocol.java:122) > at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:916) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686) > Caused by: org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root,q1=c already exists! > at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126) > at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107) > at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:216) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.forQueue(QueueMetrics.java:129) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.forQueue(QueueMetrics.java:119) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.(LeafQueue.java:136) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:313) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:328) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:246) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:213) > ... 11 more > at LocalTrace: > org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Failed to re-init queues > at org.apache.hadoop.yarn.factories.impl.pb.YarnRemoteExceptionFactoryPBImpl.createYarnRemoteException(YarnRemoteExceptionFactoryPBImpl.java:50) > at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:40) > at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:184) > at org.apache.hadoop.yarn.server.resourcemanager.api.impl.pb.service.RMAdminProtocolPBServiceImpl.refreshQueues(RMAdminProtocolPBServiceImpl.java:62) > at org.apache.hadoop.yarn.proto.RMAdminProtocol$RMAdminProtocolService$2.callBlockingMethod(RMAdminProtocol.java:122) > at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:916) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686) > Caused by: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Metrics source QueueMetrics,q0=root,q1=c already exists! > at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.getCause(YarnRemoteExceptionPBImpl.java:94) > at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.getCause(YarnRemoteExceptionPBImpl.java:32) > at java.lang.Throwable.printStackTrace(Throwable.java:514) > at org.apache.hadoop.yarn.exceptions.YarnRemoteException.printStackTrace(YarnRemoteException.java:48) > at org.apache.hadoop.util.StringUtils.stringifyException(StringUtils.java:69) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1715) > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira