hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4460) Refresh queue throws IO exception after configuring wrong queue capacity
Date Thu, 19 Jul 2012 13:44:34 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418288#comment-13418288 ]

Jason Lowe commented on MAPREDUCE-4460:
---------------------------------------

The issue occurs because the first refreshQueues creates new queue instances and registers
them with the metrics system before the exception is thrown.  The new queue instances are
discarded because of the bad configuration, but they remain registered with the metrics system.
The second refreshQueues creates new queue instances again, since the earlier ones were
discarded.  When these new instances try to register with the metrics system, it sees a
duplicate registration and throws an exception, which causes the refreshQueues to fail.

We need to deregister any newly created queue instances from the metrics system when recovering
from a bad configuration.
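
To illustrate the failure mode and the proposed fix, here is a minimal, self-contained sketch.
It does not use the Hadoop classes: ToyMetricsRegistry, ToyScheduler, tryRefresh and the
cleanupOnFailure flag are hypothetical names invented for this example, and the real
CapacityScheduler and metrics system are more involved.  The sketch only assumes a registry
that rejects duplicate source names and a refresh that registers new queues before validating
capacities.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RefreshQueuesSketch {

    /** Hypothetical stand-in for a metrics system that rejects duplicate source names. */
    static final class ToyMetricsRegistry {
        private final Set<String> sources = new HashSet<>();

        void register(String source) {
            if (!sources.add(source)) {
                throw new IllegalStateException("Metrics source " + source + " already exists!");
            }
        }

        void unregister(String source) {
            sources.remove(source);
        }
    }

    /** Hypothetical scheduler: registers metrics for new queues, then validates capacities. */
    static final class ToyScheduler {
        private final ToyMetricsRegistry registry;
        private final boolean cleanupOnFailure;
        private final Set<String> liveQueues = new HashSet<>();

        ToyScheduler(ToyMetricsRegistry registry, boolean cleanupOnFailure) {
            this.registry = registry;
            this.cleanupOnFailure = cleanupOnFailure;
        }

        void refreshQueues(Map<String, Integer> capacities) {
            // Track what this call registered so it can be rolled back on failure.
            List<String> newlyRegistered = new ArrayList<>();
            try {
                for (String queue : capacities.keySet()) {
                    if (!liveQueues.contains(queue)) {
                        registry.register(queue); // throws on a duplicate registration
                        newlyRegistered.add(queue);
                    }
                }
                int total = 0;
                for (int capacity : capacities.values()) {
                    total += capacity;
                }
                if (total != 100) {
                    throw new IllegalArgumentException("Illegal capacity total: " + total);
                }
                liveQueues.addAll(capacities.keySet());
            } catch (RuntimeException e) {
                if (cleanupOnFailure) {
                    // The fix: deregister instances that are about to be discarded.
                    for (String queue : newlyRegistered) {
                        registry.unregister(queue);
                    }
                }
                throw e;
            }
        }
    }

    /** Returns "ok" on success, otherwise the exception message. */
    static String tryRefresh(ToyScheduler scheduler, Map<String, Integer> capacities) {
        try {
            scheduler.refreshQueues(capacities);
            return "ok";
        } catch (RuntimeException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        for (boolean cleanup : new boolean[] {false, true}) {
            ToyScheduler scheduler = new ToyScheduler(new ToyMetricsRegistry(), cleanup);
            System.out.println("cleanupOnFailure=" + cleanup);
            System.out.println("  " + tryRefresh(scheduler, Map.of("a", 50, "b", 50)));
            System.out.println("  " + tryRefresh(scheduler, Map.of("a", 50, "b", 50, "c", 50)));
            System.out.println("  " + tryRefresh(scheduler, Map.of("a", 40, "b", 40, "c", 20)));
        }
    }
}
```

Without the cleanup, the third refresh in the sketch fails with the duplicate-source message even
though the capacities are valid; with the cleanup enabled, the same refresh succeeds.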
                
> Refresh queue throws IO exception after configuring wrong queue capacity
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4460
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4460
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.1.0-alpha
>            Reporter: Nishan Shetty
>
> Scenario:
> 1. My setup has queues a and b (each with a capacity of, say, 50%) under the root queue.
> 2. Start the process.
> 3. Add one more queue 'c' under root.
> 4. Configure a capacity for 'c' such that the total capacity of a, b, and c is not equal to 100.
> 5. Do refresh queues; it throws an exception for the wrong capacity (this is expected, as the total capacity is not equal to 100).
> 6. Reconfigure the queue capacities of a, b, and c such that the total capacity is 100.
> 7. Do refresh queues again.
> Observed that it throws an IOException:
> {noformat}
> java.io.IOException: Failed to re-init queues
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:216)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:174)
>         at org.apache.hadoop.yarn.server.resourcemanager.api.impl.pb.service.RMAdminProtocolPBServiceImpl.refreshQueues(RMAdminProtocolPBServiceImpl.java:62)
>         at org.apache.hadoop.yarn.proto.RMAdminProtocol$RMAdminProtocolService$2.callBlockingMethod(RMAdminProtocol.java:122)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:916)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686)
> Caused by: org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root,q1=c already exists!
>         at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
>         at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
>         at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:216)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.forQueue(QueueMetrics.java:129)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.forQueue(QueueMetrics.java:119)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:136)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:313)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:328)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:246)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:213)
>         ... 11 more
>  at LocalTrace:
>         org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Failed to re-init queues
>         at org.apache.hadoop.yarn.factories.impl.pb.YarnRemoteExceptionFactoryPBImpl.createYarnRemoteException(YarnRemoteExceptionFactoryPBImpl.java:50)
>         at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:40)
>         at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:184)
>         at org.apache.hadoop.yarn.server.resourcemanager.api.impl.pb.service.RMAdminProtocolPBServiceImpl.refreshQueues(RMAdminProtocolPBServiceImpl.java:62)
>         at org.apache.hadoop.yarn.proto.RMAdminProtocol$RMAdminProtocolService$2.callBlockingMethod(RMAdminProtocol.java:122)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:916)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686)
> Caused by: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Metrics source QueueMetrics,q0=root,q1=c already exists!
>         at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.getCause(YarnRemoteExceptionPBImpl.java:94)
>         at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.getCause(YarnRemoteExceptionPBImpl.java:32)
>         at java.lang.Throwable.printStackTrace(Throwable.java:514)
>         at org.apache.hadoop.yarn.exceptions.YarnRemoteException.printStackTrace(YarnRemoteException.java:48)
>         at org.apache.hadoop.util.StringUtils.stringifyException(StringUtils.java:69)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1715)
> {noformat}


        
