hadoop-yarn-issues mailing list archives

From "Rohith (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3614) FileSystemRMStateStore throw exception when failed to remove application, that cause resourcemanager to crash
Date Mon, 11 May 2015 04:54:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537558#comment-14537558 ]

Rohith commented on YARN-3614:
------------------------------

YARN-3410 removes an application from the RMStateStore through a ResourceManager startup argument,
i.e. {{./yarn resourcemanager -remove-application-from-state-store <appId>}}.

I am wondering about the use case: why would someone move this application folder manually?
OTOH, it would be better to either check whether the path exists or handle the exception and log a WARN message,
instead of throwing an exception that crashes the RM.
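A minimal sketch of the two options suggested above (check that the path exists, or catch the failure and log a warning), assuming a local disk accessed through `java.nio.file` instead of Hadoop's `FileSystem` API. `TolerantAppStateRemover` and `removeAppState` are hypothetical names for illustration; they are not part of FileSystemRMStateStore.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.logging.Logger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of FileSystemRMStateStore: it uses java.nio.file
// on a local disk in place of Hadoop's FileSystem API.
final class TolerantAppStateRemover {
    private static final Logger LOG =
            Logger.getLogger(TolerantAppStateRemover.class.getName());

    private TolerantAppStateRemover() {}

    /**
     * Recursively deletes an application's state directory.
     * Returns true on success; if the path is missing or an I/O error occurs,
     * it logs a WARNING and returns false instead of throwing.
     */
    static boolean removeAppState(Path appDir) {
        // Option 1: check that the path still exists (it may have been moved manually).
        if (!Files.exists(appDir)) {
            LOG.warning("State for " + appDir + " not found; skipping removal");
            return false;
        }
        // Option 2: catch the failure and log it rather than propagating it.
        try (Stream<Path> walk = Files.walk(appDir)) {
            // Reverse lexicographic order puts children before their parent directory.
            List<Path> paths = walk.sorted(Comparator.reverseOrder())
                                   .collect(Collectors.toList());
            for (Path p : paths) {
                Files.delete(p);
            }
            return true;
        } catch (IOException | UncheckedIOException e) {
            LOG.warning("Failed to delete " + appDir + ": " + e);
            return false;
        }
    }
}
```

Returning `false` lets the caller carry on after a failed cleanup; whether a stale directory should also be retried later is left open by this sketch.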

> FileSystemRMStateStore throw exception when failed to remove application, that cause resourcemanager to crash
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3614
>                 URL: https://issues.apache.org/jira/browse/YARN-3614
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.5.0
>            Reporter: lachisis
>            Priority: Critical
>
> FileSystemRMStateStore is only an auxiliary plug-in of the RM store.
> When it fails to remove an application, I think a warning is enough, but currently the ResourceManager crashes.
> Recently, I configured "yarn.resourcemanager.state-store.max-completed-applications" to limit the number of applications in the RM store. When the number of applications exceeds the limit, some old applications are removed. If a removal fails, the ResourceManager crashes.
> The following is the log:
> 2015-05-11 06:58:43,815 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Removing info for app: application_1430994493305_0053
> 2015-05-11 06:58:43,815 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Removing info for app: application_1430994493305_0053 at: /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
> 2015-05-11 06:58:43,816 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error removing app: application_1430994493305_0053
> java.lang.Exception: Failed to delete /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:879)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:874)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-05-11 06:58:43,819 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause:
> java.lang.Exception: Failed to delete /hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1430994493305_0053
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:572)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:471)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:185)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:171)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:879)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:874)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>         at java.lang.Thread.run(Thread.java:745)
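The reporter's scenario can be modelled in a few lines: a cap on completed applications evicts the oldest entries, and a failed eviction is counted and logged as a warning instead of being escalated the way the STATE_STORE_OP_FAILED event is in the log above. This is a hypothetical, in-memory simplification; `CompletedAppTracker` and `Remover` are illustrative names, not YARN classes, and the limit only mirrors the role of `yarn.resourcemanager.state-store.max-completed-applications`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.logging.Logger;

// Hypothetical, simplified model of capping completed applications in a state
// store; the class and interface names are illustrative, not YARN's API.
final class CompletedAppTracker {
    /** A removal step that may fail, e.g. a delete in the backing store. */
    interface Remover {
        void remove(String appId) throws Exception;
    }

    private static final Logger LOG =
            Logger.getLogger(CompletedAppTracker.class.getName());

    // Plays the role of yarn.resourcemanager.state-store.max-completed-applications.
    private final int maxCompleted;
    private final Deque<String> completed = new ArrayDeque<>(); // oldest first
    private final Remover remover;
    private int failedRemovals;

    CompletedAppTracker(int maxCompleted, Remover remover) {
        this.maxCompleted = maxCompleted;
        this.remover = remover;
    }

    /** Records a completed app and evicts the oldest ones beyond the limit. */
    void appCompleted(String appId) {
        completed.addLast(appId);
        while (completed.size() > maxCompleted) {
            String oldest = completed.removeFirst();
            try {
                remover.remove(oldest);
            } catch (Exception e) {
                // Warn and keep running, as the reporter proposes, instead of crashing.
                failedRemovals++;
                LOG.warning("Failed to remove " + oldest + " from state store: " + e);
            }
        }
    }

    int trackedCount() { return completed.size(); }

    int failedRemovals() { return failedRemovals; }
}
```

In this sketch an entry whose removal fails is dropped from tracking after one attempt, so it is never retried; a real store might prefer to keep it and retry on the next eviction.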



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
