hadoop-yarn-issues mailing list archives

From "Arpit Gupta (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-1220) Yarn App recovers when it should not as delete failed from rm fs store
Date Thu, 19 Sep 2013 00:34:52 GMT

     [ https://issues.apache.org/jira/browse/YARN-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Arpit Gupta updated YARN-1220:
------------------------------

    Summary: Yarn App recovers when it should not as delete failed from rm fs store  (was:
Yarn RM fs state store should handle safemode exceptions)
    
> Yarn App recovers when it should not as delete failed from rm fs store
> ----------------------------------------------------------------------
>
>                 Key: YARN-1220
>                 URL: https://issues.apache.org/jira/browse/YARN-1220
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Arpit Gupta
>            Assignee: Vinod Kumar Vavilapalli
>
> {code}
> ons: 0
> 2013-09-18 05:41:13,542 ERROR recovery.RMStateStore (RMStateStore.java:handleStoreEvent(490))
- Error removing app: application_1379482521108_0003
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException):
Cannot delete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1379482521108_0003.
> Name node is in safe mode.
> The reported blocks 1018 has reached the threshold 1.0000 of total blocks 1018. The number
of live datanodes 5 has reached the minimum number 0. Safe mode will be turned off automatically
in 20 seconds.
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:3124)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:3083)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3067)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:697)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:491)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$Clien
> {code}
> The issue here is that if the namenode is in safe mode while we are interacting with
the fs state store, we won't be able to update the status. In this particular case the app was never
removed from the store, and upon RM restart the app was recovered when it did not need to be.
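
The failure mode described above can be illustrated with a minimal, self-contained sketch. It is not the fix adopted for this issue and it does not use the Hadoop API: SafeModeException, AppStore, FlakyStore and removeWithRetry are hypothetical stand-ins defined here only for illustration. The assumption is that a delete rejected because of safe mode is a transient error, so the caller retries it instead of logging it and moving on, and reports failure if the app is still in the store.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class SafeModeRetrySketch {

    /** Hypothetical stand-in for the safe mode error seen in the log above. */
    static class SafeModeException extends IOException {
        SafeModeException(String msg) { super(msg); }
    }

    /** Hypothetical minimal view of a state store: only the delete call. */
    interface AppStore {
        void removeApp(String appId) throws IOException;
    }

    /** In-memory store that rejects the first N deletes, mimicking safe mode. */
    static class FlakyStore implements AppStore {
        final Set<String> apps = new HashSet<>();
        int safeModeCallsLeft;

        FlakyStore(int safeModeCalls) { this.safeModeCallsLeft = safeModeCalls; }

        @Override
        public void removeApp(String appId) throws IOException {
            if (safeModeCallsLeft > 0) {
                safeModeCallsLeft--;
                throw new SafeModeException(
                    "Cannot delete " + appId + ". Name node is in safe mode.");
            }
            apps.remove(appId);
        }
    }

    /**
     * Retries the delete while the store reports safe mode.
     * Returns true once the delete succeeds, false if every attempt failed.
     */
    static boolean removeWithRetry(AppStore store, String appId,
                                   int maxAttempts, long sleepMillis)
            throws IOException, InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                store.removeApp(appId);
                return true;
            } catch (SafeModeException e) {
                // Transient condition: wait and try again unless out of attempts.
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        return false; // the app is still in the store; the caller must not ignore this
    }

    public static void main(String[] args) throws Exception {
        FlakyStore store = new FlakyStore(2); // first two deletes hit "safe mode"
        String appId = "application_1379482521108_0003";
        store.apps.add(appId);
        boolean removed = removeWithRetry(store, appId, 5, 10L);
        System.out.println("removed=" + removed + ", remaining=" + store.apps.size());
    }
}
```

With the values in main, the first two attempts fail and the third succeeds, so the app is gone from the store before any restart could recover it. If all attempts fail, the method returns false so the caller can surface the error rather than leave a stale entry behind.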

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
