hadoop-yarn-issues mailing list archives

From "Fang Xie (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-4892) Job will be hung and can not be finished after resource manager restart and enable recovery
Date Tue, 29 Mar 2016 13:50:25 GMT

     [ https://issues.apache.org/jira/browse/YARN-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fang Xie updated YARN-4892:
---------------------------
    Description: 
Enable ResourceManager recovery by setting the following properties:
<property>
    <description>Enable RM to recover state after starting. If true, then
    yarn.resourcemanager.store.class must be specified. </description>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
</property>
<property>
    <description> </description>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value>
</property>
<property>
    <description> </description>
    <name>yarn.resourcemanager.fs.state-store.uri</name>
    <value>hdfs://apple02:9000/rmstore</value>
</property>
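
As a quick sanity check before reproducing the issue, the rule stated in the first property description (if recovery is enabled, a store class must be specified) can be verified with a small script. This is only a sketch: the helper names `load_properties` and `check_recovery_settings` are hypothetical, and the sample assumes the three properties are wrapped in the usual `<configuration>` root of a yarn-site.xml file.

```python
import xml.etree.ElementTree as ET

RECOVERY = "yarn.resourcemanager.recovery.enabled"
STORE_CLASS = "yarn.resourcemanager.store.class"


def load_properties(xml_text):
    """Return {name: value} for every <property> element in the XML text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            props[name] = value
    return props


def check_recovery_settings(props):
    """Return a list of problems; an empty list means the rule holds."""
    problems = []
    if props.get(RECOVERY, "false").lower() != "true":
        problems.append(RECOVERY + " is not set to true")
    elif not props.get(STORE_CLASS):
        # Rule taken from the property description quoted in this report.
        problems.append(STORE_CLASS + " must be specified when recovery is enabled")
    return problems


# Assumption: the properties from this report inside a <configuration> root.
SAMPLE = """<configuration>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.fs.state-store.uri</name>
    <value>hdfs://apple02:9000/rmstore</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for problem in check_recovery_settings(load_properties(SAMPLE)):
        print(problem)
```

With the settings from this report the check finds no problems, which suggests the hang is not caused by an incomplete recovery configuration.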

Run a distributedshell job. While the job is running, kill the ResourceManager and then restart it.
The job hangs and never finishes.


> Job will be hung and can not be finished after resource manager restart and enable recovery
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-4892
>                 URL: https://issues.apache.org/jira/browse/YARN-4892
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Fang Xie
>            Priority: Critical
>
> Enable ResourceManager recovery by setting the following properties:
> <property>
>     <description>Enable RM to recover state after starting. If true, then
>     yarn.resourcemanager.store.class must be specified. </description>
>     <name>yarn.resourcemanager.recovery.enabled</name>
>     <value>true</value>
> </property>
> <property>
>     <description> </description>
>     <name>yarn.resourcemanager.store.class</name>
>     <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value>
> </property>
> <property>
>     <description> </description>
>     <name>yarn.resourcemanager.fs.state-store.uri</name>
>     <value>hdfs://apple02:9000/rmstore</value>
> </property>
> Run a distributedshell job. While the job is running, kill the ResourceManager and then restart it.
> The job hangs and never finishes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
