hadoop-yarn-issues mailing list archives

From "rohithsharma (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-929) 2 MRAppMaster spawned for same Job Id
Date Tue, 16 Jul 2013 13:18:48 GMT

     [ https://issues.apache.org/jira/browse/YARN-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

rohithsharma updated YARN-929:
------------------------------

    Description: 
Configuration:
    yarn.resourcemanager.am.max-retries = 3

Scenario:
    The NodeManager is killed forcefully, i.e. using kill -9 NM_PID.
    After node expiry, the RM killed all the containers running on this NodeManager.
    But the MRAppMaster JVM is still running.
    The RM spawns the 2nd-attempt MRAppMaster since AM retry is configured as 3. At this point,
there are 2 MRAppMasters running in parallel for the same job ID.

Problem: with 2 MRAppMasters running, the 1st-attempt AppMaster deletes the job information from HDFS,
which causes a FileNotFoundException for the 2nd-attempt MRAppMaster.
     

  was:
Configuration:
    yarn.resourcemanager.am.max-retries = 3

Scenario:
    The NodeManager is killed forcefully, i.e. using kill -9 NM_PID.
    After node expiry, the RM killed all the containers running on this NodeManager.
    But the MRAppMaster JVM is still running.
    The RM spawns the 2nd-attempt MRAppMaster since AM retry is configured as 3.

Problem: with 2 MRAppMasters running, the 1st-attempt AppMaster deletes the job information from HDFS,
which causes a FileNotFoundException for the 2nd-attempt MRAppMaster.
     

    
> 2 MRAppMaster spawned for same Job Id
> -------------------------------------
>
>                 Key: YARN-929
>                 URL: https://issues.apache.org/jira/browse/YARN-929
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.5-alpha
>            Reporter: rohithsharma
>
> Configuration:
>     yarn.resourcemanager.am.max-retries = 3
> Scenario:
>     The NodeManager is killed forcefully, i.e. using kill -9 NM_PID.
>     After node expiry, the RM killed all the containers running on this NodeManager.
>     But the MRAppMaster JVM is still running.
>     The RM spawns the 2nd-attempt MRAppMaster since AM retry is configured as 3. At this point,
> there are 2 MRAppMasters running in parallel for the same job ID.
> Problem: with 2 MRAppMasters running, the 1st-attempt AppMaster deletes the job information from
> HDFS, which causes a FileNotFoundException for the 2nd-attempt MRAppMaster.
>      
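The failure mode in the description can be illustrated with a minimal sketch. This is only an analogy and not Hadoop code: a local temporary directory stands in for the job's files on HDFS, Python's FileNotFoundError stands in for Java's FileNotFoundException, and the names simulate_two_attempts, staging_dir and job.xml are hypothetical choices for illustration. It assumes both attempts share one path for the job information, as the report implies.

```python
import os
import shutil
import tempfile


def simulate_two_attempts():
    """Local-filesystem stand-in for two AM attempts sharing one job directory."""
    # Hypothetical shared directory holding the job information (plays the role of HDFS).
    staging_dir = tempfile.mkdtemp(prefix="job_staging_")
    job_file = os.path.join(staging_dir, "job.xml")
    with open(job_file, "w") as f:
        f.write("<configuration/>")

    # Attempt 1 survived the NodeManager kill, so it keeps running and
    # eventually deletes the shared job information.
    shutil.rmtree(staging_dir)

    # Attempt 2, spawned for the same job ID, then reads the same path.
    try:
        with open(job_file) as f:
            f.read()
        return "attempt 2 read job file"
    except FileNotFoundError:
        return "attempt 2 failed: job file not found"


if __name__ == "__main__":
    print(simulate_two_attempts())
```

The sketch runs the two attempts one after the other for determinism; in the reported scenario they run in parallel, so the timing of the delete relative to the read would vary.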

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
