hadoop-yarn-issues mailing list archives

From "Jian He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
Date Mon, 08 Sep 2014 19:28:29 GMT

    [ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125959#comment-14125959 ]

Jian He commented on YARN-2459:
-------------------------------

bq. Add one in TestRMRestart to get an app rejected and make sure that the final-status gets recorded
Added.
bq. Another one in RMStateStoreTestBase to ensure it is okay to have an updateApp call without a storeApp call like in this case.
Turns out RMStateStoreTestBase already has this test.
{code}
    // test updating the state of an app/attempt whose initial state was not
    // saved.
{code}
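
For readers without the source at hand, the behaviour that test checks can be sketched roughly as follows. `ToyStateStore` and its methods are hypothetical stand-ins, not the RMStateStore API; the sketch only assumes that an update for an app whose initial state was never saved is treated as a create.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a state store; not the real RMStateStore API.
class ToyStateStore {
    private final Map<String, String> states = new HashMap<>();

    // Normal path: record the initial state of an app.
    void storeApp(String appId, String state) {
        states.put(appId, state);
    }

    // Assumption: updating an app that was never stored creates the entry
    // instead of failing.
    void updateApp(String appId, String state) {
        states.put(appId, state);
    }

    // Returns the recorded state, or null if the app is unknown.
    String load(String appId) {
        return states.get(appId);
    }
}

class ToyStateStoreDemo {
    public static void main(String[] args) {
        ToyStateStore store = new ToyStateStore();
        store.updateApp("app_1", "FAILED"); // no prior storeApp call
        System.out.println(store.load("app_1"));
    }
}
```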

> RM crashes if App gets rejected for any reason and HA is enabled
> ----------------------------------------------------------------
>
>                 Key: YARN-2459
>                 URL: https://issues.apache.org/jira/browse/YARN-2459
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.4.1
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>         Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, YARN-2459.4.patch
>
>
> If RM HA is enabled and the ZooKeeper store is used as the RM state store, the following can happen.
> If an app is rejected for any reason, it goes directly from NEW to FAILED.
> The final transition adds it to the RMApps and completed-apps in-memory structures, but it
> is never written to the state store.
> When the RMApps default limit is reached, the RM starts deleting apps from memory and from the store.
> It then tries to delete this app from the store and fails, which causes the RM to crash.
> Thanks,
> Mayank
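
A toy model of the failure sequence described above may help. This is not ResourceManager code: `ToyStore`, `ToyRM`, and the limit of 2 completed apps are hypothetical names and values chosen for illustration, and the real store in the report is ZooKeeper-backed.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical strict store: removing an app that was never saved is an error.
class ToyStore {
    private final Set<String> saved = new HashSet<>();

    void store(String appId) {
        saved.add(appId);
    }

    void remove(String appId) {
        if (!saved.remove(appId)) {
            throw new IllegalStateException("no stored state for " + appId);
        }
    }
}

// Hypothetical RM that keeps at most maxCompleted finished apps.
class ToyRM {
    private final ToyStore store = new ToyStore();
    private final Deque<String> completed = new ArrayDeque<>();
    private final int maxCompleted;

    ToyRM(int maxCompleted) {
        this.maxCompleted = maxCompleted;
    }

    // Normal path: state is saved before the app finishes.
    void runToCompletion(String appId) {
        store.store(appId);
        finish(appId);
    }

    // Rejected path (NEW -> FAILED): tracked in memory, never saved.
    void reject(String appId) {
        finish(appId);
    }

    private void finish(String appId) {
        completed.addLast(appId);
        while (completed.size() > maxCompleted) {
            // Eviction removes the oldest app from memory and from the store.
            store.remove(completed.removeFirst());
        }
    }
}

class ToyCrashDemo {
    public static void main(String[] args) {
        ToyRM rm = new ToyRM(2);
        rm.reject("app_1");              // in memory only
        rm.runToCompletion("app_2");
        try {
            rm.runToCompletion("app_3"); // evicts app_1, which was never stored
        } catch (IllegalStateException e) {
            System.out.println("eviction failed: " + e.getMessage());
        }
    }
}
```

In this sketch the exception is caught for demonstration; in the scenario reported here the failed delete is what brings the RM down.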



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
