hadoop-yarn-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-514) Delayed store operations should not result in RM unavailability for app submission
Date Tue, 16 Apr 2013 07:59:16 GMT

     [ https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-514:

    Attachment: YARN-514.6.patch

Thanks, @Bikas, for your investigation. I've modified the code. The newest patch contains the
following major updates:

1. A FAILED => FAILED transition on RMAppEventType.APP_SAVED and a KILLED => KILLED transition
on RMAppEventType.APP_SAVED are defined. This fixes the problem pointed out by @Bikas.

2. In addition, I found a problem in the RMApp state transitions in the RM restart scenario.
With the previous patch, the stored MRApp is recovered, an RMApp instance is created, and it
then transitions to NEW_SAVING and is stored again. To fix the problem, "isRecovered"
is defined in RMAppImpl and is set to true when RMAppImpl#recover is called. Then, when
RMAppEventType.START is received, the transition is NEW => NEW_SAVING if the RMApp instance
is not recovered, and NEW => SUBMITTED otherwise.

3. Additional test cases are added in TestRMAppTransitions to test the aforementioned transitions.

4. TestRMRestart should have caught the problem of a recovered RMApp instance being saved
again. However, the test case did not fail with the previous patch because MemoryRMStateStore
didn't throw exceptions when storing a duplicate application/attempt. Therefore, in the newest
patch, MemoryRMStateStore will throw an IOException when the application/attempt has already
been stored, which is consistent with the behavior of FileSystemRMStateStore. The current
test case of TestRMRestart can then catch the problem of saving the RMApp instance twice.
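The transitions and the duplicate-store check described in items 1, 2, and 4 can be illustrated with a small, self-contained sketch. This is not the code in YARN-514.6.patch: SketchApp, SketchMemoryStore, AppState, AppEventType, kill() and fail() are hypothetical stand-ins for RMAppImpl, MemoryRMStateStore, and the real state machine, and the store call is synchronous here only for brevity.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-ins; these are not the real YARN classes.
enum AppState { NEW, NEW_SAVING, SUBMITTED, FAILED, KILLED }

enum AppEventType { START, APP_SAVED }

/** In-memory store that rejects duplicates, as described in item 4. */
class SketchMemoryStore {
    private final Set<String> storedApps = new HashSet<>();

    void storeApplication(String appId) throws IOException {
        // Set.add returns false when the id is already present.
        if (!storedApps.add(appId)) {
            throw new IOException("Application " + appId + " is already stored");
        }
    }
}

/** Minimal application state machine covering items 1 and 2. */
class SketchApp {
    private final String appId;
    private final SketchMemoryStore store;
    private AppState state = AppState.NEW;
    private boolean isRecovered = false;

    SketchApp(String appId, SketchMemoryStore store) {
        this.appId = appId;
        this.store = store;
    }

    /** Marks this instance as rebuilt from the store after a restart. */
    void recover() { isRecovered = true; }

    /** Assumed helpers that simulate the app ending while a save is pending. */
    void kill() { state = AppState.KILLED; }
    void fail() { state = AppState.FAILED; }

    AppState getState() { return state; }

    void handle(AppEventType event) throws IOException {
        switch (state) {
            case NEW:
                if (event == AppEventType.START) {
                    if (isRecovered) {
                        // Already persisted before the restart: skip saving.
                        state = AppState.SUBMITTED;
                    } else {
                        store.storeApplication(appId);
                        state = AppState.NEW_SAVING;
                    }
                    return;
                }
                break;
            case NEW_SAVING:
                if (event == AppEventType.APP_SAVED) {
                    state = AppState.SUBMITTED;
                    return;
                }
                break;
            case FAILED:
            case KILLED:
                if (event == AppEventType.APP_SAVED) {
                    // Self-loop: a late save notification is ignored.
                    return;
                }
                break;
            default:
                break;
        }
        throw new IllegalStateException("Invalid event " + event + " in state " + state);
    }
}
```

In this sketch, a recovered instance that receives START goes straight to SUBMITTED and never touches the store, whereas a second storeApplication call for the same id raises an IOException, which is the kind of failure a restart test can detect.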
> Delayed store operations should not result in RM unavailability for app submission
> ----------------------------------------------------------------------------------
>                 Key: YARN-514
>                 URL: https://issues.apache.org/jira/browse/YARN-514
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Zhijie Shen
>         Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch, YARN-514.4.patch, YARN-514.5.patch, YARN-514.6.patch
>
> Currently, app submission is the only store operation performed synchronously because
> the app must be stored before the request returns with success. This makes the RM susceptible
> to blocking all client threads on slow store operations, resulting in RM being perceived as
> unavailable by clients.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
