hadoop-yarn-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-128) Resurrect RM Restart
Date Fri, 09 Nov 2012 18:48:14 GMT

    [ https://issues.apache.org/jira/browse/YARN-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494202#comment-13494202 ]

Bikas Saha commented on YARN-128:
---------------------------------

I am attaching a proposal doc and code for the first iteration. The proposal is along the same
lines as the earlier initial design sketch, but it limits the first iteration of the work to
restarting the applications after the RM comes back up. The reasoning and ideas are detailed in the doc.

The attached code implements the proposal. It includes a functional test that verifies
the end-to-end scenario using an in-memory store. If everything looks good overall, I
will tie up the loose ends and add more tests.
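
For readers without the patch at hand, the restart scenario can be pictured with a toy model.
This is only a sketch under stated assumptions, not the attached code: the names RestartSketch,
InMemoryStore and ToyResourceManager are hypothetical and are not YARN classes. It shows one
way a store that outlives the RM object could let a new RM instance pick up applications that
were submitted before the restart.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: every class and method name here is hypothetical, not a YARN API.
class RestartSketch {

    /** Hypothetical in-memory store. It lives outside the RM object, so it survives a "restart". */
    static class InMemoryStore {
        private final Map<String, String> apps = new LinkedHashMap<>();

        void storeApplication(String appId, String submissionContext) {
            apps.put(appId, submissionContext);
        }

        void removeApplication(String appId) {
            apps.remove(appId);
        }

        /** Returns a copy so callers cannot mutate the stored state. */
        Map<String, String> loadApplications() {
            return new LinkedHashMap<>(apps);
        }
    }

    /** Hypothetical RM: persists each submission and reloads stored applications on startup. */
    static class ToyResourceManager {
        private final InMemoryStore store;
        private final Map<String, String> running = new LinkedHashMap<>();

        ToyResourceManager(InMemoryStore store) {
            this.store = store;
            // Recovery step: restart every application found in the store.
            running.putAll(store.loadApplications());
        }

        void submit(String appId, String submissionContext) {
            store.storeApplication(appId, submissionContext); // persist before accepting
            running.put(appId, submissionContext);
        }

        void finish(String appId) {
            running.remove(appId);
            store.removeApplication(appId); // finished applications are not recovered
        }

        List<String> runningApps() {
            return new ArrayList<>(running.keySet());
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        ToyResourceManager rm = new ToyResourceManager(store);
        rm.submit("app_1", "context for app_1");
        rm.submit("app_2", "context for app_2");
        rm.finish("app_1");

        // Simulate the RM going down and coming back up with the same store.
        ToyResourceManager restarted = new ToyResourceManager(store);
        System.out.println("Recovered after restart: " + restarted.runningApps());
    }
}
```

In this sketch the store is written before the application is accepted, so anything the RM
acknowledged is recoverable. Whether the attached code orders things the same way is not stated
in this message.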

For review, the code is broken into 1) removal of old code and 2) new code + test. There are TODO
comments in the code where folks could make suggestions. The code is also attached in full so
that Jenkins can run a build and test pass, because my machine is having long host resolution
timeouts. Any ideas on what could be causing the timeouts?

During testing I found a bug in the CapacityScheduler that causes it to fail to activate
applications when resources are added to the cluster. Folks can comment on the fix. A
separate test case demonstrates the bug and verifies the fix.
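
The message does not describe the mechanics of the bug, so the following is only a toy model of
this class of problem, not the CapacityScheduler code or the attached fix. The names
ActivationSketch and ToyQueue, and the rule "one active application per fixed amount of cluster
memory", are assumptions made purely for illustration. The point is that the activation check
has to be re-run when cluster resources grow, otherwise applications submitted to an empty
cluster stay pending.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy model only: names and the activation rule are hypothetical, not CapacityScheduler internals.
class ActivationSketch {

    static class ToyQueue {
        private final Queue<String> pending = new ArrayDeque<>();
        private final List<String> active = new ArrayList<>();
        private final int memoryPerActiveAppMb; // assumed rule: one active app per this much memory
        private int clusterMemoryMb = 0;

        ToyQueue(int memoryPerActiveAppMb) {
            this.memoryPerActiveAppMb = memoryPerActiveAppMb;
        }

        void submit(String appId) {
            pending.add(appId);
            activatePending();
        }

        void addNode(int memoryMb) {
            clusterMemoryMb += memoryMb;
            // Without this call, applications submitted while the cluster was
            // too small would remain pending even after resources arrive.
            activatePending();
        }

        /** Moves pending applications to active while the resource-based limit allows it. */
        private void activatePending() {
            int limit = clusterMemoryMb / memoryPerActiveAppMb;
            while (!pending.isEmpty() && active.size() < limit) {
                active.add(pending.remove());
            }
        }

        List<String> activeApps() {
            return new ArrayList<>(active);
        }

        int pendingCount() {
            return pending.size();
        }
    }

    public static void main(String[] args) {
        ToyQueue queue = new ToyQueue(1024);
        queue.submit("app_1"); // cluster is empty, so the app stays pending
        System.out.println("Active before any node joins: " + queue.activeApps());
        queue.addNode(2048);   // resources arrive, activation is re-evaluated
        System.out.println("Active after a node joins: " + queue.activeApps());
    }
}
```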
                
> Resurrect RM Restart 
> ---------------------
>
>                 Key: YARN-128
>                 URL: https://issues.apache.org/jira/browse/YARN-128
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Bikas Saha
>         Attachments: MR-4343.1.patch, RM-recovery-initial-thoughts.txt, RMRestartPhase1.pdf, YARN-128-combined.patch, YARN-128.new-code.1.patch, YARN-128.patch, YARN-128.remove-old-code.1.patch
>
>
> We should resurrect 'RM Restart' which we disabled sometime during the RM refactor.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
