hadoop-yarn-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-128) Resurrect RM Restart
Date Fri, 16 Nov 2012 15:58:15 GMT

    [ https://issues.apache.org/jira/browse/YARN-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498876#comment-13498876
] 

Bikas Saha commented on YARN-128:
---------------------------------

@Arinto
Thanks for using the code!
1) Yes. Both are the same object, but that is exactly what the test checks: that the context
saved in the store is the same as the one the app was submitted with. We do this with an
in-memory store that lets us examine the stored data and compare it with the real data. A
real store would persist the data, so a direct comparison would not be possible.
3) Yes. It seems incorrect to store scheduler side-effects. For example, if upon restart the
scheduler config sets minimum container size = 512, then again it will not match.
I am attaching a patch for a ZK store that you can try. It applies on top of the current full
patch.
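
To illustrate point 1, here is a minimal sketch of why the same-object check only works
with an in-memory store. All names below (SubmissionContext, Store, copyOnStore,
isSameObject) are hypothetical and are not the classes in the patch; the copying mode
only roughly imitates a store that writes the data out.

```java
import java.util.HashMap;
import java.util.Map;

public class InMemoryStoreSketch {

    // Hypothetical stand-in for an application submission context.
    static final class SubmissionContext {
        final String appId;
        final int memoryMb;

        SubmissionContext(String appId, int memoryMb) {
            this.appId = appId;
            this.memoryMb = memoryMb;
        }
    }

    // Hypothetical store. With copyOnStore = false it keeps the submitted
    // reference (like an in-memory test store); with copyOnStore = true it
    // keeps a copy, roughly imitating a store that writes the data out.
    static final class Store {
        private final Map<String, SubmissionContext> apps = new HashMap<>();
        private final boolean copyOnStore;

        Store(boolean copyOnStore) {
            this.copyOnStore = copyOnStore;
        }

        void storeApplication(SubmissionContext ctx) {
            SubmissionContext saved = copyOnStore
                    ? new SubmissionContext(ctx.appId, ctx.memoryMb)
                    : ctx;
            apps.put(ctx.appId, saved);
        }

        SubmissionContext getStored(String appId) {
            return apps.get(appId);
        }
    }

    // True only when the store holds the very object that was submitted.
    static boolean isSameObject(Store store, SubmissionContext submitted) {
        return store.getStored(submitted.appId) == submitted;
    }

    public static void main(String[] args) {
        SubmissionContext ctx = new SubmissionContext("app_1", 1024);

        Store memStore = new Store(false);
        memStore.storeApplication(ctx);
        System.out.println(isSameObject(memStore, ctx)); // prints "true"

        Store copyingStore = new Store(true);
        copyingStore.storeApplication(ctx);
        System.out.println(isSameObject(copyingStore, ctx)); // prints "false"
    }
}
```

With a store that copies, a test would have to compare field by field instead of by
reference.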

@Tom
Thanks for reviewing!
1) There is no race condition because the Dispatcher has not been started yet and hence the
attempt start event has not been processed. There is a comment to that effect in the code.
2) I agree. I had thought about it too. But it looks like the current behavior (before this
patch) does this because it does not differentiate killed/failed attempts when deciding that
the attempt retry limit has been reached. So I thought about leaving it for a separate jira
which would be unrelated to this. Once that is done this code could use it and not count the
restarted attempt. This patch is already huge. Does that sound good?
3) Yes. That could be done. The constructor makes it easier to write tests without mangling
configs.
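
For point 2, a rough sketch of the difference between counting every attempt toward the
retry limit and counting only failed ones. The enum and method names are hypothetical
and do not come from the RM code; the LOST_TO_RM_RESTART outcome is an assumption used
only to model an attempt that ended because the RM restarted.

```java
import java.util.Arrays;
import java.util.List;

public class AttemptRetrySketch {

    // Hypothetical outcomes for an application attempt.
    enum AttemptOutcome { FAILED, KILLED, LOST_TO_RM_RESTART }

    // Every finished attempt counts toward the limit, whatever its outcome.
    static boolean limitReachedCountingAll(List<AttemptOutcome> attempts, int maxAttempts) {
        return attempts.size() >= maxAttempts;
    }

    // Possible follow-up behavior: only genuinely failed attempts count.
    static boolean limitReachedCountingFailuresOnly(List<AttemptOutcome> attempts, int maxAttempts) {
        long failures = attempts.stream()
                .filter(outcome -> outcome == AttemptOutcome.FAILED)
                .count();
        return failures >= maxAttempts;
    }

    public static void main(String[] args) {
        List<AttemptOutcome> attempts =
                Arrays.asList(AttemptOutcome.FAILED, AttemptOutcome.LOST_TO_RM_RESTART);
        int maxAttempts = 2;

        System.out.println(limitReachedCountingAll(attempts, maxAttempts));          // prints "true"
        System.out.println(limitReachedCountingFailuresOnly(attempts, maxAttempts)); // prints "false"
    }
}
```

Under the first rule the app gets no further attempt after an RM restart; under the
second it would still have one left.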
                
> Resurrect RM Restart 
> ---------------------
>
>                 Key: YARN-128
>                 URL: https://issues.apache.org/jira/browse/YARN-128
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Bikas Saha
>         Attachments: MR-4343.1.patch, restart-12-11-zkstore.patch, RM-recovery-initial-thoughts.txt,
>                      RMRestartPhase1.pdf, YARN-128.full-code.3.patch, YARN-128.full-code-4.patch,
>                      YARN-128.new-code-added.3.patch, YARN-128.new-code-added-4.patch,
>                      YARN-128.old-code-removed.3.patch, YARN-128.old-code-removed.4.patch,
>                      YARN-128.patch
>
>
> We should resurrect 'RM Restart' which we disabled sometime during the RM refactor.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
