hadoop-yarn-issues mailing list archives

From "Arinto Murdopo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-128) Resurrect RM Restart
Date Thu, 15 Nov 2012 08:36:16 GMT

    [ https://issues.apache.org/jira/browse/YARN-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13497843#comment-13497843 ]

Arinto Murdopo commented on YARN-128:

Based on YARN-128.full-code-4.patch, I have the following observations:

1) In TestRMRestart.java, line 78, app1 and appState refer to the same instance because we
are using memory to store the state (MemoryRMStateStore). Therefore, the assertion will
always pass.

2) ApplicationState is stored when we invoke MockRM's submitApp method. More precisely, it
is stored in the ClientRMService class, line 266. The state that we store contains the resource
request from the client. In this case, the requested resource value is 200. However, if we wait
for some time, the value is updated to 1024 (the normalized value assigned by the Scheduler).

3) Currently, our school project is trying to persist the state in persistent storage, and the
assert statement in our modified test class fails because our storage holds the resource
value from before it was updated by the Scheduler.
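
To illustrate observations 1) and 3), here is a minimal sketch of why an in-memory store
makes the assertion trivially pass while a persistent store does not. All class names below
(AppState, ReferenceStore, SnapshotStore) are hypothetical stand-ins, not the classes in the
patch; the only assumption is that the in-memory store keeps a reference to the submitted
object, while a persistent store keeps the values as they were at store time.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the stored application state; not the real YARN class.
class AppState {
    int memory;

    AppState(int memory) {
        this.memory = memory;
    }

    AppState copy() {
        return new AppState(memory);
    }
}

// Keeps a reference to the caller's object, as an in-memory store might.
class ReferenceStore {
    private final Map<String, AppState> states = new HashMap<>();

    void store(String id, AppState state) {
        states.put(id, state);
    }

    AppState load(String id) {
        return states.get(id);
    }
}

// Keeps a snapshot taken at store time, as a file or database store would.
class SnapshotStore {
    private final Map<String, AppState> states = new HashMap<>();

    void store(String id, AppState state) {
        states.put(id, state.copy());
    }

    AppState load(String id) {
        return states.get(id);
    }
}

class StoreAliasingDemo {
    public static void main(String[] args) {
        AppState app = new AppState(200);   // value requested by the client
        ReferenceStore inMemory = new ReferenceStore();
        SnapshotStore persistent = new SnapshotStore();
        inMemory.store("app1", app);
        persistent.store("app1", app);

        app.memory = 1024;                  // later update, e.g. normalization

        // The reference store sees the update; the snapshot store does not.
        System.out.println("in-memory:  " + inMemory.load("app1").memory);
        System.out.println("persistent: " + persistent.load("app1").memory);
    }
}
```

With the reference store, comparing the live object against the loaded one compares an
object with itself, so it cannot detect a stale value.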

Based on the above observations, should we update the persisted resource value with the new value
assigned by the Scheduler?
Since we are going to restart both the ApplicationMaster and the NodeManager when the
ResourceManager fails, I think the answer is no: we can use the original value requested by the user.
But I'm not really sure about my own reasoning, so please comment on it. :) If the answer
is yes, then we should wait until the Scheduler updates the resource value before persisting it
to storage.
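
One way to reason about the question above is to check whether normalization is idempotent.
The sketch below assumes the Scheduler rounds a request up to a multiple of a minimum
allocation of 1024 (an assumption that matches the 200 -> 1024 change observed here, not
code taken from the patch). Under that assumption, and if the Scheduler configuration is
the same after restart, resubmitting with the original value and resubmitting with the
normalized value end up at the same result.

```java
class NormalizeDemo {
    // Assumed behaviour: raise the request to at least the minimum allocation,
    // then round it up to the next multiple of that minimum.
    static int normalize(int requested, int minAllocation) {
        int atLeastMin = Math.max(requested, minAllocation);
        return ((atLeastMin + minAllocation - 1) / minAllocation) * minAllocation;
    }

    public static void main(String[] args) {
        int minAllocation = 1024;  // assumed Scheduler minimum
        int requested = 200;       // value sent by the client

        int normalized = normalize(requested, minAllocation);

        // Recovery from the original value and from the normalized value agree.
        int recoveredFromOriginal = normalize(requested, minAllocation);
        int recoveredFromNormalized = normalize(normalized, minAllocation);

        System.out.println("normalized:                " + normalized);
        System.out.println("recovered from original:   " + recoveredFromOriginal);
        System.out.println("recovered from normalized: " + recoveredFromNormalized);
    }
}
```

If the minimum allocation could change between the original run and the restart, the two
options would no longer be equivalent, so this only supports persisting the original value
under the stated assumption.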
> Resurrect RM Restart 
> ---------------------
>                 Key: YARN-128
>                 URL: https://issues.apache.org/jira/browse/YARN-128
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Bikas Saha
>         Attachments: MR-4343.1.patch, RM-recovery-initial-thoughts.txt, RMRestartPhase1.pdf,
>                      YARN-128.full-code.3.patch, YARN-128.full-code-4.patch,
>                      YARN-128.new-code-added.3.patch, YARN-128.new-code-added-4.patch,
>                      YARN-128.old-code-removed.3.patch, YARN-128.old-code-removed.4.patch,
>                      YARN-128.patch
> We should resurrect 'RM Restart' which we disabled sometime during the RM refactor.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
