hadoop-mapreduce-issues mailing list archives

From "Tsuyoshi OZAWA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4326) Resurrect RM Restart
Date Wed, 08 Aug 2012 08:04:11 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430935#comment-13430935 ]

Tsuyoshi OZAWA commented on MAPREDUCE-4326:

> So there may not be the need to store any state as long as the RM can recover the current
> state of the cluster from the NM's in a reasonable amount of time.

It's a good idea to avoid storing state that the RM can recover from the NMs. However, it's
uncertain whether that state can be recovered in a reasonable amount of time, so prototyping
is needed.

> The only state that needs to be save, as far as I can see, is the information about all
> jobs that are not yet completed.

I agree. I'll check whether the state of in-progress (WIP) jobs is defined correctly.

> Also, the implementation seems to be doing blocking calls to ZK etc and will likely end
> up being a bottleneck on RM threads/perf if a lot of state information needs to be synced
> to stable store.

I think that, to avoid becoming a bottleneck, the RM should have a dedicated thread for saving
its state. The main thread can hand save requests to that thread through a queue or a similar
mechanism, so it never blocks on the store. Using async APIs to save the state would also be
effective; however, the code can get complicated.
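As a rough illustration of the dedicated-thread idea, here is a minimal sketch. It is not taken from the attached patch: StateStore, InMemoryStateStore, AsyncStateSaver and saveAsync are hypothetical names made up for this example, and the in-memory store only stands in for a blocking store such as ZK.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical stable-store interface; a real one might wrap ZK and block on each call. */
interface StateStore {
    void store(String key, byte[] value) throws Exception;
}

/** In-memory stand-in for the stable store, used only in this sketch. */
class InMemoryStateStore implements StateStore {
    final Map<String, byte[]> data = new ConcurrentHashMap<>();

    @Override
    public void store(String key, byte[] value) {
        data.put(key, value);
    }
}

/** Moves blocking store calls off the caller's thread onto one dedicated thread. */
class AsyncStateSaver implements AutoCloseable {
    private static final class SaveRequest {
        final String key;
        final byte[] value;

        SaveRequest(String key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    // Sentinel that tells the worker to stop once everything queued before it is saved.
    private static final SaveRequest POISON = new SaveRequest(null, null);

    private final BlockingQueue<SaveRequest> queue = new LinkedBlockingQueue<>();
    private final StateStore store;
    private final Thread worker;

    AsyncStateSaver(StateStore store) {
        this.store = store;
        this.worker = new Thread(this::drain, "state-saver");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    /** Called from the main thread: enqueues the request and returns without touching the store. */
    void saveAsync(String key, byte[] value) {
        queue.add(new SaveRequest(key, value));
    }

    /** Runs on the dedicated thread: takes requests in FIFO order and performs the blocking write. */
    private void drain() {
        try {
            while (true) {
                SaveRequest req = queue.take();
                if (req == POISON) {
                    return;
                }
                try {
                    store.store(req.key, req.value);
                } catch (Exception e) {
                    // Assumption: log and continue. A real RM would need a retry or fail-fast policy.
                    System.err.println("failed to save " + req.key + ": " + e);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Waits for all previously queued requests to be written, then stops the worker. */
    @Override
    public void close() throws InterruptedException {
        queue.add(POISON);
        worker.join();
    }
}
```

The caller would create one AsyncStateSaver, call saveAsync(...) on each state change, and call close() on shutdown. The queue in this sketch is unbounded, so a slow store shows up as memory growth rather than as blocked RM threads; a bounded queue would trade that for back-pressure.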
> Resurrect RM Restart 
> ---------------------
>                 Key: MAPREDUCE-4326
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4326
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, resourcemanager
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Bikas Saha
>         Attachments: MR-4343.1.patch
> We should resurrect 'RM Restart' which we disabled sometime during the RM refactor.


