Mailing-List: contact mapreduce-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: mapreduce-issues@hadoop.apache.org
Date: Wed, 8 Aug 2012 08:04:11 +0000 (UTC)
From: "Tsuyoshi OZAWA (JIRA)" <jira@apache.org>
To: mapreduce-issues@hadoop.apache.org
Message-ID: <1784279133.3441.1344413051819.JavaMail.jiratomcat@issues-vm>
In-Reply-To: <510220190.51347.1339128383136.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Commented] (MAPREDUCE-4326) Resurrect RM Restart
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/MAPREDUCE-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430935#comment-13430935 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-4326:
-------------------------------------------

> So there may not be the need to store any state as long as the RM can recover the current state of the cluster from the NM's in a reasonable amount of time. 

It's a good idea to avoid storing state that can be recovered from the NMs. However, it's uncertain whether that state can be recovered in a reasonable amount of time, so prototyping is needed.

> The only state that needs to be save, as far as I can see, is the information about all jobs that are not yet completed. 

I agree with you. I'll check whether the state of in-progress (WIP) jobs is defined correctly.

> Also, the implementation seems to be doing blocking calls to ZK etc and will likely end up being a bottleneck on RM threads/perf if a lot of state information needs to be synced to stable store.

To avoid this bottleneck, I think the RM should have a dedicated thread for saving its state. The main thread can hand save requests to that thread through a queue (or a similar mechanism) without blocking. Using async APIs to save the state would also work, but the code can get complicated.
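
A rough sketch of the dedicated-thread idea, just to make the discussion concrete. This is not code from the attached patch: StateStore, InMemoryStateStore and AsyncStateSaver are hypothetical names, and the in-memory map only stands in for a real stable store such as ZK.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the stable store (e.g. ZK); not a Hadoop API.
interface StateStore {
    void save(String key, String value) throws Exception; // may block
}

// Hypothetical in-memory store, used only to keep the sketch runnable.
class InMemoryStateStore implements StateStore {
    final Map<String, String> data = new ConcurrentHashMap<>();

    @Override
    public void save(String key, String value) {
        data.put(key, value);
    }
}

// Hands save requests from the caller to a single dedicated thread.
class AsyncStateSaver {
    private static final class SaveRequest {
        final String key;
        final String value;

        SaveRequest(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Sentinel that tells the worker to stop after draining earlier requests.
    private static final SaveRequest POISON = new SaveRequest(null, null);

    private final BlockingQueue<SaveRequest> queue = new LinkedBlockingQueue<>();
    private final Thread worker;

    AsyncStateSaver(StateStore store) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    SaveRequest req = queue.take(); // blocks only this thread
                    if (req == POISON) {
                        return;
                    }
                    try {
                        store.save(req.key, req.value);
                    } catch (Exception e) {
                        // A real RM would need a retry/fail-fast policy here.
                        System.err.println("save failed for " + req.key + ": " + e);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "state-saver");
        worker.start();
    }

    // Called from the main thread; the unbounded queue means this never blocks.
    void submit(String key, String value) {
        queue.add(new SaveRequest(key, value));
    }

    // Processes everything queued so far, then stops the worker.
    void shutdown() throws InterruptedException {
        queue.add(POISON);
        worker.join();
    }
}
```

With this shape, the main thread would call something like submit("job_1", "RUNNING") and return immediately, while the blocking store call happens on the "state-saver" thread. The unbounded queue is an assumption made for brevity; a real implementation would have to decide what to do when the store falls behind (bounded queue, coalescing updates per job, and so on).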
                
> Resurrect RM Restart 
> ---------------------
>
>                 Key: MAPREDUCE-4326
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4326
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, resourcemanager
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Bikas Saha
>         Attachments: MR-4343.1.patch
>
>
> We should resurrect 'RM Restart' which we disabled sometime during the RM refactor.
