hadoop-yarn-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1696) Document RM HA
Date Thu, 27 Mar 2014 01:57:17 GMT

    [ https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948751#comment-13948751 ]

Vinod Kumar Vavilapalli commented on YARN-1696:
-----------------------------------------------

Tx for the doc, Karthik. Some comments:
 - Like I mentioned, fail-over is a big enough topic in itself, so let's split this into
two and call this one the ResourceManager fail-over guide. We can have a top-level high-availability
doc if we want to and link the two there.
 - Let's move the state-store and RM restart stuff out.
 - "the applications can resume from their last check-pointed state; e.g. completed map tasks
in a MapReduce job are not re-run on a subsequent attempt" -> This is not related to fail-over.
Let's put it in the restart doc.
 - "Clients, ApplicationMasters (AMs) and NodeManagers (NMs) try connecting to the RMs in
a round-robin fashion" -> Or point out that we have ConfigFailOverProvider as the default implementation
of an abstraction?
 - I think we should mention that even though there are two state-store impls, the suggested
store is ZK-based store for the sake of fencing.
 - We should also document the client retry related configs.
 - Should we give a very basic example configuration of two RMs? The absolute minimum required
to enable this?
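
On the last point, a minimal sketch of what such a yarn-site.xml could look like. It assumes
the "yarn.resourcemanager.ha." property names as they stand at the time of this comment (before any
rename) and the ZK-based store suggested above. The cluster id, the hostnames and the ZooKeeper
quorum are placeholders, not values taken from the patch.

```xml
<!-- Sketch only: two ResourceManagers with fail-over enabled. -->
<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <!-- Placeholder cluster id -->
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster1</value>
  </property>
  <property>
    <!-- Logical ids of the two RMs; reused as suffixes below -->
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <!-- Placeholder hostname -->
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>master1.example.com</value>
  </property>
  <property>
    <!-- Placeholder hostname -->
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>master2.example.com</value>
  </property>
  <property>
    <!-- Placeholder ZooKeeper quorum -->
    <name>yarn.resourcemanager.zk-address</name>
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
  </property>
</configuration>
```

Client retry settings and the choice of fail-over proxy provider are left at their defaults here.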

Unrelated to the docs
 - It's late, but after seeing the document, I think we should rename "yarn.resourcemanager.ha."
configs to be "yarn.resourcemanager.failover.". What do others think? Also, "rm-ids" seems
weird too.

> Document RM HA
> --------------
>
>                 Key: YARN-1696
>                 URL: https://issues.apache.org/jira/browse/YARN-1696
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.3.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>            Priority: Blocker
>         Attachments: YARN-1696.2.patch, yarn-1696-1.patch
>
>
> Add documentation for RM HA. Marking this a blocker for 2.4 as this is required to call
RM HA Stable and ready for public consumption. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
