hadoop-yarn-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1027) Implement RMHAProtocolService
Date Thu, 12 Sep 2013 05:25:57 GMT

    [ https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765167#comment-13765167 ]

Bikas Saha commented on YARN-1027:
----------------------------------

Could you please share the different scenarios that have been tried out? This will help everyone
else following the jira.

Stopped instead of Stopping?
{code}
+    STANDBY("standby"),
+    STOPPING("stopping");
{code}
Since this is a change in common, it has to go in its own jira filed under common. It should probably
be reviewed by someone from HDFS to make sure we do not inadvertently break HDFS HA somewhere
because of it. We can commit YARN-1027 independently of that jira with state==Initializing for
now, so we are not blocked by it.

We would like to be resilient to future changes in the transitionToStandby() logic that might be
missed in serviceStop(), so it might be better to call transitionToStandby() inside serviceStop().
Can we modify transitionToStandby() to accept a stop flag, such that if the flag is true it does
not init the services again and changes the state to Stopped? Or something along those lines.

{code}
 public synchronized void serviceStop() throws Exception {
+    // Stop all services
+    rm.stopActiveServices();
+    haState = HAServiceState.STOPPING;
{code}
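
A minimal sketch of the stop-flag idea, assuming a simplified state machine. HaStateSketch, HaServiceSketch, and initCount are hypothetical names used only for illustration; they are not taken from the patch, and the real RM does far more work in each transition.

```java
// Hypothetical sketch; class, enum, and field names are illustrative only.
enum HaStateSketch { INITIALIZING, ACTIVE, STANDBY, STOPPED }

class HaServiceSketch {
    private HaStateSketch haState = HaStateSketch.INITIALIZING;
    private boolean activeServicesRunning = false;
    private int initCount = 0; // times the active services were re-initialized

    synchronized void transitionToActive() {
        if (haState == HaStateSketch.STOPPED) {
            throw new IllegalStateException("service already stopped");
        }
        activeServicesRunning = true;
        haState = HaStateSketch.ACTIVE;
    }

    // Regular standby transition: tear down, then re-init for a later failover.
    synchronized void transitionToStandby() {
        transitionToStandby(false);
    }

    // With stopping == true the teardown is shared, but the services are
    // not initialized again and the state ends up as STOPPED.
    synchronized void transitionToStandby(boolean stopping) {
        activeServicesRunning = false; // single teardown path for both cases
        if (stopping) {
            haState = HaStateSketch.STOPPED;
            return;
        }
        initCount++; // stand-in for re-creating and initializing active services
        haState = HaStateSketch.STANDBY;
    }

    // serviceStop() delegates, so later changes to the standby logic are not missed here.
    synchronized void serviceStop() {
        transitionToStandby(true);
    }

    synchronized HaStateSketch getState() { return haState; }
    synchronized int getInitCount() { return initCount; }
    synchronized boolean isActiveServicesRunning() { return activeServicesRunning; }
}
```

With this shape, stopping from either the active or the standby state goes through the same teardown code as a normal transition to standby.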

Create a startActiveServices() method similar to stopActiveServices() ?
{code}
+    LOG.info("Transitioning to active");
+    rm.activeServices.start();
{code}
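
One possible shape for the symmetric pair, as a sketch only. RmSketch is a hypothetical stand-in for the RM, the event list replaces the real service start/stop calls, and the idempotent guards are an assumption rather than something the patch is known to do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the RM; only the start/stop symmetry is shown.
class RmSketch {
    private final List<String> events = new ArrayList<>();
    private boolean activeServicesStarted = false;

    // Mirrors stopActiveServices() so callers never touch the services field directly.
    synchronized void startActiveServices() {
        if (!activeServicesStarted) {      // assumption: repeated starts are no-ops
            events.add("start");           // stand-in for starting the real services
            activeServicesStarted = true;
        }
    }

    synchronized void stopActiveServices() {
        if (activeServicesStarted) {       // assumption: repeated stops are no-ops
            events.add("stop");            // stand-in for stopping the real services
            activeServicesStarted = false;
        }
    }

    // Returns a copy so callers cannot mutate the internal list.
    synchronized List<String> getEvents() {
        return new ArrayList<>(events);
    }
}
```

The HA service would then call rm.startActiveServices() in the transition to active, matching the existing rm.stopActiveServices() call.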

A new cluster timestamp should be created when the RM transitions to active, right? Not
when it transitions to standby.
{code}
+  void createAndInitActiveServices() throws Exception {
+    // reset cluster timestamp
+    clusterTimeStamp = System.currentTimeMillis();
{code}
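
A sketch of resetting the timestamp on the transition to active instead. ClusterTimestampSketch and the injected clock are hypothetical and exist only to make the behavior easy to check; the patch itself uses System.currentTimeMillis() directly.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch; the injected clock is an assumption made for testability.
class ClusterTimestampSketch {
    private final LongSupplier clock;
    private long clusterTimeStamp;

    ClusterTimestampSketch(LongSupplier clock) {
        this.clock = clock;
        this.clusterTimeStamp = clock.getAsLong();
    }

    // The timestamp is refreshed only when this instance becomes active.
    synchronized void transitionToActive() {
        clusterTimeStamp = clock.getAsLong();
    }

    // Going to standby leaves the timestamp untouched.
    synchronized void transitionToStandby() {
        // active services would be stopped and re-initialized here
    }

    synchronized long getClusterTimeStamp() {
        return clusterTimeStamp;
    }
}
```

In production code the clock would simply be System::currentTimeMillis.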

Should createAndInit/Start/Stop methods in RM be synchronized? Can they race with other activity
in the RM happening on the dispatcher thread?

Was the addition of getClusterTimeStamp() necessary? It's good to keep refactorings separate.

Incomplete comment
{code}
+    // 6. Stop the RM. All services should
{code}

We do need some e2e tests that exercise the changes in more detail. It's fine to do that in a separate
jira. The new unit tests in this jira are sufficient for its purposes, IMO.
                
> Implement RMHAProtocolService
> -----------------------------
>
>                 Key: YARN-1027
>                 URL: https://issues.apache.org/jira/browse/YARN-1027
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: test-yarn-1027.patch, yarn-1027-1.patch, yarn-1027-2.patch, yarn-1027-3.patch,
> yarn-1027-4.patch, yarn-1027-5.patch, yarn-1027-6.patch, yarn-1027-including-yarn-1098-3.patch,
> yarn-1027-in-rm-poc.patch
>
>
> Implement existing HAServiceProtocol from Hadoop common. This protocol is the single
> point of interaction between the RM and HA clients/services.

