hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1481) ResourceManager and AdminService interact in a convoluted manner after YARN-1318
Date Mon, 09 Dec 2013 00:27:07 GMT

    [ https://issues.apache.org/jira/browse/YARN-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842810#comment-13842810
] 

Karthik Kambatla commented on YARN-1481:
----------------------------------------

Thanks [~vinodkv] for the refactoring. The code reads better now.

As for the TODO items in the patch:
# HAServiceStatus returned by getServiceStatus() has an isReadyToBecomeActive method which
is used by the FailoverController to check if an RM is in a state to become Active. This corresponds
to Active and Standby states, as the RM can not be made Active when it is in the INITIALIZING
or STOPPING states. If I have missed the question entirely, please let me know.
{code}
+    if (isRMActive() || haState == HAServiceProtocol.HAServiceState.STANDBY) { // TODO: What
?!
{code}
# Previously, AdminService was altering the Configuration and Bikas had suggested we set the
RM's conf in case any one decides to make a copy of the Configuration in AdminService. Long
story short, it can be removed.
{code}
+    if (this.rmContext.isHAEnabled()) {
+      HAUtil.verifyAndSetConfiguration(conf);
+      setConf(conf); // TODO: Why is this needed? Remove.
+    }
{code}
# I am not sure why we need a lock there in the first place. It should be safe to remove it
here, as transitionToStandby itself is synchronized.
{code}
+        synchronized(rm) { // TODO: Lock which class?
+          if (rmContext.isHAEnabled()) {
             try {
               // Transition to standby and reinit active services
               LOG.info("Transitioning RM to Standby mode");
-              adminService.transitionToStandby(true);
+              rm.transitionToStandby(true);
{code}

Comments on the patch itself: 
# The field haState in ResourceManager can be private. Looks like we forgot to make it private
along with other changes.
# {{RMContextImpl#setHAServiceState}} should be called everywhere haState is set in the RM,
AdminService depends on this. Looks like the test failures are also because of this. 
# Given RMContext has the haState, it might be better to remove it altogether from ResourceManager
and use the rmContext version.
# get and set HAState in RMContextImpl should be synchronized on haState before setting or
getting haState

> ResourceManager and AdminService interact in a convoluted manner after YARN-1318
> --------------------------------------------------------------------------------
>
>                 Key: YARN-1481
>                 URL: https://issues.apache.org/jira/browse/YARN-1481
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: YARN-1481-20131207.txt
>
>
> This is something I found while reviewing YARN-1318, but didn't halt that patch as many
cycles went there already. Some top level issues
>  - Not easy to follow RM's service life cycle
>     -- RM adds only AdminService as its service directly.
>     -- Other services are added to RM when AdminService's init calls RM.activeServices.init()
>  - Overall, AdminService shouldn't encompass all of RM's HA state management. It was
originally supposed to be the implementation of just the RPC server.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

Mime
View raw message