hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
Date Tue, 05 Nov 2013 05:56:19 GMT

    [ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813672#comment-13813672
] 

Bikas Saha commented on YARN-674:
---------------------------------

We were intentionally going through the same submitApplication() method to make sure that
all the initialization and setup code paths are consistently followed in both cases by keeping
the code path identical as much as possible. The RM would submit a recovered application,
in essence proxying a user submitting the application. Its a general pattern followed through
the recovery logic - to be minimally invasive to the mainline code path so that we can avoid
functional bugs as much as possible. Separating them into 2 methods has resulted in code duplication
in both methods without any huge benefit that I can see. It also leave us susceptible to future
code changes made in one code path and not the other.

Why is isSecurityEnabled() being checked at this internal level. The code should not even
reach this point if security is not enabled. It should already be taken care of in the public
apis, right? 
Also why is it calling rmContext.getDelegationTokenRenewer().addApplication(event) instead
of DelegationTokenRenewer.this.addApplication(). Same for rmContext.getDelegationTokenRenewer().applicationFinished(evt);
{code}
@SuppressWarnings("unchecked")
+    private void handleDTRenewerEvent(
+        DelegationTokenRenewerAppSubmitEvent event) {
+      try {
+        // Setup tokens for renewal
+        if (UserGroupInformation.isSecurityEnabled()) {
+          rmContext.getDelegationTokenRenewer().addApplication(event);
+          rmContext.getDispatcher().getEventHandler()
+              .handle(new RMAppEvent(event.getAppicationId(),
+              event.isApplicationRecovered() ? RMAppEventType.RECOVER
+              : RMAppEventType.START));
+        }
+      } catch (Throwable t) {
{code}

These assumptions may make the code brittle to future changes. Also Typo in comments. We should
probably assert that the application state is NEW over here so that the broken assumption
is caught at the source instead of at the destination app causing a state machine crash.................
{code}
+            "Unable to add the application to the delegation token renewer.",
+            t);
+        // Sending APP_REJECTED is fine, since we assume that the
+        // RMApp is in NEW state and thus we havne't yet informed the
+        // Scheduler about the existence of the application
+        rmContext.getDispatcher().getEventHandler().handle(
+            new RMAppRejectedEvent(event.getAppicationId(), t.getMessage()));
+      }
{code}

typo
{code}
 public ApplicationId getAppicationId() {
{code}

@Private + @VisibleForTesting???
{code}
+  //Only for Testing
+  public int getInProcessDelegationTokenRenewerEventsCount() {
+    return this.renewerCount.get();
+  }
{code}

Can DelegationTokenRenewerAppSubmitEvent event objects have an event type different from VERIFY_AND_START_APPLICATION?
If not, we dont need this check and we can change the constructor of DelegationTokenRenewerAppSubmitEvent
to not expect an event type argument. It should set the VERIFY_AND_START_APPLICATION within
the constructor.
{code}
+      if (evt.getType().equals(
+          DelegationTokenRenewerEventType.VERIFY_AND_START_APPLICATION)
+          && evt instanceof DelegationTokenRenewerAppSubmitEvent) {
{code}

Rename DelegationTokenRenewerThread to not have misleading Thread in the name ?

Why is this warning not happening for other services? Whats special in the code for DelegationTokenRenewer?
{code}
+  <!-- Ignore Synchronization issues as they are never going to occur for
+       methods like serviceInit(), serviceStart() and handle() -->
+  <Match>
+    <Class name="org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer"
/>
+    <Bug pattern="IS2_INCONSISTENT_SYNC" />
+  </Match>
{code}

> Slow or failing DelegationToken renewals on submission itself make RM unavailable
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-674
>                 URL: https://issues.apache.org/jira/browse/YARN-674
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Omkar Vinit Joshi
>         Attachments: YARN-674.1.patch, YARN-674.2.patch, YARN-674.3.patch, YARN-674.4.patch,
YARN-674.5.patch
>
>
> This was caused by YARN-280. A slow or a down NameNode for will make it look like RM
is unavailable as it may run out of RPC handlers due to blocked client submissions.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Mime
View raw message