hadoop-yarn-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-540) Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
Date Sat, 14 Sep 2013 04:56:54 GMT

    [ https://issues.apache.org/jira/browse/YARN-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767371#comment-13767371 ]

Bikas Saha commented on YARN-540:
---------------------------------

{code}
Note: This flag is only needed for RM recovery purpose. If RM recovery is
+ * enabled, user is expected to retry until this flag becomes true.
+ * Otherwise,user will risk restarting an already finished application after RM
+ * restarts.
{code}
How about the following?
The flag indicates whether the application has successfully unregistered and is safe to stop.
The application may stop once the flag is true. If the application stops before the flag
is true, the RM may retry the application.
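
To make the intended contract concrete, here is a minimal sketch of the caller side: keep calling unregister until the flag comes back true, and only then stop. UnregisterCall and UnregisterRetry are hypothetical stand-ins for illustration, not the real YARN client API, and the attempt limit and sleep interval are assumptions.

```java
// Hypothetical stand-in for the AM-to-RM unregister call; not the real YARN API.
interface UnregisterCall {
    // Returns true once the RM reports that the application is unregistered.
    boolean finishApplicationMaster();
}

final class UnregisterRetry {
    private UnregisterRetry() {}

    // Repeats the unregister call until the flag is true or attempts run out.
    // Returns true only if it is safe for the application to stop.
    static boolean unregisterUntilSafe(UnregisterCall call, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (call.finishApplicationMaster()) {
                return true; // flag is true: safe to stop
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis); // back off before asking again
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
        }
        return false; // flag never became true: stopping now risks a retry by the RM
    }
}
```

A caller that gets false back should not treat the application as safely finished.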

{code}
/**
+   * Get the flag which indicates that the application has successfully
+   * unregistered with RM and the application state has been removed from
+   * RMStateStore.
+   */
{code}
Let's not mention internal names like RMStateStore in the javadoc. We can simply say "unregistered
with the RM and the application can safely stop".

Can we add an RMApp method, createYarnApplicationState(), (and remove the ServerUtils
method) instead of exposing internal state via getPreviousStateAtRemoving()?
{code}
+  public static YarnApplicationState createApplicationState(RMApp rmApp) {
+    RMAppState rmAppState = rmApp.getState();
+    // If App is in REMOVING state, return its previous state.
+    if (rmAppState.equals(RMAppState.REMOVING)) {
+      rmAppState = rmApp.getPreviousStateAtRemoving();
{code}
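
Roughly what I have in mind, as a sketch only: the app keeps the pre-REMOVING state private and does the mapping itself, so nothing outside needs getPreviousStateAtRemoving(). The types below are simplified, hypothetical stand-ins (only a few enum values), and the mapping of non-finished states to RUNNING is an assumption made to keep the example small.

```java
// Simplified, hypothetical stand-ins for the real enums; only the values needed here.
enum SketchRMAppState { RUNNING, FINISHING, REMOVING, FINISHED }
enum SketchYarnAppState { RUNNING, FINISHED }

final class SketchRMApp {
    private SketchRMAppState state;
    // Internal detail: remembered when entering REMOVING, never exposed.
    private SketchRMAppState stateBeforeRemoving;

    SketchRMApp(SketchRMAppState initial) {
        this.state = initial;
    }

    void moveTo(SketchRMAppState next) {
        if (next == SketchRMAppState.REMOVING && state != SketchRMAppState.REMOVING) {
            stateBeforeRemoving = state;
        }
        state = next;
    }

    // The app translates its own internal state into the user-facing one.
    SketchYarnAppState createYarnApplicationState() {
        // While REMOVING, report the state the app was in before removal started.
        SketchRMAppState effective =
            (state == SketchRMAppState.REMOVING) ? stateBeforeRemoving : state;
        return effective == SketchRMAppState.FINISHED
            ? SketchYarnAppState.FINISHED
            : SketchYarnAppState.RUNNING; // assumption: everything else shows as RUNNING
    }
}
```

Callers then only ever see the user-facing state, and the REMOVING bookkeeping can change without touching them.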

Can we make this a common method instead of duplicating the code?
{code}
+      if (!app.isAppRemovalRequestSent) {
+        // application completely done and remove from state store.
+        app.rmContext.getStateStore().removeApplication(app);
+        app.isAppRemovalRequestSent = true;
+      }
{code}

This should be in FinalTransition.transition() because it's common to all kinds of terminal
transitions. All terminal transitions, including the AttemptFinished transition, call FinalTransition.transition().
Sorry for not noticing this earlier.
{code}
         app.finishTime = System.currentTimeMillis();
       }
+      if (!app.isAppRemovalRequestSent) {
+        // application completely done and remove from state store.
+        app.rmContext.getStateStore().removeApplication(app);
+        app.isAppRemovalRequestSent = true;
+      }
+
{code}
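
A sketch of how the two suggestions could fit together, assuming simplified stand-in classes: one helper owns the "send removal request once" guard, and the shared final transition calls it, so every terminal transition picks it up by delegating. All names here (SketchStateStore, SketchApp, SketchFinalTransition, SketchAttemptFinishedTransition, removeFromStateStoreOnce) are hypothetical and not the actual RMAppImpl code.

```java
// Hypothetical stand-in for the state store; counts removal requests.
final class SketchStateStore {
    int removeCalls;
    String lastRemoved;

    void removeApplication(String appId) {
        removeCalls++;
        lastRemoved = appId;
    }
}

final class SketchApp {
    final String id;
    final SketchStateStore store;
    long finishTime;
    private boolean removalRequestSent;

    SketchApp(String id, SketchStateStore store) {
        this.id = id;
        this.store = store;
    }

    // Single shared helper: asks the store to remove the app at most once.
    void removeFromStateStoreOnce() {
        if (!removalRequestSent) {
            store.removeApplication(id);
            removalRequestSent = true;
        }
    }
}

// Work common to every terminal transition lives here.
class SketchFinalTransition {
    void transition(SketchApp app) {
        if (app.finishTime == 0) {
            app.finishTime = System.currentTimeMillis();
        }
        app.removeFromStateStoreOnce();
    }
}

// One kind of terminal transition; it delegates the common work.
final class SketchAttemptFinishedTransition extends SketchFinalTransition {
    @Override
    void transition(SketchApp app) {
        // Attempt-specific handling would go here.
        super.transition(app);
    }
}
```

With this shape the guard exists in one place, and running more than one terminal transition on the same app still sends a single removal request.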

Isn't testAppRemovingFinishing() already covered by testCreateAppFinishing()?


                
> Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-540
>                 URL: https://issues.apache.org/jira/browse/YARN-540
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-540.10.patch, YARN-540.10.patch, YARN-540.1.patch, YARN-540.2.patch,
> YARN-540.3.patch, YARN-540.4.patch, YARN-540.5.patch, YARN-540.6.patch, YARN-540.7.patch,
> YARN-540.7.patch, YARN-540.8.patch, YARN-540.9.patch, YARN-540.9.patch, YARN-540.patch, YARN-540.patch
>
>
> When a job succeeds and successfully calls finishApplicationMaster, and the RM then shuts down
> and restarts, the dispatcher is stopped before it can process the REMOVE_APP event. The next time
> the RM comes back, it reloads the existing state files even though the job has succeeded.

