hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2209) Replace AM resync/shutdown command with corresponding exceptions
Date Mon, 28 Jul 2014 18:08:44 GMT

    [ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076498#comment-14076498

Zhijie Shen commented on YARN-2209:

[~jianhe], thanks for the patch. Bellow is some meta comments on this issue.

Why is it necessary to use the exception instead of the flag to indicate the RM restarting?
In general, I'm afraid the changes here mutually break the backward compatibility between
YARN and MR. On the one side, any YARN applications used to have the logic to deal with RM
restarting need to be updated after this patch. For example, MR of prior versions will no
longer work properly with a YARN cluster after this patch during RM restarting. The MR job
won’t recognize the not found exception and take the necessary restarting treatment, but
will just record the error and move on.

On the other side, if we assume it is possible the new version MR job after this patch is
going to be run on an old YARN cluster, the MR job will then not recognize the old flag-style
restarting signal, and thus will not executing the MR-side logic to deal with RM restarting.
IMHO, at least, the switch block to check the AMCommand cannot be removed but deprecated for
compatibility consideration.

In case we want to proceed with this change, here're some comment on the patch:

1.  MR side change is not trivial. According to our convention before, shall we split the
patch into two pieces: one for YARN and the other for MR, such that we can easily track the
changes for different projects.

2. Why not throwing ApplicationAttemptNotFoundException instead? It sounds more reasonable
here, doesn’t it?

3. Deprecate the enum type instead of each enum value?
 public enum AMCommand {

4. The description sounds not accurate enough. It doesn’t just request containers. “App
Master heartbeat”?
+    public static final String AM_ALLOCATE = "App Master request containers”;

5. No need to break it into two lines, right?
     AllocateResponse allocateResponse;
+    allocateResponse = scheduler.allocate(allocateRequest);

6.  Is this change necessary?
-        return allocate(progressIndicator);
+        allocateResponse = allocate(progressIndicator);
+        return allocateResponse;

> Replace AM resync/shutdown command with corresponding exceptions
> ----------------------------------------------------------------
>                 Key: YARN-2209
>                 URL: https://issues.apache.org/jira/browse/YARN-2209
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch, YARN-2209.4.patch,
> YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate application
to re-register on RM restart. we should do the same for AMS#allocate call also.

This message was sent by Atlassian JIRA

View raw message