hadoop-yarn-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-562) NM should reject containers allocated by previous RM
Date Thu, 25 Apr 2013 02:14:16 GMT

    [ https://issues.apache.org/jira/browse/YARN-562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641308#comment-13641308
] 

Bikas Saha commented on YARN-562:
---------------------------------

Shouldn't the new exception inherit from YarnException, the common base class?
I actually like NMNotConnectedWithRMException because NotYetReady could be due to various
other reasons. No strong opinion.
Is there an existing InvalidContainerException for cases when the ContainerToken is invalid? How
about InvalidContainerException as a name? If the only thing the client can do is get a new
container from the RM, then there may not be any point in differentiating the reasons. If we
really want to keep RM in the name, then maybe InvalidContainerFromUnknownRM. "Previous" may
not be correct.
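
To illustrate the naming point, here is a rough sketch of one exception type under a common base class, with the client reacting the same way whatever the reason. YarnExceptionStandIn, InvalidContainerExceptionSketch and ClientSketch are hypothetical stand-ins made up for this example, not the real Hadoop classes.

```java
// Hypothetical stand-in for the common base class (not the real YarnException).
class YarnExceptionStandIn extends Exception {
    YarnExceptionStandIn(String message) {
        super(message);
    }
}

// One exception type covering every "this container is not valid here" case.
class InvalidContainerExceptionSketch extends YarnExceptionStandIn {
    InvalidContainerExceptionSketch(String message) {
        super(message);
    }
}

class ClientSketch {
    // Pretend NM call: rejects the container when it is not valid.
    static void startContainer(boolean containerValid) throws YarnExceptionStandIn {
        if (!containerValid) {
            throw new InvalidContainerExceptionSketch("container not issued by the current RM");
        }
    }

    // The client's only recovery is to ask the RM for a new container,
    // so it does not need to distinguish the reasons.
    static String launch(boolean containerValid) {
        try {
            startContainer(containerValid);
            return "launched";
        } catch (YarnExceptionStandIn e) {
            return "request-new-container";
        }
    }

    public static void main(String[] args) {
        System.out.println(launch(true));
        System.out.println(launch(false));
    }
}
```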

I think the invalidation needs to be done before sending the event, because technically this
thread could be suspended immediately after sending the event. The handler thread could then
run before the invalidation happens.
{code}
               dispatcher.getEventHandler().handle(
                   new NodeManagerEvent(NodeManagerEventType.RESYNC));
+              // Invalidate the RMIdentifier while resync
+              setRMIdentifier(ResourceManagerConstants.RM_INVALID_IDENTIFIER);
               break;
{code}
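
A minimal sketch of the ordering I have in mind, not the actual patch. ResyncOrderingSketch, its fields, and the -1 sentinel are assumptions for illustration. The handler is invoked synchronously here to model the worst case, where it runs as soon as the event is sent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the status updater; names are illustrative only.
class ResyncOrderingSketch {
    // Assumed sentinel value for "no valid RM".
    static final long RM_INVALID_IDENTIFIER = -1L;

    private volatile long rmIdentifier = 42L;

    // Records the identifier the handler observed when the event arrived.
    final List<Long> observedByHandler = new ArrayList<>();

    // Stand-in for dispatcher.getEventHandler(): runs immediately.
    private final Consumer<String> eventHandler =
        event -> observedByHandler.add(rmIdentifier);

    void onResync() {
        // Invalidate first, so a handler that runs right away
        // can never see the stale identifier.
        rmIdentifier = RM_INVALID_IDENTIFIER;
        eventHandler.accept("RESYNC");
    }

    long getRmIdentifier() {
        return rmIdentifier;
    }

    public static void main(String[] args) {
        ResyncOrderingSketch sketch = new ResyncOrderingSketch();
        sketch.onResync();
        System.out.println("handler saw: " + sketch.observedByHandler);
    }
}
```

With the two statements in the order shown in the patch, the handler in this sketch would record the old identifier instead.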

It reads oddly that the container manager is notifying itself.
{code}
+
+    LOG.info("Notifying ContainerManager to block new container-requests as " +
+    		"NodeManager is still starting.");
+    this.setBlockNewContainerRequests(true);
{code}

It would be good to continue looping until notified that the ContainerManager is no longer blocked.
{code}
+            try {
               // HERE set FLAG to stop thread
+              launchContainersThread.join();
+              super.setBlockNewContainerRequests(blockNewContainerRequests);
....
+        try {
           // HERE check FLAG to stop thread
+          while (numContainers++ < 10) {
{code}
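
Roughly what I mean by the FLAG comments above, as a sketch under assumptions. StopFlagLoopSketch and its members are made-up names, and the loop body only counts attempts where the test would try to start a container.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical test helper; not taken from the patch.
class StopFlagLoopSketch {
    // FLAG: set by the main thread, checked by the launcher thread.
    private volatile boolean stopRequested = false;

    // Counts loop iterations (stand-in for container launch attempts).
    final AtomicInteger attempts = new AtomicInteger();

    private final Thread launchContainersThread = new Thread(() -> {
        // Keep looping until told to stop, rather than a fixed 10 iterations.
        while (!stopRequested) {
            attempts.incrementAndGet();
            Thread.yield();
        }
    });

    void start() {
        launchContainersThread.start();
    }

    // Set the flag first, then wait for the thread to exit.
    void stopAndJoin() {
        stopRequested = true;
        try {
            launchContainersThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    boolean isRunning() {
        return launchContainersThread.isAlive();
    }

    public static void main(String[] args) {
        StopFlagLoopSketch sketch = new StopFlagLoopSketch();
        sketch.start();
        while (sketch.attempts.get() == 0) {
            Thread.yield();
        }
        sketch.stopAndJoin();
        System.out.println("thread alive after join: " + sketch.isRunning());
    }
}
```

The flag is volatile so that the write from the main thread is visible to the launcher thread without extra locking.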
                
> NM should reject containers allocated by previous RM
> ----------------------------------------------------
>
>                 Key: YARN-562
>                 URL: https://issues.apache.org/jira/browse/YARN-562
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-562.10.patch, YARN-562.1.patch, YARN-562.2.patch, YARN-562.3.patch,
>                      YARN-562.4.patch, YARN-562.5.patch, YARN-562.6.patch, YARN-562.7.patch,
>                      YARN-562.8.patch, YARN-562.9.patch
>
>
> It's possible that after the RM shuts down, and before the AM goes down, the AM still calls
> startContainer on the NM with containers allocated by the previous RM. When the RM comes back,
> the NM doesn't know whether the container launch request comes from the previous RM or the
> current RM. We should reject containers allocated by the previous RM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
