hadoop-yarn-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2244) FairScheduler missing handling of containers for unknown application attempts
Date Mon, 14 Jul 2014 22:56:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061387#comment-14061387 ]

Karthik Kambatla commented on YARN-2244:
----------------------------------------

# Can we use {{AbstractYarnScheduler#killOrphanContainerOnNode()}} instead? 
{code}
+      this.rmContext.getDispatcher().getEventHandler()
+          .handle(new RMNodeCleanContainerEvent(node.getNodeID(), containerId));
{code}
# Thanks for moving the following to a separate method. IMO, we should clean it up more:
{code}
  protected void waitForContainerCleanup(DrainDispatcher dispatcher, MockNM nm,
                                         NodeHeartbeatResponse resp) throws Exception {
    int waitCount;
    dispatcher.await();
    List<ContainerId> contsToClean = resp.getContainersToCleanup();
    int cleanedConts = contsToClean.size();
    waitCount = 0;
    while (cleanedConts < 1 && waitCount++ < 200) {
      LOG.info("Waiting to get cleanup events.. cleanedConts: " + cleanedConts);
      Thread.sleep(100);
      resp = nm.nodeHeartbeat(true);
      dispatcher.await();
      contsToClean = resp.getContainersToCleanup();
      cleanedConts += contsToClean.size();
    }
    if (contsToClean.isEmpty()) {
      LOG.error("Failed to get any containers to cleanup");
    } else {
      LOG.info("Got cleanup for " + contsToClean.get(0));
    }
    Assert.assertEquals(1, cleanedConts);
  }
{code}
## One line is over 80 characters.
## {{int waitCount = 0}} can go on one line.
## Fetching the containers to clean and the other arithmetic before the while loop can be moved into the while loop. {{cleanedConts}} can be initialized to zero. I am okay with a do-while too.
## Remove the logging. I am not sure why we are logging that information up to 200 times.
## Parameterize the method to also take the number of container cleanups to wait for, and use it everywhere.
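
For illustration only, the cleanup suggested in the sub-points above could look roughly like the sketch below. It is not the patch itself. To keep it self-contained, {{MockNM}} and {{NodeHeartbeatResponse}} are replaced by a hypothetical {{CleanupHeartbeat}} interface, container ids are plain strings, the {{dispatcher.await()}} calls are left out, and the {{expected}}, {{maxWaits}} and {{sleepMillis}} parameters are assumptions rather than names from the patch.

```java
import java.util.List;

/**
 * Hypothetical stand-in for nm.nodeHeartbeat(true) followed by
 * resp.getContainersToCleanup(); not a real YARN type.
 */
interface CleanupHeartbeat {
  List<String> containersToCleanup();
}

class CleanupWaitSketch {

  /**
   * Counts containers reported for cleanup, starting with the response the
   * caller already holds, and heartbeats again until at least 'expected'
   * cleanups were seen or 'maxWaits' extra heartbeats were sent.
   * Returns the total seen so the caller can assert on it.
   */
  static int waitForContainerCleanup(List<String> firstResponse,
      CleanupHeartbeat nm, int expected, int maxWaits, long sleepMillis) {
    int waitCount = 0;
    int cleanedConts = 0;
    List<String> contsToClean = firstResponse;
    while (true) {
      // All fetching and arithmetic lives inside the loop.
      cleanedConts += contsToClean.size();
      if (cleanedConts >= expected || waitCount++ >= maxWaits) {
        break;
      }
      sleepQuietly(sleepMillis);
      contsToClean = nm.containersToCleanup();
    }
    return cleanedConts;
  }

  /** Sleeps without leaking a checked exception into the sketch's API. */
  private static void sleepQuietly(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting", e);
    }
  }
}
```

A test would then compare the returned count against the number of cleanups it expects, for example {{Assert.assertEquals(1, cleaned)}}, instead of hard-coding 1 inside the helper. There is no per-iteration logging, so a slow run no longer prints the same message up to 200 times.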

> FairScheduler missing handling of containers for unknown application attempts 
> ------------------------------------------------------------------------------
>
>                 Key: YARN-2244
>                 URL: https://issues.apache.org/jira/browse/YARN-2244
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>            Reporter: Anubhav Dhoot
>            Assignee: Anubhav Dhoot
>            Priority: Critical
>         Attachments: YARN-2224.patch, YARN-2244.001.patch, YARN-2244.002.patch
>
>
> We are missing the changes from patch MAPREDUCE-3596 in FairScheduler. Among other fixes that were common across schedulers, some scheduler-specific fixes were added to handle containers for unknown application attempts. Without these, FairScheduler simply logs that an unknown container was found and lets it keep running.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
