hadoop-yarn-issues mailing list archives

From "Lin Yiqun (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container
Date Tue, 08 Dec 2015 14:18:11 GMT

     [ https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lin Yiqun updated YARN-4381:
----------------------------
    Attachment: YARN-4381.002.patch

Thanks [~djp] for the review. I have made the container metrics more fine-grained.
As you said, a container can fail for reasons other than localizationFailed, and
launchEvent is not a suitable place to add the metric. So I now increment the metric
{{containerLaunchedSuccess}} when the container transitions to the running state and
{{wasLaunched=true}} is set. Besides this, I added two more metrics for the
container-failed cases:
* one for containerFailedBeforeLaunched
* the other for containerKilledAfterLaunched
I think these metrics will give us a more concrete picture of what happened to a container.
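
As a rough illustration of how the three counters described above relate to the {{wasLaunched}} flag, here is a minimal sketch. It is not the code in YARN-4381.002.patch: the classes {{ContainerOutcomeMetrics}} and {{TrackedContainer}} and their methods are hypothetical stand-ins, and the real NodeManager state machine is considerably more involved.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the NodeManager metrics source (assumption, not Hadoop API).
class ContainerOutcomeMetrics {
  private final AtomicLong launchedSuccess = new AtomicLong();
  private final AtomicLong failedBeforeLaunched = new AtomicLong();
  private final AtomicLong killedAfterLaunched = new AtomicLong();

  void containerLaunchedSuccess() { launchedSuccess.incrementAndGet(); }
  void containerFailedBeforeLaunched() { failedBeforeLaunched.incrementAndGet(); }
  void containerKilledAfterLaunched() { killedAfterLaunched.incrementAndGet(); }

  long getLaunchedSuccess() { return launchedSuccess.get(); }
  long getFailedBeforeLaunched() { return failedBeforeLaunched.get(); }
  long getKilledAfterLaunched() { return killedAfterLaunched.get(); }
}

// Hypothetical, heavily simplified container that only tracks whether it reached RUNNING.
class TrackedContainer {
  private final ContainerOutcomeMetrics metrics;
  private boolean wasLaunched = false;

  TrackedContainer(ContainerOutcomeMetrics metrics) {
    this.metrics = metrics;
  }

  // Called when the container reaches the running state.
  void onRunning() {
    wasLaunched = true;
    metrics.containerLaunchedSuccess();
  }

  // Called when the container fails (for example, localization failed).
  void onFailed() {
    if (!wasLaunched) {
      metrics.containerFailedBeforeLaunched();
    }
  }

  // Called when the container is killed.
  void onKilled() {
    if (wasLaunched) {
      metrics.containerKilledAfterLaunched();
    }
  }
}
```

The point of the sketch is only that the success counter is tied to the transition into the running state, while the two failure counters are separated by whether that transition had already happened.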

> Add container launchEvent and container localizeFailed metrics in container
> ---------------------------------------------------------------------------
>
>                 Key: YARN-4381
>                 URL: https://issues.apache.org/jira/browse/YARN-4381
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.7.1
>            Reporter: Lin Yiqun
>            Assignee: Lin Yiqun
>         Attachments: YARN-4381.001.patch, YARN-4381.002.patch
>
>
> Recently, I found an issue with the nodemanager metrics: {{NodeManagerMetrics#containersLaunched}}
> does not actually reflect the number of containers that launched successfully. A container
> can still fail after this point, for example when it receives a kill command or when
> container localization fails. However, the counter is currently incremented in the code
> below regardless of whether the container goes on to start successfully or fail.
> {code}
> Credentials credentials = parseCredentials(launchContext);
>     Container container =
>         new ContainerImpl(getConfig(), this.dispatcher,
>             context.getNMStateStore(), launchContext,
>           credentials, metrics, containerTokenIdentifier);
>     ApplicationId applicationID =
>         containerId.getApplicationAttemptId().getApplicationId();
>     if (context.getContainers().putIfAbsent(containerId, container) != null) {
>       NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
>         "ContainerManagerImpl", "Container already running on this node!",
>         applicationID, containerId);
>       throw RPCUtil.getRemoteException("Container " + containerIdStr
>           + " already is running on this node!!");
>     }
>     this.readLock.lock();
>     try {
>       if (!serviceStopped) {
>         // Create the application
>         Application application =
>             new ApplicationImpl(dispatcher, user, applicationID, credentials, context);
>         if (null == context.getApplications().putIfAbsent(applicationID,
>           application)) {
>           LOG.info("Creating a new application reference for app " + applicationID);
>           LogAggregationContext logAggregationContext =
>               containerTokenIdentifier.getLogAggregationContext();
>           Map<ApplicationAccessType, String> appAcls =
>               container.getLaunchContext().getApplicationACLs();
>           context.getNMStateStore().storeApplication(applicationID,
>               buildAppProto(applicationID, user, credentials, appAcls,
>                 logAggregationContext));
>           dispatcher.getEventHandler().handle(
>             new ApplicationInitEvent(applicationID, appAcls,
>               logAggregationContext));
>         }
>         this.context.getNMStateStore().storeContainer(containerId, request);
>         dispatcher.getEventHandler().handle(
>           new ApplicationContainerInitEvent(container));
>         this.context.getContainerTokenSecretManager().startContainerSuccessful(
>           containerTokenIdentifier);
>         NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
>           "ContainerManageImpl", applicationID, containerId);
>         // TODO launchedContainer misplaced -> doesn't necessarily mean a container
>         // launch. A finished Application will not launch containers.
>         metrics.launchedContainer();
>         metrics.allocateContainer(containerTokenIdentifier.getResource());
>       } else {
>         throw new YarnException(
>             "Container start failed as the NodeManager is " +
>             "in the process of shutting down");
>       }
> {code}
> In addition, we lack a localizationFailed metric for containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
