hadoop-yarn-issues mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
Date Tue, 23 Jun 2015 17:53:01 GMT

    [ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598054#comment-14598054 ]

Sangjin Lee commented on YARN-3045:

The lifecycle management of the app collector is a little tricky here: it gets registered when
the first container (AM) is launched, but it should not be unregistered immediately when the AM
container stops. Waiting until the application finish event reaches the NM should work for most cases.
In the corner case where the NM publisher takes too long to publish an event (its queue is busy), it
can still fail (a very low chance, which should be acceptable here). Later, we will run into
a similar issue when we do app-level aggregation in the app collector, since the aggregation
process could still be running. In any case, we should pay special attention to lifecycle
management for the collector; we have a separate JIRA to move it out of the auxiliary service. I
think we can discuss this further together with/in that JIRA.

It's a good point. I think some amount of "linger" after the AM container is completed should
be a fine solution. Note that not only the collector needs to be up but also the mapping should
not be removed from the RM for this to work.

As [~djp] pointed out, having multiple app attempts (AMs) is another case. Perhaps the same
linger can apply in that case so that the collector can stick around to handle some writes
until the next collector that belongs to the next AM comes online and registers itself. We
need to hash out the details of the multiple-AM scenario, preferably in a different JIRA.
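
To make the "linger" idea concrete, here is a minimal sketch, not the actual YARN collector code. The class LingeringCollectorRegistry, its method names, and the single in-memory map standing in for both the collector and the RM-side mapping are all assumptions made up for illustration. Stopping the AM container schedules a delayed removal instead of removing the entry right away, and a new registration (e.g. from the next app attempt) invalidates any removal that is still pending.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; names and structure are not from YARN. */
final class LingeringCollectorRegistry {
  // appId -> collector address (stands in for the collector and its mapping)
  private final Map<String, String> collectors = new HashMap<>();
  // appId -> generation number of the removal currently scheduled, if any
  private final Map<String, Long> pendingRemoval = new HashMap<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "collector-linger");
        t.setDaemon(true); // do not keep the JVM alive for pending removals
        return t;
      });
  private final long lingerMillis;
  private long nextGeneration = 0;

  LingeringCollectorRegistry(long lingerMillis) {
    this.lingerMillis = lingerMillis;
  }

  /** AM container launched: (re)register and drop any pending removal. */
  synchronized void register(String appId, String collectorAddress) {
    pendingRemoval.remove(appId);
    collectors.put(appId, collectorAddress);
  }

  /** AM container stopped: keep the entry, remove it only after the linger. */
  synchronized void amContainerStopped(String appId) {
    final long generation = ++nextGeneration;
    pendingRemoval.put(appId, generation);
    scheduler.schedule(() -> expire(appId, generation),
        lingerMillis, TimeUnit.MILLISECONDS);
  }

  /** Returns the collector address, or null if none is registered. */
  synchronized String lookup(String appId) {
    return collectors.get(appId);
  }

  // Removes the entry only if this removal is still the latest one scheduled;
  // a re-registration or a later stop turns stale tasks into no-ops.
  private synchronized void expire(String appId, long generation) {
    Long current = pendingRemoval.get(appId);
    if (current != null && current == generation) {
      pendingRemoval.remove(appId);
      collectors.remove(appId);
    }
  }
}
```

With a linger of, say, 200 ms, a lookup right after amContainerStopped("app_1") still returns the old address, a register from the next attempt within that window keeps the entry alive, and the entry disappears only if nothing re-registers before the linger elapses. How long the linger should be, and how it interacts with app-level aggregation, is left open here just as in the discussion above.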

> [Event producers] Implement NM writing container lifecycle events to ATS
> ------------------------------------------------------------------------
>                 Key: YARN-3045
>                 URL: https://issues.apache.org/jira/browse/YARN-3045
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Naganarasimha G R
>         Attachments: YARN-3045-YARN-2928.002.patch, YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch,
> Per design in YARN-2928, implement NM writing container lifecycle events and container
> system metrics to ATS.

This message was sent by Atlassian JIRA
