hadoop-yarn-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
Date Wed, 03 Jun 2015 22:09:38 GMT

    [ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571732#comment-14571732 ]

Zhijie Shen commented on YARN-3044:
-----------------------------------

[~Naganarasimha], thanks for updating the patch. It looks good to me so far, but I want to
hold the patch for the following issues.

1. After YARN-3276 is committed, this patch will conflict on {{return l2.compareTo(l1);}}.

2. We're reworking YARN-1462. It won't affect this patch, but there's a commit revert involved.
Let's wait until YARN-1462 is done.

3. It's not caused by this patch, but I found a race condition when publishing the app finish event:
{code}
15/06/03 14:59:56 INFO rmapp.RMAppImpl: application_1433367826630_0002 State change from FINISHING
to FINISHED
15/06/03 14:59:56 INFO capacity.LeafQueue: completedContainer container=Container: [ContainerId:
container_1433367826630_0002_01_000001, NodeId: localhost:9105, NodeHttpAddress: localhost:8042,
Resource: <memory:2048, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken,
service: 127.0.0.1:9105 }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0,
vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=0 cluster=<memory:8192,
vCores:8>
15/06/03 14:59:56 INFO resourcemanager.RMAuditLogger: USER=zshen	OPERATION=Application Finished
- Succeeded	TARGET=RMAppManager	RESULT=SUCCESS	APPID=application_1433367826630_0002
15/06/03 14:59:56 INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=0.0
absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:8192, vCores:8>
15/06/03 14:59:56 ERROR metrics.TimelineServiceV2Publisher: Error when publishing entity TimelineEntity[type='YARN_APPLICATION',
id='application_1433367826630_0002']
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:273)
	at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.publishApplicationFinishedEvent(TimelineServiceV2Publisher.java:133)
	at org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(AbstractTimelineServicePublisher.java:70)
	at org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(AbstractTimelineServicePublisher.java:35)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:176)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
	at java.lang.Thread.run(Thread.java:745)
15/06/03 14:59:56 INFO amlauncher.AMLauncher: Cleaning master appattempt_1433367826630_0002_000001
{code}

I think the problem is that we stop the timeline collector immediately after calling appFinished.
appFinished is an async call, so the publishing operation runs later on another thread, possibly
after the collector is already gone. One option is to call stopTimelineCollector in the publisher,
after the finish event has been published. Can you take care of it?
{code}
      app.rmContext.getSystemMetricsPublisher()
          .appFinished(app, finalState, app.finishTime);

      app.stopTimelineCollector();
{code}
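
To make the ordering problem concrete, here is a minimal, self-contained sketch. It is not the real RM code: FinishEventOrderingSketch, CollectorRegistry, and both method names are hypothetical stand-ins, and a single-thread executor plays the role of the async dispatcher. The first method forces the bad interleaving (collector stopped before the dispatcher thread publishes); the second moves the stop onto the dispatcher thread, after the publish, which is roughly the option suggested above.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-ins only; none of these are the actual YARN classes.
final class FinishEventOrderingSketch {

  /** Toy registry: an event can only be published while the app's collector is live. */
  static final class CollectorRegistry {
    private final Set<String> live = ConcurrentHashMap.newKeySet();

    void start(String appId) { live.add(appId); }

    void stop(String appId) { live.remove(appId); }

    /** Returns false if the collector is already gone (the failure case in the log). */
    boolean put(String appId, String event) { return live.contains(appId); }
  }

  /** Racy ordering: the caller stops the collector right after the async call. */
  static boolean stopRightAfterAsyncCall(String appId) {
    CollectorRegistry registry = new CollectorRegistry();
    registry.start(appId);
    CountDownLatch callerStopped = new CountDownLatch(1);
    ExecutorService dispatcher = Executors.newSingleThreadExecutor();
    Future<Boolean> published = dispatcher.submit(() -> {
      // The latch forces the unlucky interleaving so the sketch is deterministic.
      callerStopped.await();
      return registry.put(appId, "FINISHED");
    });
    registry.stop(appId); // runs before the dispatcher thread publishes
    callerStopped.countDown();
    return await(published, dispatcher);
  }

  /** Suggested ordering: publish first, then stop, both on the dispatcher thread. */
  static boolean finishThenStop(String appId) {
    CollectorRegistry registry = new CollectorRegistry();
    registry.start(appId);
    ExecutorService dispatcher = Executors.newSingleThreadExecutor();
    Future<Boolean> published = dispatcher.submit(() -> {
      boolean ok = registry.put(appId, "FINISHED");
      registry.stop(appId); // only after the finish event has been written
      return ok;
    });
    return await(published, dispatcher);
  }

  /** Waits for the publish result and shuts the toy dispatcher down. */
  private static boolean await(Future<Boolean> result, ExecutorService dispatcher) {
    try {
      return result.get(5, TimeUnit.SECONDS);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      dispatcher.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("stop right after async call, published: "
        + stopRightAfterAsyncCall("application_0002"));
    System.out.println("stop after publish, published: "
        + finishThenStop("application_0002"));
  }
}
```

In the real code the stop would presumably move next to the finish-event handling in the publisher rather than staying in the caller; where exactly it belongs is left to the patch.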

> [Event producers] Implement RM writing app lifecycle events to ATS
> ------------------------------------------------------------------
>
>                 Key: YARN-3044
>                 URL: https://issues.apache.org/jira/browse/YARN-3044
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Naganarasimha G R
>         Attachments: YARN-3044-YARN-2928.004.patch, YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch,
YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch, YARN-3044-YARN-2928.009.patch,
YARN-3044-YARN-2928.010.patch, YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, YARN-3044.20150416-1.patch
>
>
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
