hadoop-yarn-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.
Date Sun, 29 Mar 2015 23:53:53 GMT

    [ https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386025#comment-14386025 ]

Zhijie Shen commented on YARN-3334:

bq. Do we expect some info extra is necessary for ContainerEntity to set? If not, I suspect
some bug (NPE, etc.) could be hidden in putEntity for ContainerEntity.

What's the error message? TestTimelineServiceRecords should have covered ContainerEntity.
If it's some trivial bug, would you mind taking care of it?

1. For debugging purposes, can we have an info record regardless of whether publishing timeline data
is enabled or not?
211	    if (!publishContainerMetricsToTimelineService) {
212	      LOG.warn("NodeManager has not been configured to publish container " +
213	          "metrics to Timeline Service V2");
214	    }

2. Can we group all the other code related to composing timeline data and put it into this
if block too? Otherwise, if the flag is false, that code doesn't need to be executed. Also,
we should catch exceptions from timeline operations, as they shouldn't fail the monitoring.
608	          if (publishContainerMetricsToTimelineService) {
609	            putEntityWithoutBlocking(timelineClient, entity);
610	          }
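To illustrate the suggestion, here's a minimal sketch of the proposed control flow; all class and method names are hypothetical stand-ins, not the actual patch code:

```java
// Hypothetical sketch: compose the timeline entity only when publishing is
// enabled, and never let a timeline failure break container monitoring.
public class ContainerMetricsPublisher {
    private final boolean publishContainerMetricsToTimelineService;

    public ContainerMetricsPublisher(boolean enabled) {
        this.publishContainerMetricsToTimelineService = enabled;
    }

    /** Returns true if an entity was composed and handed off for publishing. */
    public boolean maybePublish() {
        // Skip composing the entity entirely when publishing is disabled,
        // so no unnecessary work happens on the monitoring path.
        if (!publishContainerMetricsToTimelineService) {
            return false;
        }
        try {
            Object entity = composeEntity();     // compose only when needed
            putEntityWithoutBlocking(entity);    // non-blocking publish
            return true;
        } catch (Exception e) {
            // Timeline errors are logged, not propagated: they must not
            // fail the container monitoring thread itself.
            System.err.println("Failed to publish container metrics: " + e);
            return false;
        }
    }

    private Object composeEntity() { return new Object(); }

    private void putEntityWithoutBlocking(Object entity) { /* no-op in sketch */ }
}
```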

3. In addition to removing it from the map, we need to stop the client.
437	      app.context.getTimelineClients().remove(app.getAppId());
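Roughly what this means, as a sketch with hypothetical stand-in classes (not the actual NM context API): removing the entry alone leaks the client's threads and connections, so the removed client has to be stopped as well.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a per-app timeline client map on the NM.
public class TimelineClientRegistry {
    // Minimal stand-in for the real TimelineClient service.
    static class TimelineClient {
        private boolean stopped = false;
        void stop() { stopped = true; }      // release threads/connections
        boolean isStopped() { return stopped; }
    }

    private final Map<String, TimelineClient> clients = new ConcurrentHashMap<>();

    public void register(String appId) {
        clients.put(appId, new TimelineClient());
    }

    /** On application finish: remove the client from the map AND stop it. */
    public TimelineClient removeAndStop(String appId) {
        TimelineClient client = clients.remove(appId);
        if (client != null) {
            client.stop();   // the step the review points out is missing
        }
        return client;
    }
}
```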

4. IMHO, there's a bug here. ResourceTrackerService will send the service addresses of all the
active apps. Therefore, according to the getTimelineClient logic, it will then create a client
for each app, though most of them won't be used on this NM. I think we should decouple timeline
client creation from getting. Creation should happen in the life cycle of ApplicationImpl,
and the client will be put into context.clients. Here we just update the service address if
the application's client exists in context.clients.
699	          TimelineClient client = context.getTimelineClient(appId);
700	          client.setTimelineServiceAddress(collectorAddr);
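The decoupling suggested above could look like the following sketch (all names are illustrative, not the actual NM context classes): creation happens only in the application's life cycle, and the heartbeat path only updates addresses of clients that already exist.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: separate client creation (app life cycle) from
// collector-address updates (RM heartbeat path).
public class TimelineClientContext {
    static class TimelineClient {
        private String serviceAddress;
        void setTimelineServiceAddress(String addr) { this.serviceAddress = addr; }
        String getTimelineServiceAddress() { return serviceAddress; }
    }

    private final Map<String, TimelineClient> clients = new ConcurrentHashMap<>();

    /** Called from ApplicationImpl's init transition: creation happens here. */
    public void onApplicationInit(String appId) {
        clients.put(appId, new TimelineClient());
    }

    /** Called from the heartbeat path: update only, never create. */
    public boolean updateCollectorAddress(String appId, String collectorAddr) {
        TimelineClient client = clients.get(appId);
        if (client == null) {
            return false;   // app not running on this NM; nothing to do
        }
        client.setTimelineServiceAddress(collectorAddr);
        return true;
    }

    TimelineClient get(String appId) { return clients.get(appId); }
}
```

This way, a heartbeat that mentions an app not hosted on this NM becomes a no-op instead of allocating an unused client.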

bq. Are there any fundamental challenges to prevent us creating a full timeline series here?
A similar question applies to memory metrics, too.

One time series point per request:
* Pros: Real-time; simple implementation on the NM side.
* Cons: Bigger overhead.

Multiple time series points per request:
* Pros: Smaller overhead.
* Cons: Longer latency, and cannot support the real-time use case (assuming a big number of
points); more logic on the NM side to buffer time series data.
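For concreteness, the batching option's extra NM-side logic amounts to something like this sketch (names and batch policy are hypothetical, just to illustrate the trade-off):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "multiple points per request": buffer points and
// flush them as one batch, trading latency for fewer requests.
public class MetricBuffer {
    private final int batchSize;
    private final List<long[]> pending = new ArrayList<>(); // {timestamp, value}
    private int flushCount = 0;

    public MetricBuffer(int batchSize) { this.batchSize = batchSize; }

    /** Record one point; flush a whole batch once the buffer fills up. */
    public void add(long timestamp, long value) {
        pending.add(new long[] { timestamp, value });
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    private void flush() {
        // One request carries batchSize points: smaller overhead per point,
        // but the oldest point waits up to batchSize intervals before leaving.
        flushCount++;
        pending.clear();
    }

    public int getFlushCount() { return flushCount; }
    public int getPendingSize() { return pending.size(); }
}
```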

Personally, I'm inclined to the simple approach at this moment.

> [Event Producers] NM TimelineClient life cycle handling and container metrics posting
to new timeline service.
> --------------------------------------------------------------------------------------------------------------
>                 Key: YARN-3334
>                 URL: https://issues.apache.org/jira/browse/YARN-3334
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: YARN-2928
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, YARN-3334-v2.patch, YARN-3334-v3.patch
> After YARN-3039, we have service discovery mechanism to pass app-collector service address
among collectors, NMs and RM. In this JIRA, we will handle service address setting for TimelineClients
in NodeManager, and put container metrics to the backend storage.

This message was sent by Atlassian JIRA
