Date: Sun, 29 Mar 2015 23:53:53 +0000 (UTC)
From: "Zhijie Shen (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.

    [ https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386025#comment-14386025 ]

Zhijie Shen commented on YARN-3334:
-----------------------------------

bq. Do we expect some extra info is necessary to set for ContainerEntity? If not, I suspect some bug (NPE, etc.) could be hidden in putEntity for ContainerEntity.

What's the error message? TestTimelineServiceRecords should have covered ContainerEntity. If it's some trivial bug, would you mind taking care of it?

1. For debugging purposes, can we log an info record whether publishing timeline data is enabled or not?

{code}
if (!publishContainerMetricsToTimelineService) {
  LOG.warn("NodeManager has not been configured to publish container " +
      "metrics to Timeline Service V2");
}
{code}

2. Can we group all the other code related to composing timeline data and move it into this if block too? Otherwise, when the flag is false here, that code is executed unnecessarily. We should also catch exceptions from the timeline operations, as they shouldn't fail the monitoring service.

{code}
if (publishContainerMetricsToTimelineService) {
  putEntityWithoutBlocking(timelineClient, entity);
}
{code}

3. In addition to removing it from the map, we need to stop the client.

{code}
app.context.getTimelineClients().remove(app.getAppId());
{code}

4. IMHO, there's a bug here. ResourceTrackerService will send the service addresses of all the active apps. Therefore, according to the getTimelineClient logic, a client will then be created for each app, though most of them won't be used on this NM. I think we should decouple timeline client creation from retrieval: creation should happen in the life cycle of ApplicationImpl, and the client will be put into context.clients. Here we should just update the service address if the application's client exists in context.clients.

{code}
TimelineClient client = context.getTimelineClient(appId);
client.setTimelineServiceAddress(collectorAddr);
{code}
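Just to illustrate points 2 and 4, here are two rough sketches (not the actual patch: the try/catch placement and the null check are mine, and I'm assuming context.getTimelineClients() returns a map keyed by application id, as the snippet in point 3 suggests).

{code}
// Sketch for point 2: compose and publish timeline data only when
// publishing is enabled, and never let a timeline failure break the
// container monitoring service itself.
if (publishContainerMetricsToTimelineService) {
  try {
    // ... compose the entity here, so the work is skipped entirely
    // when publishing is disabled ...
    putEntityWithoutBlocking(timelineClient, entity);
  } catch (Exception e) {
    LOG.error("Failed to publish container metrics to Timeline Service V2",
        e);
  }
}
{code}

{code}
// Sketch for point 4: only refresh the collector address of a client
// that already exists; creating clients is left to ApplicationImpl's
// life cycle.
TimelineClient client = context.getTimelineClients().get(appId);
if (client != null) {
  client.setTimelineServiceAddress(collectorAddr);
}
// If client is null, the application has no containers on this NM, so
// no client should be created here.
{code}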
bq. Are there any fundamental challenges to prevent us creating a full timeline series here?

A similar question applies to the memory metrics, too.

One time series point per request:
* Pros: Real-time; simple implementation on the NM side.
* Cons: Bigger overhead.

Multiple time series points per request:
* Pros: Smaller overhead.
* Cons: Longer latency, and cannot support the real-time use case (assuming a big number of points); more logic on the NM side to buffer the time series data.

Personally, I'm inclined to the simple approach at this moment. (For what the buffered alternative could look like, see the sketch at the end of this message.)

> [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3334
>                 URL: https://issues.apache.org/jira/browse/YARN-3334
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: YARN-2928
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, YARN-3334-v2.patch, YARN-3334-v3.patch
>
>
> After YARN-3039, we have a service discovery mechanism to pass the app-collector service address among collectors, NMs, and the RM. In this JIRA, we will handle the service address setting for TimelineClients in the NodeManager, and put container metrics into the backend storage.
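As promised above, here is a rough sketch of what NM-side buffering for the multiple-points-per-request option could look like. Everything here is hypothetical and just for discussion: MetricsBuffer, flushThreshold, and putEntitiesWithoutBlocking are made-up names, not from any patch or existing API.

{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: buffer entities on the NM side and send them in
// batches, trading real-time visibility for smaller request overhead.
class MetricsBuffer {
  private final TimelineClient timelineClient;
  private final int flushThreshold;
  private final List<TimelineEntity> pending =
      new ArrayList<TimelineEntity>();

  MetricsBuffer(TimelineClient timelineClient, int flushThreshold) {
    this.timelineClient = timelineClient;
    this.flushThreshold = flushThreshold;
  }

  synchronized void add(TimelineEntity entity) {
    pending.add(entity);
    // Nothing goes out until the threshold is reached, so readers can
    // see data that is up to flushThreshold points stale.
    if (pending.size() >= flushThreshold) {
      // A made-up batch variant of the putEntityWithoutBlocking helper
      // quoted earlier in this comment.
      putEntitiesWithoutBlocking(timelineClient,
          pending.toArray(new TimelineEntity[pending.size()]));
      pending.clear();
    }
  }
}
{code}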