hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Li Lu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.
Date Sun, 29 Mar 2015 00:10:53 GMT

    [ https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385577#comment-14385577
] 

Li Lu commented on YARN-3334:
-----------------------------

Hi [~djp], thanks for working on this! I checked the patch and it generally LGTM. Here are
a few comments and questions:

{code}
+    threadPool.shutdown();
+    
+    while (!threadPool.isTerminated()) { // wait for all posting thread to finish
+      try {
+        if (!threadPool.awaitTermination(30, TimeUnit.SECONDS)) {
+          threadPool.shutdownNow(); // send interrupt to hurry them along
+        }
+      } catch (InterruptedException e) {
+        LOG.warn("Timeline client service stop interrupted!");
+        break;
+      }
+    }
{code}
If I understand this correctly, we're granting non-terminating threadPools 30 seconds to wait
them finish, and shutdown the thread pool afterwards. I'm not sure if we need the big while
loop here? This design would work but looks a little bit weird compare to the sample two-phase
shutdown discussed in Oracle's official document (http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorService.html).
Could you please double check if the sample solution there works for us? 

{code}
+          TimelineEntity entity = new TimelineEntity();
+          entity.setId(containerId.toString());
+          entity.setType(TimelineEntityType.YARN_CONTAINER.toString());
{code}
ContainerEntity is a HierarchicalTimelineEntity, which carries parent-children relationships
for timeline entities. I suspect we have some bugs with putting HierarchicalTimelineEntity,
but it's only my hunch. If you have time, it would be great if you could look into it a little
bit (and fix them separately), but if you have some bandwidth limitations, I can take care
of them when testing the storage layer. Asking for [~zjshen]'s confirmation here. 

{code}
+            // TODO move timeline metrics handling to a separate method
+            TimelineMetric cpuMetric = new TimelineMetric();
+            cpuMetric.setId(ContainerMetric.CPU.toString() + pId);
+            cpuMetric.addTimeSeriesData(currentTime, 
+                cpuUsageTotalCoresPercentage);
{code}
Are we just adding one time series point to each new TimelineMetric, then add them to the
TimelineEntity? This sounds not quite economical because we're creating redundant TimelineMetrics
for the data points in the same TimelineSeries. Are there any fundamental challenges to prevent
us creating a full timeline series here? A similar question applies to memory metrics, too.


Thanks! 

> [Event Producers] NM TimelineClient life cycle handling and container metrics posting
to new timeline service.
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3334
>                 URL: https://issues.apache.org/jira/browse/YARN-3334
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: YARN-2928
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, YARN-3334-v2.patch, YARN-3334-v3.patch
>
>
> After YARN-3039, we have service discovery mechanism to pass app-collector service address
among collectors, NMs and RM. In this JIRA, we will handle service address setting for TimelineClients
in NodeManager, and put container metrics to the backend storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message