hadoop-yarn-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
Date Wed, 08 Apr 2015 16:56:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485527#comment-14485527 ]

Zhijie Shen commented on YARN-3044:

Before reviewing the patch in detail, I have some high-level comments:

bq. IIUC, you meant we will have RMContainerEntity with type YARN_RM_CONTAINER and NMContainerEntity
with type YARN_NM_CONTAINER, right?

Can we use a single ContainerEntity instead? Events from the RM would be named RM_XXXX_EVENT,
and those from the NM would be named NM_XXXX_EVENT.
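To make the single-entity suggestion concrete, here is a minimal sketch under stated assumptions. All class, record, and method names are hypothetical, and this is not the Hadoop timeline service API. The entity type string YARN_CONTAINER is assumed. Only the idea of one ContainerEntity whose events carry RM_/NM_ prefixes comes from the suggestion above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; NOT the Hadoop timeline service API.
public class ContainerEntitySketch {
    // One entity type shared by both RM and NM publishers (type name is an assumption).
    static final String CONTAINER_ENTITY_TYPE = "YARN_CONTAINER";

    record Event(String id, long timestamp) {}

    static final class ContainerEntity {
        final String containerId;
        final String type = CONTAINER_ENTITY_TYPE;
        final List<Event> events = new ArrayList<>();

        ContainerEntity(String containerId) {
            this.containerId = containerId;
        }

        // The event source is encoded in the event ID, not in a separate entity type.
        void addRmEvent(String name, long ts) {
            events.add(new Event("RM_" + name + "_EVENT", ts));
        }

        void addNmEvent(String name, long ts) {
            events.add(new Event("NM_" + name + "_EVENT", ts));
        }
    }

    public static void main(String[] args) {
        ContainerEntity c = new ContainerEntity("container_1428500000000_0001_01_000002");
        c.addRmEvent("CONTAINER_CREATED", 1000L);
        c.addNmEvent("CONTAINER_STARTED", 1500L);
        for (Event e : c.events) {
            System.out.println(c.type + " " + e.id());
        }
    }
}
```

With one entity per container, a reader can fetch the whole RM plus NM history of a container with a single lookup. The prefix shows which daemon produced each event.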

bq. I'm very much concerned about the volume of writes that the RM collector would need to
bq. I fully understand the concern from Sangjin Lee that the RM may not be able to afford tens
of thousands of containers in a large cluster.

I also think publishing all container lifecycle events from NM is likely to be a big cost
in total, but I'd like to offer a different point of view. Say we have a big cluster that
can run 5,000 concurrent containers. The RM has to maintain the lifecycle of these 5K
containers, and I don't think a less powerful server could manage that, right? Assuming we
have such a powerful server running the RM of a big cluster, will publishing lifecycle events
be a big deal for it? I'm not sure, but I can offer some hints. Currently each container writes
2 events per lifecycle; in the future we may want to record each state transition, which would
result in ~10 events per lifecycle. That gives 10 * 5K = 50K lifecycle events, and they won't
all be written at the same moment because containers' lifecycles are usually asynchronous. If
we assume each container runs for 1h and lifecycle events are uniformly distributed, that works
out to 50,000 / 3,600, or around 14 writes per second, which should be a light load for such a
powerful server.
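The back-of-envelope estimate above can be reproduced with a few lines of arithmetic. This is a minimal sketch. The inputs (5,000 concurrent containers, 2 or ~10 events per lifecycle, 1h lifetimes, uniform spreading) are the assumptions from the paragraph, not measurements. The class and method names are hypothetical.

```java
// Back-of-envelope write-rate estimate; every input is an assumption, not a measurement.
public class LifecycleWriteRate {
    /**
     * Average lifecycle-event writes per second in steady state, assuming
     * events are spread uniformly over each container's lifetime.
     */
    static double writesPerSecond(int concurrentContainers, int eventsPerLifecycle,
                                  double lifecycleSeconds) {
        double totalEvents = (double) concurrentContainers * eventsPerLifecycle;
        return totalEvents / lifecycleSeconds;
    }

    public static void main(String[] args) {
        double today = writesPerSecond(5_000, 2, 3_600);   // 2 events per lifecycle
        double future = writesPerSecond(5_000, 10, 3_600); // ~10 events per lifecycle
        System.out.printf("today:  %.1f writes/s%n", today);
        System.out.printf("future: %.1f writes/s%n", future);
    }
}
```

With 10 events per lifecycle this gives 50,000 / 3,600, or about 13.9 writes per second. With today's 2 events it is about 2.8 writes per second. Bursts, such as many containers finishing together, would raise the peak above this average.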

I think we may be overestimating the performance impact of writing NM lifecycle events. A more
reasonable performance metric may be {{cost of writing lifecycle events per container / cost of
managing lifecycle per container * 100%}}. For example, if it is 2%, it would probably be
acceptable.
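The proposed metric is a simple ratio of per-container costs. This is a minimal sketch of computing it and checking it against a threshold. What counts as "cost" (CPU time, latency, and so on) is left open in the comment. The sample numbers and the names here are illustrative assumptions, and the 2% threshold is only the example figure mentioned above.

```java
// Hypothetical helper for the proposed overhead metric; inputs are illustrative only.
public class PublishingOverhead {
    /** (publishing cost per container / management cost per container) * 100%. */
    static double overheadPercent(double publishCostPerContainer, double manageCostPerContainer) {
        if (manageCostPerContainer <= 0) {
            throw new IllegalArgumentException("management cost must be positive");
        }
        // Multiply first to keep simple cases exact in floating point.
        return 100.0 * publishCostPerContainer / manageCostPerContainer;
    }

    /** Acceptance rule: overhead at or below a chosen threshold (e.g. the 2% example). */
    static boolean isAcceptable(double overheadPercent, double thresholdPercent) {
        return overheadPercent <= thresholdPercent;
    }

    public static void main(String[] args) {
        // Made-up costs in the same unit (e.g. CPU microseconds); not measurements.
        double pct = overheadPercent(15.0, 1_500.0);
        System.out.printf("overhead = %.1f%%, acceptable = %b%n", pct, isAcceptable(pct, 2.0));
    }
}
```

Normalizing by management cost makes the number comparable across cluster sizes. A larger cluster needs a stronger RM for both workloads, so the ratio is more meaningful than the absolute write rate.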

bq. Not all configs will be set as part of this, so is more planned for this on the framework
side, or does each application need to take care of populating configuration information on
its own?
bq. In that sense, how about letting frameworks (namely AMs) write the configuration instead
of the RM?

I'm not sure I understand this part correctly, but I lean toward system timeline data (RM/NM)
being controlled by the cluster config on a per-cluster basis, while application data is
controlled by the framework or even by per-application config. It could be a problem if the
user were able to change the former config. For example, he could hide his application's
information from the cluster.
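One way to enforce the scoping described above is to read system-data switches only from the cluster config, and to let per-application config override only application-data switches. This is a minimal sketch of that precedence rule. The property keys and method names are hypothetical placeholders, not real YARN properties, and plain maps stand in for real configuration objects.

```java
import java.util.Map;

// Hypothetical sketch; the keys below are illustrative, not real YARN properties.
public class TimelineConfigScope {
    static final String SYSTEM_KEY = "example.timeline.system-data.enabled";
    static final String APP_KEY = "example.timeline.app-data.enabled";

    /** System (RM/NM) data: only the cluster config counts; appConf is intentionally ignored. */
    static boolean systemDataEnabled(Map<String, String> clusterConf, Map<String, String> appConf) {
        return Boolean.parseBoolean(clusterConf.getOrDefault(SYSTEM_KEY, "false"));
    }

    /** Application data: a per-application value overrides the cluster default. */
    static boolean appDataEnabled(Map<String, String> clusterConf, Map<String, String> appConf) {
        String clusterDefault = clusterConf.getOrDefault(APP_KEY, "false");
        return Boolean.parseBoolean(appConf.getOrDefault(APP_KEY, clusterDefault));
    }

    public static void main(String[] args) {
        Map<String, String> cluster = Map.of(SYSTEM_KEY, "true", APP_KEY, "true");
        // A user tries to switch off system publishing for their own application.
        Map<String, String> app = Map.of(SYSTEM_KEY, "false", APP_KEY, "false");
        System.out.println("system data published: " + systemDataEnabled(cluster, app)); // true
        System.out.println("app data published: " + appDataEnabled(cluster, app));       // false
    }
}
```

The user's attempt to disable system publishing has no effect. The same user can still opt out of publishing their own application data.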

bq. I have also incorporated the changes to support RMContainer metrics based on configuration
(Junping's comments).

Do you mean we should keep {{yarn.resourcemanager.system-metrics-publisher.enabled}} to control
the RM system metrics publisher (SMP), and create {{yarn.nodemanager.system-metrics-publisher.enabled}} to control NM
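If the two independent switches asked about here were adopted, each daemon would read only its own flag. This is a minimal sketch under stated assumptions. java.util.Properties stands in for Hadoop's Configuration class. The NM key is only the name proposed in this comment. Treating an unset key as false is an assumption of the sketch.

```java
import java.util.Properties;

// Sketch of per-daemon publisher switches; Properties stands in for Hadoop's Configuration.
public class PublisherSwitches {
    static final String RM_KEY = "yarn.resourcemanager.system-metrics-publisher.enabled";
    // Name proposed in the comment above; hypothetical until adopted.
    static final String NM_KEY = "yarn.nodemanager.system-metrics-publisher.enabled";

    /** Reads a boolean switch; an unset key is treated as false (an assumption here). */
    static boolean enabled(Properties conf, String key) {
        return Boolean.parseBoolean(conf.getProperty(key, "false"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(RM_KEY, "true"); // RM publishes lifecycle events
        // NM key left unset, so NM-side publishing stays off.
        System.out.println("RM SMP enabled: " + enabled(conf, RM_KEY));
        System.out.println("NM SMP enabled: " + enabled(conf, NM_KEY));
    }
}
```

Separate keys let an operator enable RM-side publishing on a large cluster while leaving the higher-volume NM-side container events off, or the reverse.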

> [Event producers] Implement RM writing app lifecycle events to ATS
> ------------------------------------------------------------------
>                 Key: YARN-3044
>                 URL: https://issues.apache.org/jira/browse/YARN-3044
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Naganarasimha G R
>         Attachments: YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.

This message was sent by Atlassian JIRA
