hadoop-yarn-issues mailing list archives

From "Li Lu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
Date Sat, 21 Mar 2015 00:34:40 GMT

    [ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372384#comment-14372384 ]

Li Lu commented on YARN-3040:

Hi [~zjshen], some quick thoughts...

bq. It sounds like each NM will need to have multiple timeline clients (one for each application).
bq. That's correct.
bq. The RM will have its own collector, and it does not go through the TimelineClient API.
How would that work?
bq. The RM will have all the above context info. When constructing and starting the RM collector,
we should make sure it is set up.

Both the RM and the NMs post predefined "application history info" rather than "generic" data
(I'm borrowing the ATS v1 wording here; correct me if I'm wrong). I'm wondering whether it's
possible to have another client implementation, based on our existing one, that can handle
multiple applications within the same client. Having one client per app in the RM does not
sound very scalable...
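To make the multi-application client idea concrete, here is a minimal Java sketch, assuming a single client keeps a per-application context map keyed by app id and attaches the right context to each posted entity. `MultiAppTimelineClientSketch`, `AppContext`, `registerApp` and `putEntities(appId, ...)` are hypothetical names for illustration only, not the existing TimelineClient API, and the in-memory list stands in for a real writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: one client instance serving many applications.
 * Not the real TimelineClient API; names and fields are illustrative only.
 */
public class MultiAppTimelineClientSketch {

    /** Per-application context the writer needs (hypothetical fields). */
    record AppContext(String clusterId, String userId, String flowId, String flowRunId) {}

    /** An entity tagged with the app it belongs to and that app's context. */
    record ContextualEntity(String appId, AppContext context, String entityId) {}

    // Context is looked up per call, so a single client can serve every app.
    private final Map<String, AppContext> contexts = new ConcurrentHashMap<>();
    // Stand-in for the real storage writer.
    private final List<ContextualEntity> written = new ArrayList<>();

    /** Register an app's context once, e.g. when the app is launched. */
    public void registerApp(String appId, AppContext context) {
        contexts.put(appId, context);
    }

    /** Drop an app's context when it finishes so the map stays bounded. */
    public void unregisterApp(String appId) {
        contexts.remove(appId);
    }

    /** Post entities for one app; the app id selects the context to attach. */
    public synchronized void putEntities(String appId, String... entityIds) {
        AppContext ctx = contexts.get(appId);
        if (ctx == null) {
            throw new IllegalStateException("No context registered for " + appId);
        }
        for (String id : entityIds) {
            written.add(new ContextualEntity(appId, ctx, id));
        }
    }

    /** Snapshot of everything "written" so far. */
    public synchronized List<ContextualEntity> written() {
        return new ArrayList<>(written);
    }

    public static void main(String[] args) {
        MultiAppTimelineClientSketch client = new MultiAppTimelineClientSketch();
        client.registerApp("application_1426896000000_0001",
            new AppContext("cluster_1426896000000", "alice", "wordcount", "1"));
        client.registerApp("application_1426896000000_0002",
            new AppContext("cluster_1426896000000", "bob", "pagerank", "7"));
        client.putEntities("application_1426896000000_0001", "container_1", "container_2");
        client.putEntities("application_1426896000000_0002", "container_3");
        for (ContextualEntity e : client.written()) {
            System.out.println(e.appId() + " flow=" + e.context().flowId() + " entity=" + e.entityId());
        }
    }
}
```

The point of the design is that the per-app state shrinks to one small map entry instead of a whole client object per app, which is what would matter for the RM.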

bq. I thought version is part of flow id. I think we can revisit it once the schema is done
and we have finalized the generic description of the flow structure and notation. For now
I'd like to keep it as it is. Thoughts?

One significant advantage of having run ids as integers is that we can easily sort all existing
runs of a flow in ascending or descending order. This seems like a solid use case in general, doesn't it?
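A small Java illustration of the ordering point, assuming run ids are non-negative longs: opaque string ids sort lexicographically, while numeric ids sort in run order and can also be encoded as fixed-width keys for a byte-ordered store. `FlowRunOrdering` and `descendingRunKey` are hypothetical, and the inverted-long key is a common trick for sorted key-value stores, not something decided on this JIRA.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrates why numeric flow run ids are easier to order than opaque strings. */
public class FlowRunOrdering {

    /** Run ids kept as strings sort lexicographically, which is not run order. */
    static List<String> sortAsStrings(List<String> runIds) {
        List<String> sorted = new ArrayList<>(runIds);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    /** Run ids kept as longs sort numerically; reverse order gives newest first. */
    static List<Long> sortNewestFirst(List<Long> runIds) {
        List<Long> sorted = new ArrayList<>(runIds);
        sorted.sort(Comparator.reverseOrder());
        return sorted;
    }

    /**
     * Hypothetical row-key component for a byte-ordered store: a fixed-width,
     * big-endian encoding of (Long.MAX_VALUE - runId), so an ascending byte scan
     * returns the newest run first. Assumes run ids are non-negative.
     */
    static byte[] descendingRunKey(long runId) {
        return ByteBuffer.allocate(Long.BYTES).putLong(Long.MAX_VALUE - runId).array();
    }

    public static void main(String[] args) {
        System.out.println(sortAsStrings(List.of("9", "10", "100", "2")));   // [10, 100, 2, 9]
        System.out.println(sortNewestFirst(List.of(9L, 10L, 100L, 2L)));     // [100, 10, 9, 2]
        // Negative result: the key for run 100 sorts before the key for run 9.
        System.out.println(Arrays.compareUnsigned(descendingRunKey(100), descendingRunKey(9)));
    }
}
```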

bq. It makes sense, but when the RM restarts, we use the new start time of the RM to identify the app
instead of the previous one. In the current approach, cluster_xyz will contain application_xyz_123.
That was my rationale. Also, this default cluster id construction is only used when the user
doesn't specify the cluster id in the config file. In production, the user should specify
one. I'll think about the question again.

Mostly fine, but I have some concerns about rolling upgrades. With rolling upgrades, if we don't
specify cluster ids explicitly, applications that live across an upgrade will have
two different primary keys. Even if we merge the two in our reader (which still sounds
suboptimal), this may pose a challenge to our aggregators: data for the same app would be
aggregated into two different entities over time. Any suggestions on this?
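A minimal Java sketch of the rolling-upgrade concern, assuming the default cluster id is "cluster_" plus the RM start time (the cluster_xyz / application_xyz_123 example quoted above), that an app's primary key is simply the cluster id joined with the app id, and that a recovered app keeps the id it was given before the restart. The key layout, the `!` separator and the helper names are hypothetical; the real schema was still being designed at this point.

```java
/** Hypothetical sketch of how a start-time-based default cluster id splits an app's key. */
public class ClusterIdAcrossRestart {

    /**
     * Use the configured cluster id if present; otherwise fall back to a default
     * derived from the RM start time (the "cluster_xyz" form mentioned in the thread).
     */
    static String clusterId(String configured, long rmStartTime) {
        return configured != null ? configured : "cluster_" + rmStartTime;
    }

    /** Hypothetical primary key for an app: cluster id plus application id. */
    static String appKey(String clusterId, String appId) {
        return clusterId + "!" + appId;
    }

    public static void main(String[] args) {
        long firstStart = 1426896000000L;
        long secondStart = 1426899600000L;  // RM restarted, e.g. during a rolling upgrade
        // Assumes the recovered app keeps the id it was given under the first RM start.
        String appId = "application_" + firstStart + "_0001";

        // No cluster id configured: the default changes with the RM start time.
        String before = appKey(clusterId(null, firstStart), appId);
        String after = appKey(clusterId(null, secondStart), appId);
        System.out.println(before.equals(after));              // false: one app, two keys

        // Cluster id configured explicitly: the key is stable across the restart.
        String stableBefore = appKey(clusterId("prod-cluster", firstStart), appId);
        String stableAfter = appKey(clusterId("prod-cluster", secondStart), appId);
        System.out.println(stableBefore.equals(stableAfter));  // true
    }
}
```

Under these assumptions, the split only happens on the fallback path, which is why an explicitly configured cluster id sidesteps it.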

> [Data Model] Make putEntities operation be aware of the app's context
> ---------------------------------------------------------------------
>                 Key: YARN-3040
>                 URL: https://issues.apache.org/jira/browse/YARN-3040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Zhijie Shen
>         Attachments: YARN-3040.1.patch, YARN-3040.2.patch
> Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should
> be able to define and pass in all attributes of flows and flow runs to YARN, and they should
> be passed into ATS writers.
> YARN tags were discussed as a way to handle this piece of information.

This message was sent by Atlassian JIRA
