hadoop-yarn-issues mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
Date Sat, 21 Mar 2015 01:15:39 GMT

    [ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372412#comment-14372412

Sangjin Lee commented on YARN-3040:

[~zjshen], thanks for your updated patch and prompt answers! I'll go over the new patch in
some more detail, and get back to you. I haven't looked at the patch just yet, and therefore
I might be saying something dumb, but I thought I'd reply to some of your points. Hopefully
this will move things forward.

bq. RM will have all the above context info. When constructing and starting RM collector,
we should make sure it be setup.
Since RM's collector will handle multiple applications, there is no one-to-one relationship
between flow/flow-run/app and an instance of the RM collector. RM will just have to retain
that information in memory for multiple apps, and pass that along on a per-call basis to the
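To illustrate the point that a single RM collector has no fixed flow/flow-run/app context, here is a minimal sketch. It assumes the RM registers each application's context when the app is admitted and looks it up again on every put call. `AppContext`, `RmCollectorSketch`, and their methods are hypothetical names, not YARN classes.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical holder for the per-application context fields discussed in this thread.
final class AppContext {
    final String clusterId;
    final String userId;
    final String flowId;
    final String flowVersion;
    final long flowRunId;

    AppContext(String clusterId, String userId, String flowId, String flowVersion, long flowRunId) {
        this.clusterId = clusterId;
        this.userId = userId;
        this.flowId = flowId;
        this.flowVersion = flowVersion;
        this.flowRunId = flowRunId;
    }
}

// Hypothetical sketch: one RM-side collector serves many apps, so the context
// is resolved per call instead of being fixed when the collector is built.
final class RmCollectorSketch {
    private final Map<String, AppContext> contexts = new ConcurrentHashMap<>();

    // Remember the context when an application is registered with the RM.
    void registerApp(String appId, AppContext context) {
        contexts.put(appId, Objects.requireNonNull(context));
    }

    // Drop the context once the application has finished.
    void unregisterApp(String appId) {
        contexts.remove(appId);
    }

    // Look up the context for this particular call; a writer would receive it
    // together with the entities being put.
    AppContext contextFor(String appId) {
        AppContext context = contexts.get(appId);
        if (context == null) {
            throw new IllegalStateException("No context registered for " + appId);
        }
        return context;
    }
}
```

An in-memory map keyed by application id keeps the per-call lookup cheap; how long entries are retained after an app finishes is left open here.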

bq. Personally, I prefer to user ID to be uniform among the all the context properties. ID
indicates it can be used to identify a flow.
I'm OK with "flow id" if it increases consistency.

bq. I thought version is part of flow id. I think we can revisit it once the schema is done,
and we finalized the generic description about the flow structure and the notation. So far
I'd like to keep it as what it is now. Thoughts?
Hmm, I didn't think of the version as part of the flow id. Here we're thinking a bit ahead to the
storage and query aspects of it, but it's perfectly feasible to ask questions like "give me
the latest 10 runs of the flow named 'foo.pig'". Note that those latest 10 runs can have different
versions. This implies there needs to be a semantic differentiation between the flow id (name)
and the flow version. Namely, in this query the flow version is *not* used to retrieve the
last 10 runs. So I would advocate having a field/attribute named "flow version" that is separate
from "flow id".
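The "latest 10 runs of 'foo.pig'" query can be sketched as follows, assuming each run record carries the flow id (name), the flow version, and a numeric run id as separate fields. `FlowRunRecord` and `FlowQueries.latestRuns` are hypothetical names used only for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical record of one flow run; the flow id (name) and the flow version are separate fields.
final class FlowRunRecord {
    final String flowId;
    final String flowVersion;
    final long runId;

    FlowRunRecord(String flowId, String flowVersion, long runId) {
        this.flowId = flowId;
        this.flowVersion = flowVersion;
        this.runId = runId;
    }
}

final class FlowQueries {
    // "Give me the latest n runs of flow X": the filter uses only the flow id,
    // so runs of every version are candidates; newest run ids come first.
    static List<FlowRunRecord> latestRuns(List<FlowRunRecord> runs, String flowId, int n) {
        return runs.stream()
                .filter(r -> r.flowId.equals(flowId))
                .sorted(Comparator.comparingLong((FlowRunRecord r) -> r.runId).reversed())
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

If the version were folded into the flow id, the same query would need to enumerate every version of 'foo.pig' first, which is the differentiation argued for above.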

As for the run id being numeric, as Li alluded to, there is a significant advantage in
having run ids as numbers (longs, really), as it lends itself to super-easy sorting. It's a
bit of a storage concern leaking into the higher-level abstraction, but it's a strong reason
to make it a number IMO.
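A small sketch of why numeric run ids sort more easily than string ones: strings compare character by character, longs compare by value. `RunIdOrdering.sortedCopy` is a hypothetical helper that simply sorts by natural ordering.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class RunIdOrdering {
    // Returns a sorted copy using the elements' natural ordering.
    // For String this is lexicographic; for Long it is numeric.
    static <T extends Comparable<? super T>> List<T> sortedCopy(List<T> runIds) {
        List<T> copy = new ArrayList<>(runIds);
        Collections.sort(copy);
        return copy;
    }
}
```

For example, sorting the strings "9", "10", "100" yields "10", "100", "9", while sorting the longs 9, 10, 100 yields 9, 10, 100 with no padding or parsing needed.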

bq. It makes sense, but when RM restarts we use the new start time of RM to identify the app
instead of the one before. In current way, cluster_xyz will contain the application_xyz_123.
This was my rationale before. And this default cluster id construction is only used in the
case the user didn't specify the cluster id in config file. In production, user should specify
one. I'll thought about the question again.
I'm still not sure why it would make sense to have a different logical cluster id every time
the RM/cluster restarts. Logically, a single cluster should be identified by a long-lived
name. For example, UIs will be built on questions like "give me the top 10 flows on cluster ABC".
Queries like that surely wouldn't care about cluster restarts.
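To make the "top 10 flows on cluster ABC" argument concrete, here is a sketch that ranks flows by run count within one cluster id. It assumes a hypothetical `ClusterRun` record with just a cluster id and a flow id; the "cluster_1000"/"cluster_2000" values stand in for ids regenerated at each restart.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical minimal record: which cluster a run belongs to and which flow it ran.
final class ClusterRun {
    final String clusterId;
    final String flowId;

    ClusterRun(String clusterId, String flowId) {
        this.clusterId = clusterId;
        this.flowId = flowId;
    }
}

final class ClusterTopFlows {
    // Ranks flows on one cluster by number of runs (ties broken by name).
    // Only runs recorded under exactly this cluster id are counted.
    static List<String> topFlows(List<ClusterRun> runs, String clusterId, int n) {
        Map<String, Long> runsPerFlow = runs.stream()
                .filter(r -> r.clusterId.equals(clusterId))
                .collect(Collectors.groupingBy(r -> r.flowId, Collectors.counting()));
        return runsPerFlow.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

With a restart-derived id, runs recorded before the restart fall under a different key and silently drop out of the ranking; with a stable name such as "ABC", all runs are counted together.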

As for the default value, I would in fact imagine that most use cases would not set the cluster
id (just assuming the cluster default would be filled in). That would be the norm, not the
exception.
Hope these help...

> [Data Model] Make putEntities operation be aware of the app's context
> ---------------------------------------------------------------------
>                 Key: YARN-3040
>                 URL: https://issues.apache.org/jira/browse/YARN-3040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Zhijie Shen
>         Attachments: YARN-3040.1.patch, YARN-3040.2.patch
> Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should
> be able to define and pass in all attributes of flows and flow runs to YARN, and they should
> be passed into ATS writers.
> YARN tags were discussed as a way to handle this piece of information.
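One way the tag-based approach could look, as a sketch only: flow attributes are encoded as prefixed strings in an application's tag set and decoded on the timeline side. The prefixes `flow_id:`, `flow_version:`, and `flow_run_id:` and the `FlowTags` class are hypothetical; nothing here is taken from YARN's actual tag handling.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical encoding of flow attributes as "prefix:value" application tags.
final class FlowTags {
    static final String FLOW_ID = "flow_id:";
    static final String FLOW_VERSION = "flow_version:";
    static final String FLOW_RUN_ID = "flow_run_id:";

    // Build the tag set a framework could attach when submitting an application.
    static Set<String> encode(String flowId, String flowVersion, long flowRunId) {
        Set<String> tags = new LinkedHashSet<>();
        tags.add(FLOW_ID + flowId);
        tags.add(FLOW_VERSION + flowVersion);
        tags.add(FLOW_RUN_ID + flowRunId);
        return tags;
    }

    // Recover the attributes, keyed by prefix name without the trailing colon.
    // Tags that match none of the prefixes are ignored.
    static Map<String, String> decode(Set<String> tags) {
        Map<String, String> context = new HashMap<>();
        for (String tag : tags) {
            for (String prefix : new String[] {FLOW_ID, FLOW_VERSION, FLOW_RUN_ID}) {
                if (tag.startsWith(prefix)) {
                    context.put(prefix.substring(0, prefix.length() - 1), tag.substring(prefix.length()));
                }
            }
        }
        return context;
    }
}
```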
