hadoop-yarn-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
Date Mon, 23 Mar 2015 16:45:12 GMT

    [ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376169#comment-14376169 ]

Zhijie Shen commented on YARN-3040:

bq.  It sounds not quite scalable if we have one client for each app in the RM...

In the RM/NM, I think we can and should implement a wrapper layer that holds multiple
applications and uses a delegator to write the data for all of them.
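
A minimal sketch of such a wrapper layer, under stated assumptions: `MultiAppTimelineWriter`, `EntitySink`, and every method name here are hypothetical and are not part of the YARN timeline client API. The idea shown is one shared writer that keeps a lightweight per-app buffer, rather than one full client per app.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical wrapper: one instance in the RM/NM serves many applications.
class MultiAppTimelineWriter {
    /** Stand-in (assumption) for whatever actually ships entities to the timeline service. */
    interface EntitySink {
        void put(String appId, String entity);
    }

    private final EntitySink sink;
    // One small buffer per registered app instead of one client per app.
    private final Map<String, List<String>> pending = new ConcurrentHashMap<>();

    MultiAppTimelineWriter(EntitySink sink) {
        this.sink = sink;
    }

    void register(String appId) {
        pending.putIfAbsent(appId, new ArrayList<>());
    }

    void unregister(String appId) {
        pending.remove(appId);
    }

    /** Queues an entity under the context of the given app. */
    void putEntity(String appId, String entity) {
        List<String> buffer = pending.get(appId);
        if (buffer == null) {
            throw new IllegalStateException("Unknown app: " + appId);
        }
        synchronized (buffer) {
            buffer.add(entity);
        }
    }

    /** Delegates all queued entities to the sink; returns how many were written. */
    int flush() {
        int written = 0;
        for (Map.Entry<String, List<String>> entry : pending.entrySet()) {
            List<String> buffer = entry.getValue();
            synchronized (buffer) {
                for (String entity : buffer) {
                    sink.put(entry.getKey(), entity);
                    written++;
                }
                buffer.clear();
            }
        }
        return written;
    }
}
```

Each app's context travels with its entities, while the connection to the backend is shared through the single sink.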

bq. One most significant advantage to have run ids as integers is we can easily sort all existing
runs for one flow in ascending or descending order. This might be a solid use case in general?

I can see the benefit. For example, if the run ID represents a timestamp, we can filter the flow
runs and ask for the runs from the last 5 minutes. But my concern is whether that is a general
enough way to let users describe a run.
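
To make the timestamp case concrete, here is a small sketch. It assumes run IDs are epoch-millisecond timestamps, which is exactly the assumption in question; `FlowRunFilter` and `recentRuns` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class FlowRunFilter {
    /**
     * Returns the run IDs that fall within [nowMillis - windowMillis, nowMillis],
     * newest first. Only meaningful if run IDs are timestamps (assumption).
     */
    static List<Long> recentRuns(List<Long> runIds, long nowMillis, long windowMillis) {
        List<Long> result = new ArrayList<>();
        for (long id : runIds) {
            if (id >= nowMillis - windowMillis && id <= nowMillis) {
                result.add(id);
            }
        }
        // Numeric IDs sort naturally; reverse for descending order.
        result.sort(Collections.reverseOrder());
        return result;
    }
}
```

With a free-form string run ID, neither the range check nor the ordering would be well defined, which is the trade-off being discussed.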

bq. Hmm, I didn't think the version as part of the flow id.

I can understand the particular case described above. As with my prior comment about the flow run
ID, my concern is whether an explicit flow/version/run hierarchy is general enough to capture most
use cases. IMHO, the hierarchy is by nature a tree of flows, where a flow can be a flow
of flows or a flow of apps. If other users just want a single level of flow, the
version/run info seems redundant. On the other hand, if we use the recursive flow structure,
it is elastic enough to have anywhere from one to many flow levels. We can treat the first level as the flow,
the second as the version, and the third as the run. I don't have expert knowledge of workflow systems such
as Oozie, and just want to think out loud about my concern. That said, if flow/version/run is
the general description of a flow, I agree we should pass in these three env vars together,
as separate values.
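
The recursive structure can be sketched as a simple tree. `FlowNode` and its members are hypothetical names used only for illustration, not an existing or proposed YARN class.

```java
import java.util.ArrayList;
import java.util.List;

// A flow is either a flow of flows (children) or a flow of apps (appIds).
class FlowNode {
    final String name;
    final List<FlowNode> children = new ArrayList<>();
    final List<String> appIds = new ArrayList<>();

    FlowNode(String name) {
        this.name = name;
    }

    /** Returns the named sub-flow, creating it if it does not exist yet. */
    FlowNode child(String childName) {
        for (FlowNode c : children) {
            if (c.name.equals(childName)) {
                return c;
            }
        }
        FlowNode created = new FlowNode(childName);
        children.add(created);
        return created;
    }

    /** Number of flow levels from this node down to its deepest descendant. */
    int depth() {
        int max = 0;
        for (FlowNode c : children) {
            max = Math.max(max, c.depth());
        }
        return 1 + max;
    }
}
```

Under this sketch, flow/version/run is just the three-level case, for example `new FlowNode("etl").child("v2").child("run-17")`, while a user who wants a single level creates one node and attaches apps to it directly.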

bq. Mostly fine, but I have some concerns about rolling upgrades.
bq. I'm still not sure why it would make sense to have different logical cluster id's every
time the RM/cluster restarts. 

I meant that the admin can configure a cluster ID explicitly, and it won't have the
timestamp appended. I added the timestamp to the default value to distinguish clusters started by,
say, you and me, but having thought about it again, the RM restart problem is a valid concern.
I'll change the default so that it does not append the timestamp.
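
A sketch of the resulting behaviour, with assumptions labeled: `ClusterIds`, `resolve`, and the default value `"yarn_cluster"` are hypothetical and do not reflect the actual patch or any real configuration key.

```java
class ClusterIds {
    // Hypothetical default; the real default value is not stated in this thread.
    static final String DEFAULT_CLUSTER_ID = "yarn_cluster";

    /**
     * Uses the admin-configured ID as-is when present; otherwise falls back to a
     * fixed default with no timestamp, so the ID stays the same across RM restarts.
     */
    static String resolve(String configuredId) {
        if (configuredId != null && !configuredId.trim().isEmpty()) {
            return configuredId.trim();
        }
        return DEFAULT_CLUSTER_ID;
    }
}
```

The cost of a fixed default is that two unconfigured clusters share the same ID, so admins running more than one cluster would need to set the ID explicitly.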

> [Data Model] Make putEntities operation be aware of the app's context
> ---------------------------------------------------------------------
>                 Key: YARN-3040
>                 URL: https://issues.apache.org/jira/browse/YARN-3040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Zhijie Shen
>         Attachments: YARN-3040.1.patch, YARN-3040.2.patch
> Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should
> be able to define and pass in all attributes of flows and flow runs to YARN, and they should
> be passed into ATS writers.
> YARN tags were discussed as a way to handle this piece of information.

