hadoop-yarn-issues mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
Date Mon, 23 Mar 2015 17:55:53 GMT

    [ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376285#comment-14376285 ]

Sangjin Lee commented on YARN-3040:

I can understand the particular case described above. As with my prior comment about the flow run
ID, my concern is whether an explicit flow/version/run hierarchy is general enough to capture most
use cases. IMHO, the hierarchy is by nature a tree of flows, and a flow can be a flow
of flows or a flow of apps. However, if other users just want to use one level of flow,
the version/run info seems redundant. On the other hand, if we use a recursive flow structure,
it is flexible enough to have anywhere from one to many flow levels. We can treat the first level
as the flow, the second as the version, and the third as the run. I don't have expert knowledge of
workflow systems such as Oozie; I just want to think out loud about my concern. That said, if
flow/version/run is the general description of a flow, I agree we should pass in these three env
vars together and separately.
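
To make the recursive alternative concrete, here is a minimal sketch of a flow tree in which a flow holds either sub-flows or apps. The `FlowNode` class, the `paths_to_apps` helper, and the app IDs are all hypothetical and are not part of any patch on this issue; the point is only that the same structure covers both the one-level and the three-level case.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FlowNode:
    """Hypothetical node in a flow tree: holds sub-flows and/or apps."""
    name: str
    children: List["FlowNode"] = field(default_factory=list)
    app_ids: List[str] = field(default_factory=list)

    def add(self, child: "FlowNode") -> "FlowNode":
        """Attach a sub-flow and return it so calls can be chained."""
        self.children.append(child)
        return child


def paths_to_apps(node, prefix=()):
    """Yield (path, app_id) pairs; path is the tuple of flow names from the root."""
    path = prefix + (node.name,)
    for app_id in node.app_ids:
        yield path, app_id
    for child in node.children:
        yield from paths_to_apps(child, path)


# Three levels, read as flow / version / run.
root = FlowNode("foo.pig")
v1 = root.add(FlowNode("v.1"))
run1 = v1.add(FlowNode("1"))
run1.app_ids.append("application_0001")  # made-up app ID

# One level only: the flow contains its app directly.
flat = FlowNode("bar.pig", app_ids=["application_0002"])  # made-up app ID

three_level = list(paths_to_apps(root))
one_level = list(paths_to_apps(flat))
```

In this sketch the depth of the path is whatever the framework chooses, which is the flexibility the comment above is asking about; the cost is that a reader of the data can no longer assume which level means "version" or "run".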

Agreed that we need to consider both use cases (single-level and multi-level). I just want
to clarify that even with one level of flows, it is possible (and in fact more common)
to have multiple runs for a given flow version, and multiple versions for a given flow
name; e.g. "foo.pig"/"v.1"/1, "foo.pig"/"v.1"/2, ..., "foo.pig"/"v.2"/10, "foo.pig"/"v.2"/11, ...
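
As an illustration only, the example above can be written as flat records and grouped. `FlowContext` and its field names are hypothetical, chosen for this sketch rather than taken from the YARN code base.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowContext(NamedTuple):
    """Hypothetical flat record of the three flow attributes."""
    flow_name: str
    flow_version: str
    flow_run_id: int


# The runs named in the example above.
runs = [
    FlowContext("foo.pig", "v.1", 1),
    FlowContext("foo.pig", "v.1", 2),
    FlowContext("foo.pig", "v.2", 10),
    FlowContext("foo.pig", "v.2", 11),
]

# Several runs can share one (name, version) pair.
runs_by_version = defaultdict(list)
for ctx in runs:
    runs_by_version[(ctx.flow_name, ctx.flow_version)].append(ctx.flow_run_id)

# Several versions can share one flow name.
versions_by_flow = defaultdict(set)
for ctx in runs:
    versions_by_flow[ctx.flow_name].add(ctx.flow_version)
```

Even though there is only a single level of flow here ("foo.pig"), the version and run ID still carry information that the name alone does not.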

Also, my mental model is that flow id/version/run-id is not a hierarchy. It's just a group
of 3 attributes (although there is some implied containment relationship).

Also, when we store these 3 attributes, I suspect schemas for backends like HBase/Phoenix
will probably make only the flow id (name) and the flow run id part of the primary/row
key, and store the flow version in a separate table.
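
A rough sketch of that storage layout, using plain dictionaries as stand-ins for tables. Nothing here is HBase or Phoenix API; the `!` separator, the zero-padding width, and the function and table names are assumptions made for the example.

```python
def make_row_key(flow_name: str, flow_run_id: int) -> str:
    """Build a hypothetical row key from the flow name and run ID only."""
    # Zero-pad the run ID so that, within one flow, keys sort in run order as strings.
    return f"{flow_name}!{flow_run_id:010d}"


entity_table = {}   # row key -> list of entities (stand-in for the main table)
version_table = {}  # row key -> flow version (stand-in for the separate table)


def put(flow_name, flow_version, flow_run_id, entity):
    """Store an entity under (name, run ID); keep the version out of the key."""
    key = make_row_key(flow_name, flow_run_id)
    entity_table.setdefault(key, []).append(entity)
    version_table[key] = flow_version
    return key


k1 = put("foo.pig", "v.1", 2, "entity-a")
k2 = put("foo.pig", "v.2", 10, "entity-b")
```

With this layout a scan over one flow name returns its runs in order regardless of version, and the version is a lookup by the same key when it is needed.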

> [Data Model] Make putEntities operation be aware of the app's context
> ---------------------------------------------------------------------
>                 Key: YARN-3040
>                 URL: https://issues.apache.org/jira/browse/YARN-3040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Zhijie Shen
>         Attachments: YARN-3040.1.patch, YARN-3040.2.patch
> Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should
> be able to define and pass in all attributes of flows and flow runs to YARN, and they should
> be passed into ATS writers.
> YARN tags were discussed as a way to handle this piece of information.
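
One way the tag-based approach mentioned in the description could look, as a sketch only: each flow attribute is encoded as a prefixed string tag and decoded on the other side. The prefixes and function names below are hypothetical; they are not taken from YARN or from the attached patches.

```python
# Hypothetical tag prefixes, one per flow attribute.
TAG_PREFIXES = {
    "flow_name": "flow_name:",
    "flow_version": "flow_version:",
    "flow_run_id": "flow_run_id:",
}


def to_tags(flow_name, flow_version, flow_run_id):
    """Encode the three flow attributes as a set of prefixed string tags."""
    values = {
        "flow_name": flow_name,
        "flow_version": flow_version,
        "flow_run_id": str(flow_run_id),
    }
    return {TAG_PREFIXES[attr] + value for attr, value in values.items()}


def from_tags(tags):
    """Recover the flow attributes from a tag set, ignoring unrelated tags."""
    context = {}
    for tag in tags:
        for attr, prefix in TAG_PREFIXES.items():
            if tag.startswith(prefix):
                context[attr] = tag[len(prefix):]
    return context


# An application may carry other tags too; they are left alone.
tags = to_tags("foo.pig", "v.1", 2) | {"some-other-tag"}
context = from_tags(tags)
```

Values come back as strings in this sketch, so a consumer that needs a numeric run ID would have to parse it.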

This message was sent by Atlassian JIRA
