hadoop-yarn-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
Date Wed, 01 Apr 2015 21:52:53 GMT

    [ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391549#comment-14391549 ]

Zhijie Shen commented on YARN-3391:
-----------------------------------

I had an offline discussion with Vrushali. Here's a summary:

1. We agree that, by default, each individual application should belong to its own flow run.

2. While Vrushali thinks that different applications with the same name should belong to the same
flow (name), I prefer that each individual application belong to a different flow (name).

My opinion is that, to minimize interaction with other applications, each individual application
should be kept completely separate at every level of the flow notation (name, version, run) unless
users specify the name/version/run explicitly. For example, aggregation for this application won't
affect other applications and won't be affected by them.

One technical problem with using the application name is that it is "N/A" by default unless
users set it explicitly in the framework code. Similarly, the other field we could choose for
the flow name is the application type, which is "YARN" by default. Therefore, using either the
name or the type could put most of a user's applications into the single flow (name) "N/A" or "YARN".
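To make the difference between the two default-grouping options concrete, here is a minimal sketch. The names `App`, `flow_by_app_name`, `flow_per_app` and `group` are hypothetical and do not come from the YARN code base. Using the application ID as the default flow name in the per-application option is also an assumption; the comment only says each application should get its own flow. The "N/A" default name comes from the paragraph above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional

# Default application name when the framework does not set one (from the comment).
DEFAULT_APP_NAME = "N/A"


@dataclass(frozen=True)
class App:
    """Hypothetical, simplified view of a submitted application."""
    app_id: str
    name: str = DEFAULT_APP_NAME
    flow_name: Optional[str] = None  # set only if the user specifies it explicitly


def flow_by_app_name(app: App) -> str:
    """Option 1: group by application name unless a flow name is given."""
    return app.flow_name or app.name


def flow_per_app(app: App) -> str:
    """Option 2: one flow per application unless a flow name is given.

    Assumption: the application ID serves as the unique default flow name.
    """
    return app.flow_name or app.app_id


def group(apps: List[App], policy) -> Dict[str, List[str]]:
    """Map each flow name to the IDs of the applications assigned to it."""
    flows: Dict[str, List[str]] = defaultdict(list)
    for app in apps:
        flows[policy(app)].append(app.app_id)
    return dict(flows)


if __name__ == "__main__":
    apps = [
        App("application_1_0001"),                    # name left as "N/A"
        App("application_1_0002"),                    # name left as "N/A"
        App("application_1_0003", name="wordcount"),
        App("application_1_0004", name="wordcount"),
        App("application_1_0005", flow_name="etl"),   # explicit flow name
    ]
    print(group(apps, flow_by_app_name))
    print(group(apps, flow_per_app))
```

Under option 1, the two applications that never set a name are lumped into the flow "N/A", so flow-level aggregates for "N/A" mix unrelated jobs. Under option 2, every application without an explicit flow name stays isolated, and both options honor a user-supplied flow name such as "etl".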

However, the more essential question is whether it makes sense to group applications by
application name/type by default at the flow (name) level, and whether flow-level aggregation
is meaningful for this default grouping (e.g. all wordcount jobs of zjshen). [~sjlee0] and
[~vinodkv], any comments?




> Clearly define flow ID/ flow run / flow version in API and storage
> ------------------------------------------------------------------
>
>                 Key: YARN-3391
>                 URL: https://issues.apache.org/jira/browse/YARN-3391
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>         Attachments: YARN-3391.1.patch
>
>
> To continue the discussion in YARN-3040, let's figure out the best way to describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed into the collector and to the storage eventually?
> - Should the flow run id be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. the client did not set it)
> - How do we handle flow attributes in case of nested levels of flows?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
