hadoop-yarn-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
Date Thu, 02 Apr 2015 00:39:54 GMT

    [ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391864#comment-14391864 ]

Zhijie Shen edited comment on YARN-3391 at 4/2/15 12:39 AM:
------------------------------------------------------------

Sangjin, thanks for your comments, too. From your and Joep's comments, I can see the
benefit of showing aggregated application information by application (type). However, IMHO,
it's orthogonal to the flow definition. Isn't the straightforward approach to provide this feature
by aggregating on the application name/type dimension, instead of letting flow name = application
name?

On the other hand, flow should semantically stand for *workflow* (correct me if I'm wrong
about the flow concept), which contains a group of applications that work together to solve
a problem. Making flow name == application name changes the semantics: a flow of
applications would then mean the applications of the same type.

{quote}
 If a user is running TestDFSIO over and over, they should be recognized as different instances
of the same thing.
{quote}

I guess the "same thing" you had in mind is not the same workflow, but the same application
type, right? And going back to Joep's web UI example, it's better described as "getting sum(cost)
from apps where app_name(type) = sleep". Therefore, how about decoupling the two concepts?
Taking one step back, when users set the flow explicitly, are they going to tell the application
that it belongs to workflow ABC, or that it belongs to job type XYZ? I think it will be the
former.
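
To illustrate the decoupling, here is a minimal sketch (not the timeline service API) of
aggregating cost on the application type dimension independently of the flow. The App
record, its field names, the cost metric, and the sample values are all hypothetical and
exist only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    """Hypothetical application record; none of these fields come from YARN."""
    app_id: str
    app_type: str   # application name/type, e.g. "sleep" or "TestDFSIO"
    flow_name: str  # the workflow the application was submitted under
    cost: float     # assumed per-application cost metric


def sum_cost_by(apps, key):
    """Sum the cost of apps grouped by the dimension that `key` extracts."""
    totals = defaultdict(float)
    for app in apps:
        totals[key(app)] += app.cost
    return dict(totals)


# Made-up sample data: two workflows, each running a "sleep" application.
apps = [
    App("app_1", "sleep", "workflow_ABC", 2.0),
    App("app_2", "TestDFSIO", "workflow_ABC", 5.0),
    App("app_3", "sleep", "workflow_DEF", 3.0),
]

# "sum(cost) from apps where app_name(type) = sleep" is one entry of this result.
cost_by_type = sum_cost_by(apps, lambda a: a.app_type)

# The flow stays a separate dimension: a group of apps solving one problem.
cost_by_flow = sum_cost_by(apps, lambda a: a.flow_name)
```

In this sketch the two "sleep" applications aggregate together by type even though they
belong to different workflows, so flow name never has to equal application name.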


was (Author: zjshen):
Sangjin, thanks for your comments, too. From your and Joep's comments, I can see the
benefit of showing aggregated application information by application (type). However, IMHO,
it's orthogonal to the flow definition. Isn't the straightforward approach to provide this feature
by aggregating on the application name/type dimension, instead of letting flow name = application
name?

On the other hand, flow should semantically stand for *workflow* (correct me if I'm wrong
about the flow concept), which contains a group of applications that work together to solve
a problem. Making flow name == application name changes the semantics: a flow of
applications would then mean the applications of the same type.

{quote}
 If a user is running TestDFSIO over and over, they should be recognized as different instances
of the same thing.
{quote}

I guess the "same thing" you had in mind is not the same workflow, but the same application
type, right? How about decoupling the two concepts? Taking one step back, when users set the flow
explicitly, are they going to tell the application that you belong to workflow abc, or that
you belong to job type xyz? I think it will be the former.

> Clearly define flow ID/ flow run / flow version in API and storage
> ------------------------------------------------------------------
>
>                 Key: YARN-3391
>                 URL: https://issues.apache.org/jira/browse/YARN-3391
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>         Attachments: YARN-3391.1.patch
>
>
> To continue the discussion in YARN-3040, let's figure out the best way to describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed into the collector and to the storage eventually?
> - Should the flow run id be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not set it)
> - How do we handle flow attributes in case of nested levels of flows?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
