hadoop-yarn-issues mailing list archives

From "Joep Rottinghuis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
Date Wed, 01 Apr 2015 21:56:53 GMT

    [ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391554#comment-14391554 ]

Joep Rottinghuis commented on YARN-3391:
----------------------------------------

Whether the app_id needs to be part of the default flow name seems to boil down to how
we think about flows.
Let's say somebody runs the Sleep job, wordcount, TestDFSIO, or an application that doesn't
use MapReduce, such as a Spark app (where we could default to the app name).

Then, in the future RM UI, would we show one line per flow name:
{noformat}
Sleep 50 runs cost x 
wordcount 12 runs cost y
TestDFSIO 10 runs cost z
{noformat}

Or would we show one line per run:
{noformat}
Sleep_...1 1 runs cost x/50 
Sleep_...2 1 runs cost x/50 
...
Sleep_...49 1 runs cost x/50 
Sleep_...50 1 runs cost x/50 

wordcount_...1 1 runs cost y/12
wordcount_...2 1 runs cost y/12
wordcount_...3 1 runs cost y/12
...
wordcount_...11 1 runs cost y/12
wordcount_...12 1 runs cost y/12
TestDFSIO_1 1 runs cost z/10
TestDFSIO_2 1 runs cost z/10
TestDFSIO_3 1 runs cost z/10
...
TestDFSIO_9 1 runs cost z/10
TestDFSIO_10 1 runs cost z/10
{noformat}

It would seem that we already have the UI with individual application ids, so users can already
see each individual yarn app that way. We'd also be able to drill into the wordcount flow
name and see 12 runs, each with their unique yarn app id.
Therefore it seems to me that adding the app_id to the flow_id by default does not add any
value, but setting the flow_id to the app name does. We don't want to map it to a static
value, as pointed out earlier (we'd see a huge number of runs for a single flow called "1"
or something similar), but forcing every flow to be unique overlaps with what we already
have with runs: each flow would end up with exactly one run.
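
To make the comparison concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the application records, the ids, the costs, and the two default-naming functions are assumptions, not part of the timeline service API. It only shows how the choice of default flow name changes the grouping.

```python
from collections import defaultdict

# Hypothetical application records: (app_name, yarn_app_id, cost).
APPS = [
    ("Sleep", "application_1_0001", 2.0),
    ("Sleep", "application_1_0002", 2.0),
    ("wordcount", "application_1_0003", 5.0),
    ("wordcount", "application_1_0004", 5.0),
    ("wordcount", "application_1_0005", 5.0),
]


def flow_name_from_app_name(app_name, app_id):
    """Default flow name is just the app name (assumed policy A)."""
    return app_name


def flow_name_with_app_id(app_name, app_id):
    """Default flow name includes the app id (assumed policy B)."""
    return f"{app_name}_{app_id}"


def summarize(apps, default_flow_name):
    """Group apps by flow name and return {flow: (runs, total_cost)}."""
    runs = defaultdict(int)
    cost = defaultdict(float)
    for app_name, app_id, app_cost in apps:
        flow = default_flow_name(app_name, app_id)
        runs[flow] += 1
        cost[flow] += app_cost
    return {flow: (runs[flow], cost[flow]) for flow in runs}


by_name = summarize(APPS, flow_name_from_app_name)
by_name_and_id = summarize(APPS, flow_name_with_app_id)
```

With policy A, `by_name` has one entry per app name, each aggregating several runs. With policy B, `by_name_and_id` has one entry per application, each with a single run, which is the same information the per-application view already provides.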


> Clearly define flow ID/ flow run / flow version in API and storage
> ------------------------------------------------------------------
>
>                 Key: YARN-3391
>                 URL: https://issues.apache.org/jira/browse/YARN-3391
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>         Attachments: YARN-3391.1.patch
>
>
> To continue the discussion in YARN-3040, let's figure out the best way to describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed into the collector and to the storage eventually?
> - Flow run id should be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not set it)
> - How do we handle flow attributes in case of nested levels of flows?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
