hadoop-yarn-issues mailing list archives

From "Vrushali C (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
Date Mon, 30 Mar 2015 21:52:53 GMT

    [ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387491#comment-14387491 ]

Vrushali C commented on YARN-3391:

bq. I propose:
flow name: String: default(cluster_<appId without "app" prefix>)

The appId is a string of the form application_<epoch_timestamp>_<some_number>. I don't
think the numerical part alone, without the "application_" prefix, will be easy to relate to.
What would be easier for the user to relate to is something like the mapreduce.job.name
param from the config (in the case of Hadoop jobs). [~zjshen], do you know of any such config
or context parameter that can be set, so that we can pick up the flow name from it for all
YARN applications?
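
To make the fallback idea concrete, here is a minimal sketch in Python rather than actual YARN code. The function name default_flow_name and the plain-dict config are hypothetical, and falling back to the full application ID is only one possible choice, not something the proposal has settled on.

```python
# Hypothetical sketch: this helper is not part of YARN or the timeline service.
def default_flow_name(app_id: str, conf: dict) -> str:
    """Prefer a user-supplied job name; otherwise fall back to the app ID."""
    # "mapreduce.job.name" is the MapReduce config key mentioned in the comment.
    job_name = conf.get("mapreduce.job.name", "").strip()
    if job_name:
        return job_name
    # Assumed fallback: keep the whole application ID so it stays recognizable.
    return app_id
```

With this shape, a job that sets mapreduce.job.name gets a readable flow name, and any other application still gets a name that can be matched against its application ID.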

bq. flow version: String: default("1")
A default string of "1" is fine.

bq. flow run: long: default(1)

Using a default run id of 1 means every run falls into the same bucket if no one sets the
run id. There needs to be a way to ensure the run id is set or, if it is not, the default needs
to be something variable like submit_time. Otherwise a default run id of 1 is a problem.
For example, if I run a sleep job 10 times and don't set the run id, the information
of each run is overwritten by the next one (since all of them will have a run id of 1).
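
The overwrite can be illustrated with a toy model, assuming storage is keyed by (flow name, run id). The write_run helper, the dict-based store, and the timestamps are all made up for illustration; this is not the real timeline storage schema.

```python
# Toy model: a dict keyed by (flow name, run id) stands in for the storage.
def write_run(store, flow_name, submit_time_ms, run_id=None, fixed_default=True):
    """Record one run; pick a default run id when the client did not set one."""
    if run_id is None:
        # Either the proposed constant default (1) or a variable default
        # derived from the submit time.
        run_id = 1 if fixed_default else submit_time_ms
    store[(flow_name, run_id)] = {"submit_time_ms": submit_time_ms}
    return run_id

fixed, variable = {}, {}
for i in range(10):
    submit = 1427752373000 + i * 1000  # ten submissions, one second apart
    write_run(fixed, "sleep job", submit, fixed_default=True)
    write_run(variable, "sleep job", submit, fixed_default=False)

print(len(fixed))     # 1: every run landed on run id 1 and replaced the last
print(len(variable))  # 10: each run kept its own row
```

A submit-time default only avoids collisions when no two runs of the same flow share a timestamp, so it is a mitigation rather than a guarantee.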

> Clearly define flow ID/ flow run / flow version in API and storage
> ------------------------------------------------------------------
>                 Key: YARN-3391
>                 URL: https://issues.apache.org/jira/browse/YARN-3391
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
> To continue the discussion in YARN-3040, let's figure out the best way to describe the
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed into the collector
>   and to the storage eventually?
> - Flow run id should be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not set it)
> - How do we handle flow attributes in case of nested levels of flows?
