Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: yarn-issues@hadoop.apache.org
Date: Wed, 1 Apr 2015 19:38:53 +0000 (UTC)
From: "Vrushali C (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.12784945.1427136557000.97121.1427917133904@Atlassian.JIRA>
In-Reply-To: <JIRA.12784945.1427136557000@Atlassian.JIRA>
References: <JIRA.12784945.1427136557000@Atlassian.JIRA>
 <JIRA.12784945.1427136557128@arcas>
Subject: [jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run /
 flow version in API and storage
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391296#comment-14391296 ] 

Vrushali C commented on YARN-3391:
----------------------------------

I have some semantic-level comments.
1) bq.  public static String generateDefaultFlowIdBasedOnAppId(ApplicationId appId) {
return "flow_" + appId.getClusterTimestamp() + "_" + appId.getId();

It would be nice to have the "flow_" prefix defined as a static final String constant somewhere, and likewise the "_" separator.
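
Roughly what I have in mind, as a sketch only. The class name FlowIdDefaults and the constant names are hypothetical and not from the patch, and the method takes the cluster timestamp and app id as plain values (instead of an ApplicationId) so the snippet stands alone:

```java
/** Hypothetical holder for the default flow id pieces; names are illustrative, not from the patch. */
final class FlowIdDefaults {
  /** Prefix used when the client did not set a flow id. */
  static final String DEFAULT_FLOW_ID_PREFIX = "flow";
  /** Separator between the parts of a generated flow id. */
  static final String FLOW_ID_SEPARATOR = "_";

  private FlowIdDefaults() {
    // constants and static helpers only
  }

  /** Builds the same string as the quoted code, but from named constants. */
  static String generateDefaultFlowId(long clusterTimestamp, int appId) {
    return DEFAULT_FLOW_ID_PREFIX + FLOW_ID_SEPARATOR
        + clusterTimestamp + FLOW_ID_SEPARATOR + appId;
  }
}
```

With the constants in one place, anything that later needs to recognize or parse a default flow id can reuse them instead of repeating the literals.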

2) I see that flowRun means flowRunId in this code now. I would actually keep it as flowRunId, because an API call like getFlowRun() reads to me as if it should return the flow run details, not just the flow run id.
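
To illustrate the naming distinction, here is a hedged sketch. FlowRunDetails and FlowContextSketch are hypothetical types that are not in the patch, and treating the run id as a long is only an assumption, since whether it should be a number is still an open question on this JIRA:

```java
/** Hypothetical value type holding the details of one run of a flow. */
final class FlowRunDetails {
  private final String flowId;
  private final long flowRunId;     // assumed numeric for this sketch
  private final String flowVersion;

  FlowRunDetails(String flowId, long flowRunId, String flowVersion) {
    this.flowId = flowId;
    this.flowRunId = flowRunId;
    this.flowVersion = flowVersion;
  }

  String getFlowId() { return flowId; }
  long getFlowRunId() { return flowRunId; }
  String getFlowVersion() { return flowVersion; }
}

/** Hypothetical context object showing the two differently named accessors. */
final class FlowContextSketch {
  private final FlowRunDetails flowRun;

  FlowContextSketch(FlowRunDetails flowRun) {
    this.flowRun = flowRun;
  }

  /** Returns only the identifier of the run. */
  long getFlowRunId() { return flowRun.getFlowRunId(); }

  /** Returns the full details of the run, which is what this name suggests. */
  FlowRunDetails getFlowRun() { return flowRun; }
}
```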

3) Reposting an earlier reply, since JIRA seems to have placed it earlier in the thread. bq. Otherwise, if we use the job name, for example, all the wordcount jobs will belong to one flow then by default.

Yes, that's exactly what they are. All wordcount jobs run by that user belong to the same flow, "wordcount", and each run of wordcount is a flow run. In fact, they should not end up being separate flows.
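
A small sketch of that grouping, under stated assumptions: FlowGroupingSketch is a hypothetical in-memory stand-in, not the storage design, and it keys a flow by user plus flow name with numeric run ids purely for illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory grouping of flow runs under (user, flow name). */
final class FlowGroupingSketch {
  // Key is "user/flowName"; value is the list of run ids recorded for that flow.
  private final Map<String, List<Long>> runsByFlow = new LinkedHashMap<>();

  private static String key(String user, String flowName) {
    return user + "/" + flowName;
  }

  /** Records one run; repeated runs of the same job name land in the same flow. */
  void recordRun(String user, String flowName, long flowRunId) {
    runsByFlow.computeIfAbsent(key(user, flowName), k -> new ArrayList<>()).add(flowRunId);
  }

  /** Number of distinct flows seen so far. */
  int flowCount() {
    return runsByFlow.size();
  }

  /** Run ids recorded for the given user's flow, or an empty list if none. */
  List<Long> runsOf(String user, String flowName) {
    return runsByFlow.getOrDefault(key(user, flowName), new ArrayList<>());
  }
}
```

Recording three wordcount runs for one user yields a single flow with three flow runs, while the same job name under a different user is a separate flow.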


> Clearly define flow ID/ flow run / flow version in API and storage
> ------------------------------------------------------------------
>
>                 Key: YARN-3391
>                 URL: https://issues.apache.org/jira/browse/YARN-3391
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>         Attachments: YARN-3391.1.patch
>
>
> To continue the discussion in YARN-3040, let's figure out the best way to describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed into the collector and to the storage eventually?
> - Flow run id should be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not set it)
> - How do we handle flow attributes in case of nested levels of flows?


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)