Date: Wed, 1 Apr 2015 19:38:53 +0000 (UTC)
From: "Vrushali C (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage


    [ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391296#comment-14391296 ]

Vrushali C commented on YARN-3391:
----------------------------------

I have some semantic-level comments.

1)
bq. public static String generateDefaultFlowIdBasedOnAppId(ApplicationId appId) { return "flow_" + appId.getClusterTimestamp() + "_" + appId.getId();

It would be nice to have this string as a static final constant somewhere, and the separator defined as a static final string as well.

2) I see that flowRun means flowRunId in this code now. I would actually keep it as flowRunId, because an API call like getFlowRun() seems to me like it should return the flow run details, not just the flow run id.

3) Reposting an earlier reply, since JIRA seems to have placed it earlier in the thread.

bq. Otherwise, if we use the job name, for example, all the wordcount jobs will belong to one flow then by default.

Yes, that's exactly what they are. All wordcount jobs belong to the same flow "wordcount" by that user, and each run of the word count is a flow run. In fact, they should not end up being separate flows.


> Clearly define flow ID/ flow run / flow version in API and storage
> ------------------------------------------------------------------
>
>                 Key: YARN-3391
>                 URL: https://issues.apache.org/jira/browse/YARN-3391
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>         Attachments: YARN-3391.1.patch
>
>
> To continue the discussion in YARN-3040, let's figure out the best way to describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed into the collector and to the storage eventually?
> - Flow run id should be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not set it)
> - How do we handle flow attributes in case of nested levels of flows?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
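
For reference, the suggestion in comment 1 could look roughly like the sketch below. It is not taken from the patch: the class name, the constant names, and the two-parameter signature are hypothetical, and the cluster timestamp and id are passed as plain values (standing in for ApplicationId) so that the example compiles without the YARN dependency. It also assumes that "this string" in the comment refers to the "flow_" prefix.

```java
// Sketch only: every name here is hypothetical and not part of YARN-3391.1.patch.
final class DefaultFlowIds {
    // Prefix and separator pulled out as constants, as comment 1 suggests.
    static final String DEFAULT_FLOW_ID_PREFIX = "flow_";
    static final String FLOW_ID_SEPARATOR = "_";

    private DefaultFlowIds() {
        // Utility class, not meant to be instantiated.
    }

    // Stand-in for generateDefaultFlowIdBasedOnAppId(ApplicationId): the two
    // parameters mirror appId.getClusterTimestamp() and appId.getId().
    static String generateDefaultFlowId(long clusterTimestamp, int id) {
        return DEFAULT_FLOW_ID_PREFIX + clusterTimestamp + FLOW_ID_SEPARATOR + id;
    }

    public static void main(String[] args) {
        // Example values only; a real caller would read them from an ApplicationId.
        System.out.println(generateDefaultFlowId(1427917133000L, 7));
    }
}
```

With the constants in one place, any code that later needs to recognise or parse a default flow id could reuse them instead of repeating the literals.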