Date: Sun, 14 Jul 2013 16:24:49 +0000 (UTC)
From: "Srikanth Sundarrajan (JIRA)"
To: dev@falcon.incubator.apache.org
Subject: [jira] [Comment Edited] (FALCON-48) Pipeline entity for Falcon

    [
https://issues.apache.org/jira/browse/FALCON-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708063#comment-13708063 ]

Srikanth Sundarrajan edited comment on FALCON-48 at 7/14/13 4:24 PM:
---------------------------------------------------------------------

Yes, the ask makes sense. Looks like a common use case. Can we define a pipeline as the flow between two feed instances?

Consider for example: two source feeds A & B are defined in clusters X1 & X2 each, and are then transformed (process P1) to A1 & B1 respectively, again in both clusters X1 & X2. Now consider another process (P2) which consumes A1 & B1 in each of the clusters and produces feed C1 in each of the clusters. And let us say C1 is replicated (multiple source, single target) from X1 & X2 to X3. In this case the dependency graph would look something like this (refer FALCON-37):

{noformat}
      X1                    X2
==============        ==============
 A        B            A        B
 |        |            |        |
 |        |            |        |
 A1       B1           A1       B1
 |        |            |        |
 |        |            |        |
     ||                    ||
     C1                    C1
     |                     |
     |                     |
     |                     |
                ||
                C1
         ===============
                X3
{noformat}

In the above flow, a pipeline can be defined as any pair of source feed & target feed combination. Using a few notations to represent this:

{noformat}
*      - indicates "All clusters"
#local - indicates the specific cluster on which the source originated
{noformat}

Possible pipeline abstractions (as long as there is a path in the graph between source & targets):

{noformat}
1. (A@*,B@*) - (C@X3)
2. (A@X1) - (A1@#local)
3. (A@*) - (C@#local)
{noformat}

Comments welcome.

was (Author: sriksun):

Yes, makes sense. Does it make sense to define a pipeline as the flow between two feed instances?

Consider for example: two source feeds A & B are defined in clusters X1 & X2 each, and are then transformed (process P1) to A1 & B1 respectively, again in both clusters X1 & X2. Now consider another process (P2) which consumes A1 & B1 in each of the clusters and produces feed C1 in each of the clusters. And let us say C1 is replicated (multiple source, single target) from X1 & X2 to X3. In this case the dependency graph would look something like this (refer FALCON-37):

{noformat}
      X1                    X2
==============        ==============
 A        B            A        B
 |        |            |        |
 |        |            |        |
 A1       B1           A1       B1
 |        |            |        |
 |        |            |        |
     ||                    ||
     C1                    C1
     |                     |
     |                     |
     |                     |
                ||
                C1
         ===============
                X3
{noformat}

In the above flow, a pipeline can be defined as any pair of source feed & target feed combination. Using a few notations to represent this:

*      - indicates "All clusters"
#local - indicates the specific cluster on which the source originated

Possible pipeline abstractions (as long as there is a path in the graph between source & targets):

1. (A@*,B@*) - C@X3
2. (A@X1) - (A1@#local)
3. (A@*) - (C@#local)

Comments welcome.

> Pipeline entity for Falcon
> --------------------------
>
>                 Key: FALCON-48
>                 URL: https://issues.apache.org/jira/browse/FALCON-48
>             Project: Falcon
>          Issue Type: Wish
>          Components: general
>            Reporter: Sanjeev T
>            Priority: Minor
>              Labels: operability
>
> Falcon should also have a pipeline entity.
> * A pipeline entity can comprise the complete DAG for a given set of processes and feeds, within a cluster or across clusters.
> * How this helps:
>   * setting up a pipeline should take care of the relevant feeds and processes
>     to be submitted
>   * in case a cluster has an issue, a particular pipeline can be processed
>     on another cluster
>   * to build a monitoring system for a pipeline system
>   * run a particular pipeline for a given time window
>   * cases like backlog and catch can be handled easily
>   * for Pipeline(A) to complete, we can suspend Pipeline(B),
>     if they have a dependency
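
One way to make the "path in the graph between source & targets" condition from the comment concrete is a small reachability check. The following is only a sketch under stated assumptions, not Falcon code: the names EDGES, add_edge, reachable, expand_sources and is_valid_pipeline are hypothetical, the graph is hand-built from the diagram in the comment, C in the pipeline abstractions is assumed to mean feed C1, and "*" on a source is assumed to mean every cluster on which that feed appears in the graph. Only the notations used in the three example abstractions are handled.

```python
from collections import deque

# Dependency graph from the comment: nodes are (feed, cluster) pairs and
# edges point from an input feed to the feed derived from it.
EDGES = {}


def add_edge(src, dst):
    EDGES.setdefault(src, set()).add(dst)
    EDGES.setdefault(dst, set())


for cluster in ("X1", "X2"):
    add_edge(("A", cluster), ("A1", cluster))   # process P1
    add_edge(("B", cluster), ("B1", cluster))   # process P1
    add_edge(("A1", cluster), ("C1", cluster))  # process P2
    add_edge(("B1", cluster), ("C1", cluster))  # process P2
    add_edge(("C1", cluster), ("C1", "X3"))     # replication to X3


def reachable(start, goal):
    """Breadth-first search: is there a path from start to goal?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in EDGES.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def expand_sources(sources):
    """Expand '*' into every cluster on which the feed appears in the graph."""
    nodes = []
    for feed, cluster in sources:
        if cluster == "*":
            nodes.extend(sorted(n for n in EDGES if n[0] == feed))
        else:
            nodes.append((feed, cluster))
    return nodes


def is_valid_pipeline(sources, targets):
    """True if every resolved source has a path to every resolved target."""
    resolved = expand_sources(sources)
    if not resolved:
        return False  # assumption: a pipeline with no known source is invalid
    for src in resolved:
        if src not in EDGES:
            return False
        for feed, cluster in targets:
            # '#local' resolves to the cluster the source lives on.
            dst = (feed, src[1] if cluster == "#local" else cluster)
            if not reachable(src, dst):
                return False
    return True
```

With this graph, the three abstractions listed in the comment all pass the check, for example is_valid_pipeline([("A", "*")], [("C1", "#local")]), while a reversed pair such as (C1@X3) - (A@X1) does not.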