From: Srikanth Sundarrajan
To: dev@falcon.incubator.apache.org
Subject: RE: [DISCUSS] Orchestration in Falcon
Date: Mon, 22 Dec 2014 09:35:16 +0530

@Ajay,

> Do we intend to write our own scheduler from scratch or extend/move to
> another one?

This, I guess, is something for all of us to discuss at a later time, if there is wide consensus and willingness to move away from the current implementation.

Regards
Srikanth Sundarrajan

> From: ajaynsit@gmail.com
> Date: Mon, 22 Dec 2014 08:44:24 +0530
> Subject: Re: [DISCUSS] Orchestration in Falcon
> To: dev@falcon.incubator.apache.org
>
> +1
>
> This gives us more flexibility and control. I have also seen the following
> other capabilities in scheduling systems. We can see if any of these make
> sense for us.
>
> 1. NOT scheduling on certain days listed in a registered calendar (e.g.
>    business holidays)
> 2. Repeated with delayed intervals
>
> Do we intend to write our own scheduler from scratch or extend/move to
> another one?
>
> On Mon, Dec 22, 2014 at 8:10 AM, Srikanth Sundarrajan
> wrote:
>
> > @Shaik, Items 1 & 6 are solved cleanly by Oozie scheduling capabilities
> > (and in fact the only ones solved). Am not too sure if 2 and 7 are solved
> > well.
> >
> > To cite a few examples for 2 & 7 which are hard to achieve:
> >
> > 1. First Monday of the month
> > 2. Closing day of the month
> > 3. Fixed day of every month (say 7th) {key thing to note here is the
> >    non-uniform spacing in the monthly cycle}
> > 4. At least 6 instances in a day (which is different from saying 1-6 are
> >    mandatory and rest are optional)
> >
> > @Venkatesh has been pushing to do at least items 1-3 above for eternity.
> >
> > There is an additional motivation: In delegating control to Oozie for
> > orchestration, Falcon finds it hard to manage and control pipeline
> > executions.
> > I will try and enumerate them shortly on the same thread for us
> > to ponder over.
> >
> > Regards
> > Srikanth Sundarrajan
> >
> > > From: psychidris@gmail.com
> > > Date: Mon, 22 Dec 2014 01:52:04 +0530
> > > Subject: Re: [DISCUSS] Orchestration in Falcon
> > > To: dev@falcon.incubator.apache.org
> > >
> > > Hi,
> > >
> > > I agree that Oozie has certain limitations with respect to orchestration;
> > > recently Oozie users have raised similar concerns regarding Point 2
> > > (which is taken care of by Falcon by extending Oozie EL, not for
> > > scheduling but at least for consuming the input set).
> > >
> > > Aren't Points 1, 2, 6 and 7 already solved by using the optional input
> > > mechanism in Falcon? I understand that users still need to specify a
> > > frequency for the process. A few use cases/examples would really help.
> > >
> > > Thanks,
> > > -Idris
> > >
> > > On Sun, Dec 21, 2014 at 7:43 PM, Srikanth Sundarrajan
> > > <sriksun@hotmail.com> wrote:
> > >
> > > > Hello Team,
> > > >
> > > > Since its inception, Falcon has used Oozie for process orchestration as
> > > > well as feed life cycle phase executions. While this has worked
> > > > reasonably and allowed us to make higher level capabilities available
> > > > through Falcon, we are increasingly seeing scenarios where this is
> > > > proving to be a limiting factor. In its current form, Falcon relies on
> > > > Oozie for both scheduling and workflow execution, due to which the
> > > > scheduling is limited to time based/cron based scheduling with
> > > > additional gating conditions on data availability. Also, this imposes
> > > > restrictions on datasets being periodic/cyclic in nature.
> > > >
> > > > From an orchestration standpoint, it would help if we can support
> > > > standard gating / scheduling primitives via Falcon:
> > > >
> > > > 1. Simple periodic scheduling with no gating conditions
> > > > 2. Cron based scheduling (day of week, day of the month, specific hours
> > > >    and non-periodic) with no gating conditions
> > > > 3. Availability of new data (assuming monotonically increasing data
> > > >    version, availability of new versions)
> > > > 4. Changes to existing data (reinstatement - similar to late data
> > > >    handling)
> > > > 5. External trigger/notifications
> > > > 6. Availability of specific instances of data declared as mandatory
> > > >    dependency
> > > > 7. Availability of a minimum subset of instances of data declared as
> > > >    mandatory dependency (for example, at least 10 hourly instances of a
> > > >    day with 24 instances)
> > > > 8. Valid combinations of the above.
> > > >
> > > > In this context, I would like to propose that we move away from Oozie
> > > > for the orchestration requirements and have them implemented natively
> > > > within Falcon. It will no doubt make the Falcon server bulkier and
> > > > heavier in both code and deployment, but it seems that without it, the
> > > > orchestration within Falcon will be limited by the capabilities
> > > > available within Oozie.
> > > >
> > > > Please do note that this suggestion is restricted to the scheduling and
> > > > not to the workflow execution.
> > > >
> > > > Would like to hear from fellow developers and users on what your
> > > > thoughts are. Please do chime in with your views.
> > > >
> > > > Regards
> > > > Srikanth Sundarrajan
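
Illustration (not part of the thread): the calendar-style schedules cited as hard to express ("first Monday of the month", "closing day of the month") and the minimum-subset gate from item 7 can be sketched in a few lines. This is only a rough sketch under stated assumptions; it is not Falcon or Oozie code, and the function names first_monday, closing_day and minimum_instances_available are hypothetical, chosen for this example only.

```python
import calendar
from datetime import date


def first_monday(year: int, month: int) -> date:
    """Return the first Monday of the given month (hypothetical helper)."""
    # monthrange returns (weekday of day 1, days in month); Monday == 0.
    first_weekday, _ = calendar.monthrange(year, month)
    offset = (calendar.MONDAY - first_weekday) % 7
    return date(year, month, 1 + offset)


def closing_day(year: int, month: int) -> date:
    """Return the last calendar day of the given month (hypothetical helper)."""
    _, days_in_month = calendar.monthrange(year, month)
    return date(year, month, days_in_month)


def minimum_instances_available(available_hours, minimum: int) -> bool:
    """Gate in the spirit of item 7: True once at least `minimum` distinct
    hourly instances (hours 0-23) of a day are available. Which hours are
    present does not matter, only how many."""
    distinct = {h for h in available_hours if 0 <= h < 24}
    return len(distinct) >= minimum
```

Note that the dates produced by first_monday and closing_day are not evenly spaced from month to month, which is the non-uniform spacing the thread points out; a fixed-frequency scheduler cannot express them directly.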