falcon-dev mailing list archives

From Srikanth Sundarrajan <srik...@hotmail.com>
Subject RE: [DISCUSS] Orchestration in Falcon
Date Tue, 23 Dec 2014 15:28:27 GMT
@jb, There is no doubt merit in mapping them to Oozie if possible, and if the extensions are simple
and straightforward enough.

Also had a quick chat offline with Shwetha, and she mentioned some work happening in
Oozie in this regard. On digging further, I found https://issues.apache.org/jira/browse/OOZIE-1976.
This is possibly what Shwetha was referring to. From the looks of it, this tries to address
item #7 in the original thread. Maybe there are more JIRAs where additional work, such as
aperiodic datasets, is being tracked. Perhaps @Shwetha can throw some light on what is being
considered and/or how these gating/orchestration use cases can be managed.
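
To make item #7 concrete, here is a minimal sketch of that kind of gating check: the gate opens once at least a minimum number of the expected hourly instances of a day have landed. The function names (expected_hourly_instances, gate_is_open) and the in-memory set of available instances are made up for illustration; they are not Falcon or Oozie APIs, and a real check would consult the actual data availability.

```python
from datetime import datetime, timedelta


def expected_hourly_instances(day):
    """Return the 24 hourly instance times for the given day (hypothetical helper)."""
    start = datetime(day.year, day.month, day.day)
    return [start + timedelta(hours=h) for h in range(24)]


def gate_is_open(available, expected, min_required):
    """Return True once at least min_required of the expected instances are available."""
    present = sum(1 for instance in expected if instance in available)
    return present >= min_required


if __name__ == "__main__":
    expected = expected_hourly_instances(datetime(2014, 12, 23))
    # Assumption for the example: only the first 10 hourly instances have landed.
    available = set(expected[:10])
    print(gate_is_open(available, expected, min_required=10))
```

With min_required set to 10 this matches the "at least 10 hourly instances of a day with 24 instances" example from the original proposal; setting it to 24 degenerates to the mandatory-dependency case in item #6.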

Regards
Srikanth Sundarrajan

> Date: Tue, 23 Dec 2014 11:06:24 +0100
> From: jb@nanthrax.net
> To: dev@falcon.incubator.apache.org
> Subject: Re: [DISCUSS] Orchestration in Falcon
> 
> Hi all,
> 
> I second Shwetha there. I think we can achieve such features in Oozie 
> (with some adaptations).
> 
> Regards
> JB
> 
> On 2014-12-23 10:53, Shwetha G S wrote:
> > If we can get rid of Oozie entirely, yes we can explore other
> > possibilities. But if we are still going to use Oozie for DAG execution, we
> > are going to add another bottleneck in the whole execution (currently,
> > Falcon is not in the workflow execution path) and I don't think it's worth
> > it.
> > 
> > The features that are outlined above are all available in basic forms in
> > Oozie and it should be easy to enhance them/make them extension points.
> > 
> > 
> > 
> > -Shwetha
> > 
> > On Tue, Dec 23, 2014 at 8:12 AM, Srikanth Sundarrajan <sriksun@hotmail.com> wrote:
> > 
> >> Here are a few more gaps that we ought to solve for while we are on the
> >> subject:
> >> 
> >> 1. Ability to attach to start & finish events of workflow execution.
> >> Currently we have a post-processing hook to listen to finish events, but we
> >> do run into scenarios where there are occasional failures with
> >> post-processing and there is potential phase lag in learning about the
> >> events.
> >> 2. Strict enforcement of concurrency control, possibly spanning process
> >> boundaries.
> >> 3. Ability to tune how backlogs have to be caught up (old instances to be
> >> given higher priority, newer instances to be given higher priority, or some
> >> sort of weights to allow both to make progress at varying rates). There
> >> have been asks from users for routing current vs older instances to
> >> different queues as an alternative.
> >> 4. Ability to have a notion of non-time-based feed instances and related
> >> coordination.
> >> 5. Currently keeping track of and managing SLAs is also a challenge, but
> >> with #1 addressed, this might be a lesser concern.
> >> 
> >> Regards
> >> Srikanth Sundarrajan
> >> 
> >> > Subject: Re: [DISCUSS] Orchestration in Falcon
> >> > From: sriksun@hotmail.com
> >> > Date: Tue, 23 Dec 2014 06:30:30 +0530
> >> > To: dev@falcon.incubator.apache.org
> >> >
> >> > @venkatesh, the question really is how do we enable these gating
> >> > preconditions. Seems hard enough to add them to Oozie, but I am not
> >> > intimately familiar with Oozie to comment on how hard or easy it is.
> >> > Like I responded to @ajay on the same thread, if we are to do away with
> >> > coordination through Oozie, we can follow up this discussion with
> >> > approaches and design. Though I had Quartz in mind, I wanted to leave
> >> > that out of the discussion to see if there is consensus for moving away
> >> > from Oozie coords and implementing them through other means.
> >> >
> >> > Sent from my iPhone
> >> >
> >> > > On 23-Dec-2014, at 1:16 am, "Seetharam Venkatesh" <venkatesh@innerzeal.com> wrote:
> >> > >
> >> > > What is the purpose of this decoupling? Why build this into Falcon?
> >> > > Scheduling is so common that schedulers are a dime a dozen today, and
> >> > > they are all extensible with custom triggers. Making it part of Falcon
> >> > > will suffer the same issues that Oozie has today.
> >> > >
> >> > > I'm sorry but I'm a HUGE -1 to this being built into the Falcon codebase.
> >> > >
> >> > > However, I'm +1 to reusing the Quartz scheduler that already exists -
> >> > > stand it up outside or embed it like we do for ActiveMQ.
> >> > >
> >> > > Phase 2 - I'd like to see us write a simple DAG execution layer in YARN
> >> > > as an app master, without a DB, that keeps state on HDFS, as an
> >> > > alternative to Oozie.
> >> > >
> >> > > Then we will have a nimble Falcon which can kick ass.
> >> > >
> >> > >
> >> > > On Sun, Dec 21, 2014 at 6:13 AM, Srikanth Sundarrajan <sriksun@hotmail.com> wrote:
> >> > >
> >> > >> Hello Team,
> >> > >>
> >> > >> Since its inception Falcon has used Oozie for process orchestration as
> >> > >> well as feed life cycle phase executions. While this has worked
> >> > >> reasonably and allowed us to make higher level capabilities available
> >> > >> through Falcon, we are increasingly seeing scenarios where this is
> >> > >> proving to be a limiting factor. In its current form, Falcon relies on
> >> > >> Oozie for both scheduling and workflow execution, due to which
> >> > >> scheduling is limited to time based/cron based scheduling with
> >> > >> additional gating conditions on data availability. This also imposes
> >> > >> the restriction that datasets be periodic/cyclic in nature.
> >> > >>
> >> > >> From an orchestration standpoint, it would help if we can support
> >> > >> standard gating / scheduling primitives via Falcon:
> >> > >>
> >> > >> 1. Simple periodic scheduling with no gating conditions
> >> > >> 2. Cron based scheduling (day of week, day of the month, specific hours
> >> > >> and non-periodic) with no gating conditions
> >> > >> 3. Availability of new data (assuming monotonically increasing data
> >> > >> version, availability of new versions)
> >> > >> 4. Changes to existing data (reinstatement - similar to late data
> >> > >> handling)
> >> > >> 5. External trigger/notifications
> >> > >> 6. Availability of specific instances of data declared as mandatory
> >> > >> dependency
> >> > >> 7. Availability of a minimum subset of instances of data declared as
> >> > >> mandatory dependency (at least 10 hourly instances of a day with 24
> >> > >> instances, for example)
> >> > >> 8. Valid combinations of the above.
> >> > >>
> >> > >> In this context, I would like to propose that we move away from Oozie
> >> > >> for the orchestration requirements and have them implemented natively
> >> > >> within Falcon. It will no doubt make the Falcon server bulkier and
> >> > >> heavier in both code and deployment, but it seems that without it, the
> >> > >> orchestration within Falcon will be limited by the capabilities
> >> > >> available within Oozie.
> >> > >>
> >> > >> Please do note that this suggestion is restricted to the scheduling and
> >> > >> not to the workflow execution.
> >> > >>
> >> > >> Would like to hear from fellow developers and users on what your
> >> > >> thoughts are. Please do chime in with your views.
> >> > >>
> >> > >> Regards
> >> > >> Srikanth Sundarrajan
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Regards,
> >> > > Venkatesh
> >> > >
> >> > > “Perfection (in design) is achieved not when there is nothing more to
> >> > > add, but rather when there is nothing more to take away.”
> >> > > - Antoine de Saint-Exupéry