falcon-dev mailing list archives

From Srikanth Sundarrajan <srik...@hotmail.com>
Subject RE: [DISCUSS] Orchestration in Falcon
Date Fri, 16 Jan 2015 02:13:30 GMT
This is a very important decision for the project, and if we need to discuss this more, we should,
and not rush through it. So I will hold off on any further action in terms of pressing forward with
the design.

Here is a consolidation of the views expressed so far on this thread. Folks who have responded,
please chime in if I have misrepresented anyone.

Sanjeev: Agreed with the proposal
Ajay: Agreed with the proposal and wanted to know how it will be implemented
Siva Tumma: -1, as repeating some functionality in Oozie seemed wasteful
Venkatesh: -1 initially to this being built in Falcon, but ok with leveraging capabilities
through an alternate scheduler such as Quartz/YARN. Subsequently expressed how chugging along
with Oozie is not ideal in the long run
Shwetha: Ok with replacing Oozie altogether, including workflow execution. She felt that some
of these may exist in Oozie and is yet to revert on whether they really do.
JB: Initially had reservations about repeating functionality in Falcon, later +1
Shaik: Agreed to the proposal, additionally calling out more capabilities than were originally
called out in the initial thread.
Srikanth: I would like to provide a lot more capabilities to users than what is supported and
would really like for this to happen, so +1

Regards
Srikanth Sundarrajan

> Date: Thu, 15 Jan 2015 11:27:17 -0800
> Subject: Re: [DISCUSS] Orchestration in Falcon
> From: venkatesh@innerzeal.com
> To: dev@falcon.apache.org
> 
> On Thu, Jan 15, 2015 at 1:25 AM, Srikanth Sundarrajan <sriksun@hotmail.com>
> wrote:
> 
> > -dev@f.i.a.o
> >
> > It looks like we have broad consensus on this,
> 
> Really? That's not how I read this. I'm still not sure it's worth taking on
> this complexity into Falcon. Did we even explore other options? I'm not
> sure.
> 
> 
> > should we open up a discuss thread on how we go about this ?
> 
> Maybe.
> 
> 
> > Or should we create a confluence page and collaborate through that ?
> >
> Too early for this.
> 
> 
> >
> > Regards
> > Srikanth Sundarrajan
> >
> > > From: psychidris@gmail.com
> > > Date: Thu, 1 Jan 2015 22:40:48 +0530
> > > Subject: Re: [DISCUSS] Orchestration in Falcon
> > > To: dev@falcon.incubator.apache.org
> > >
> > > +1.
> > >
> > > Few more relevant asks:
> > > 1. Support for "Last Only" option for process scheduling (In addition to
> > >  LIFO/FIFO), currently oozie has some issues.
> > > 2. Support for Singleton process (lock based), the behaviour of all
> > > instances of process is same.
> > >
> > > Thanks,
> > > -Idris
> > >
> > >
> > > On Thu, Jan 1, 2015 at 7:51 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
> > > wrote:
> > >
> > > > +1
> > > >
> > > > Regards
> > > > JB
> > > >
> > > >
> > > > On 12/31/2014 03:53 PM, Srikanth Sundarrajan wrote:
> > > >
> > > > >> Can we pick up this thread in the new year when folks are back from
> > > > >> break? I am in total agreement with Venkatesh here. We ought to have a
> > > > >> long-term sustainable approach. Also, I feel that getting the capabilities
> > > > >> that we would like to enable in Falcon done through Oozie in the near term
> > > > >> seems to be a tall ask anyway.
> > > >>
> > > >> Regards
> > > >> Srikanth Sundarrajan
> > > >>
> > > > >>> Date: Tue, 23 Dec 2014 16:44:06 -0800
> > > > >>> Subject: Re: [DISCUSS] Orchestration in Falcon
> > > > >>> From: venkatesh@innerzeal.com
> > > > >>> To: dev@falcon.incubator.apache.org
> > > > >>>
> > > > >>> Chugging along with Oozie is bad for Falcon in the long run, for users
> > > > >>> and developers. It's horribly complex to work through the many rough
> > > > >>> edges architecturally in Oozie. Look at all the patches for security that
> > > > >>> I had to fix around Oozie. It's unnecessarily very complex, non-uniform, and
> > > > >>> is NOT meant to be used by another tool like Falcon but was built around
> > > > >>> the end user.
> > > > >>>
> > > > >>> This is a good discussion to have - maybe explore Oozie for the short term
> > > > >>> but look at alternative solutions for the long term.
> > > >>>
> > > >>> On Tue, Dec 23, 2014 at 7:28 AM, Srikanth Sundarrajan <
> > > >>> sriksun@hotmail.com>
> > > >>> wrote:
> > > >>>
> > > > >>>> @jb, There is no doubt merit in mapping them to Oozie if possible and if
> > > > >>>> the extensions are simple and straightforward enough.
> > > > >>>>
> > > > >>>> Also had a quick chat offline with Shwetha and she mentioned some
> > > > >>>> work happening in Oozie in this regard. On further digging, found
> > > > >>>> https://issues.apache.org/jira/browse/OOZIE-1976. This is possibly what
> > > > >>>> Shwetha was referring to. From the looks of it, this tries to address
> > > > >>>> item #7 in the original thread. Maybe there are more JIRAs where additional
> > > > >>>> work such as aperiodic datasets is being worked on. Perhaps @Shwetha
> > > > >>>> can throw some light on what is being considered and/or how these
> > > > >>>> gating/orchestration use cases can be managed.
> > > >>>>
> > > >>>> Regards
> > > >>>> Srikanth Sundarrajan
> > > >>>>
> > > > >>>>> Date: Tue, 23 Dec 2014 11:06:24 +0100
> > > >>>>> From: jb@nanthrax.net
> > > >>>>> To: dev@falcon.incubator.apache.org
> > > >>>>> Subject: Re: [DISCUSS] Orchestration in Falcon
> > > >>>>>
> > > >>>>> Hi all,
> > > >>>>>
> > > > >>>>> I second Shwetha there. I think we can achieve such features in Oozie
> > > > >>>>> (with some adaptations).
> > > >>>>>
> > > >>>>> Regards
> > > >>>>> JB
> > > >>>>>
> > > > >>>>> On 2014-12-23 10:53, Shwetha G S wrote:
> > > >>>>>
> > > > >>>>>> If we can get rid of Oozie entirely, yes, we can explore other
> > > > >>>>>> possibilities. But if we are still going to use Oozie for DAG
> > > > >>>>>> execution, we are going to add another bottleneck in the whole
> > > > >>>>>> execution (currently, Falcon is not in the workflow execution path), and
> > > > >>>>>> I don't think it's worth it.
> > > > >>>>>>
> > > > >>>>>> The features that are outlined above are all available in basic forms
> > > > >>>>>> in Oozie, and it should be easy to enhance them or make them extension
> > > > >>>>>> points.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> -Shwetha
> > > >>>>>>
> > > >>>>>> On Tue, Dec 23, 2014 at 8:12 AM, Srikanth Sundarrajan
> > > >>>>>> <sriksun@hotmail.com>
> > > >>>>>> wrote:
> > > >>>>>>
> > > > >>>>>>> Here are a few more gaps that we ought to solve for while we are on the
> > > > >>>>>>> subject:
> > > > >>>>>>>
> > > > >>>>>>> 1. Ability to attach to start & finish events of workflow execution.
> > > > >>>>>>> Currently we have a post-processing hook to listen to finish events, but
> > > > >>>>>>> we do run into scenarios where there are occasional failures with
> > > > >>>>>>> post-processing and there is potential phase lag in learning about
> > > > >>>>>>> the events.
> > > > >>>>>>> 2. Strict enforcement of concurrency control, possibly spanning process
> > > > >>>>>>> boundaries.
> > > > >>>>>>> 3. Ability to tune how backlogs have to be caught up (old instances to
> > > > >>>>>>> be given higher priority, newer instances to be given higher priority, or
> > > > >>>>>>> some sort of weights to allow both to make progress at varying rates).
> > > > >>>>>>> As an alternative, users have asked for routing current vs. older
> > > > >>>>>>> instances to different queues.
> > > > >>>>>>> 4. Ability to have a notion of non-time-based feed instances and related
> > > > >>>>>>> coordination.
> > > > >>>>>>> 5. Currently, keeping track of and managing SLAs is also a challenge, but
> > > > >>>>>>> with #1 addressed, this might be a lesser concern.
> > > >>>>>>>
> > > >>>>>>> Regards
> > > >>>>>>> Srikanth Sundarrajan
> > > >>>>>>>
> > > > >>>>>>>> Subject: Re: [DISCUSS] Orchestration in Falcon
> > > > >>>>>>>> From: sriksun@hotmail.com
> > > > >>>>>>>> Date: Tue, 23 Dec 2014 06:30:30 +0530
> > > > >>>>>>>> To: dev@falcon.incubator.apache.org
> > > > >>>>>>>>
> > > > >>>>>>>> @venkatesh, the question really is how do we enable these gating
> > > > >>>>>>>> preconditions. Seems hard enough to add them to Oozie, but I am not
> > > > >>>>>>>> familiar enough with Oozie to comment on how hard or easy it is. Like I
> > > > >>>>>>>> responded to @ajay on the same thread, if we are to do away with
> > > > >>>>>>>> coordination through Oozie, we can follow up this discussion with
> > > > >>>>>>>> approaches and design. Though I had Quartz in mind, I wanted to leave
> > > > >>>>>>>> that out of the discussion to see if there is consensus for moving away
> > > > >>>>>>>> from Oozie coords and implementing them through other means.
> > > >>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> Sent from my iPhone
> > > >>>>>>>>
> > > > >>>>>>>> On 23-Dec-2014, at 1:16 am, "Seetharam Venkatesh" <venkatesh@innerzeal.com> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>> What is the purpose of this decoupling? Why build this into Falcon?
> > > > >>>>>>>>> Scheduling is so common that there are dime a dozen schedulers today,
> > > > >>>>>>>>> and they are all extensible with custom triggers. Making it part of
> > > > >>>>>>>>> Falcon will suffer the same issues that Oozie has today.
> > > > >>>>>>>>>
> > > > >>>>>>>>> I'm sorry but I'm a HUGE -1 to this being built into the Falcon codebase.
> > > > >>>>>>>>>
> > > > >>>>>>>>> However, I'm +1 to reusing the Quartz scheduler that already exists - stand it
> > > > >>>>>>>>> up outside or embed it like we do for ActiveMQ.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Phase 2 - I'd like to see us write a simple DAG execution layer in YARN as
> > > > >>>>>>>>> an app master without a DB that keeps state on HDFS, as an alternative to
> > > > >>>>>>>>> Oozie.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Then we will have a nimble Falcon which can kick ass.
> > > >>>>>>>>>
> > > > >>>>>>>>> On Sun, Dec 21, 2014 at 6:13 AM, Srikanth Sundarrajan <sriksun@hotmail.com>
> > > > >>>>>>>>> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>> Hello Team,
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Since its inception, Falcon has used Oozie for process orchestration as
> > > > >>>>>>>>>> well as feed life cycle phase executions. While this has worked reasonably
> > > > >>>>>>>>>> and allowed us to make higher-level capabilities available through Falcon,
> > > > >>>>>>>>>> we are increasingly seeing scenarios where this is proving to be a limiting
> > > > >>>>>>>>>> factor. In its current form, Falcon relies on Oozie for both scheduling and
> > > > >>>>>>>>>> workflow execution, due to which the scheduling is limited to time-based /
> > > > >>>>>>>>>> cron-based scheduling with additional gating conditions on data
> > > > >>>>>>>>>> availability. This also imposes the restriction that datasets be
> > > > >>>>>>>>>> periodic/cyclic in nature.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> From an orchestration standpoint, it would help if we can support
> > > > >>>>>>>>>> standard gating / scheduling primitives via Falcon:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> 1. Simple periodic scheduling with no gating conditions
> > > > >>>>>>>>>> 2. Cron based scheduling (day of week, day of the month, specific hours
> > > > >>>>>>>>>> and non-periodic) with no gating conditions
> > > > >>>>>>>>>> 3. Availability of new data (assuming monotonically increasing data
> > > > >>>>>>>>>> version, availability of new versions)
> > > > >>>>>>>>>> 4. Changes to existing data (reinstatement - similar to late data
> > > > >>>>>>>>>> handling)
> > > > >>>>>>>>>> 5. External trigger/notifications
> > > > >>>>>>>>>> 6. Availability of specific instances of data declared as mandatory
> > > > >>>>>>>>>> dependency
> > > > >>>>>>>>>> 7. Availability of a minimum subset of instances of data declared as
> > > > >>>>>>>>>> mandatory dependency (for example, at least 10 hourly instances of a day
> > > > >>>>>>>>>> with 24 instances)
> > > > >>>>>>>>>> 8. Valid combinations of the above.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> In this context, I would like to propose that we move away from Oozie for
> > > > >>>>>>>>>> the orchestration requirements and have them implemented natively within
> > > > >>>>>>>>>> Falcon. It will no doubt make the Falcon server bulkier and heavier in both
> > > > >>>>>>>>>> code and deployment, but it seems that without it, the orchestration within
> > > > >>>>>>>>>> Falcon will be limited by the capabilities available within Oozie.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Please do note that this suggestion is restricted to the scheduling and
> > > > >>>>>>>>>> not to the workflow execution.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Would like to hear from fellow developers and users on what your thoughts
> > > > >>>>>>>>>> are. Please do chime in with your views.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Regards
> > > >>>>>>>>>> Srikanth Sundarrajan
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> --
> > > >>>>>>>>> Regards,
> > > >>>>>>>>> Venkatesh
> > > >>>>>>>>>
> > > > >>>>>>>>> “Perfection (in design) is achieved not when there is nothing more to add,
> > > > >>>>>>>>> but rather when there is nothing more to take away.”
> > > > >>>>>>>>> - Antoine de Saint-Exupéry
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>
> > > >>>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Regards,
> > > >>> Venkatesh
> > > >>>
> > > > >>> “Perfection (in design) is achieved not when there is nothing more to add,
> > > > >>> but rather when there is nothing more to take away.”
> > > > >>> - Antoine de Saint-Exupéry
> > > >>>
> > > >>
> > > >>
> > > >>
> > > > --
> > > > Jean-Baptiste Onofré
> > > > jbonofre@apache.org
> > > > http://blog.nanthrax.net
> > > > Talend - http://www.talend.com
> > > >
> >
> >
> 
> 
> 
> -- 
> Regards,
> Venkatesh
> 
> “Perfection (in design) is achieved not when there is nothing more to add,
> but rather when there is nothing more to take away.”
> - Antoine de Saint-Exupéry
 		 	   		  