falcon-dev mailing list archives

From Srikanth Sundarrajan <srik...@hotmail.com>
Subject RE: [DISCUSS] Orchestration in Falcon
Date Tue, 07 Apr 2015 16:49:09 GMT
I am fully behind this for the following reasons:

1. Managing the scheduling capability in Oozie (regardless of whether a given feature is
feasible there) means that every change has to be accepted upstream in Oozie and released
before it can be used from within Falcon.

2. Supporting new gating & throttling primitives that are aware of dependencies between
entities seems to require too many changes in Oozie to be done incrementally. It might
require at least some major design changes in Oozie.

3. As many in the falcon-dev community would agree, it would be ideal for Falcon to be less
dependent on Oozie in the long run.

4. It would be easier and simpler to handle streaming datasets if Falcon were to support
them, as a direction, in the near or distant future.

5. Currently there is a lot of bloat in the scheduler integration because of the way Oozie
functions, and this complexity would shrink if we had a simpler scheduler to integrate with.

6. The notion of a parent workflow (with its associated pre-processing & post-processing),
which adds overhead by occupying a slot in the cluster, is also begging for attention and improvement.

Srikanth Sundarrajan

> Date: Tue, 7 Apr 2015 11:27:52 +0530
> Subject: Re: [DISCUSS] Orchestration in Falcon
> From: pallavi.rao@inmobi.com
> To: dev@falcon.apache.org
> Hi,
> I was recently looking at some of the use cases at InMobi and how to
> enhance Falcon to accommodate them, and I realized that, due to our
> dependency on the Oozie coordinator, some of these cannot be achieved easily
> or take a much longer cycle because we have to wait for Oozie to add the
> functionality.
> I was pointed to this thread that dates slightly before my time in Falcon (
> https://www.mail-archive.com/dev@falcon.incubator.apache.org/msg09268.html).
> I wanted to reopen the thread for discussion, with my 2 cents:
> 1. Some of the scheduling primitives already mentioned in the
> thread, especially support for aperiodic datasets or external triggering
> mechanisms, are not available in Oozie. They might not even be a natural fit
> for Oozie to add.
> 2. Adding new primitives to Falcon is harder and takes longer because we
> depend entirely on Oozie for them. Falcon's extensibility is stunted.
> 3. Oozie has very limited support for throttling resource utilization.
> We can only control the number of parallel instances of a coordinator job.
> 4. Oozie currently has no notion of interdependency between
> instances/workflows, whereas in Falcon it would be very useful to
> gate/throttle based on that interdependency. For example, re-running a
> pipeline (or a subset of it), or throttling the resource utilization of a
> pipeline when it is in "backlog catchup" mode.
> 5. We end up with bugs like FALCON-1127
> <https://issues.apache.org/jira/browse/FALCON-1127>, because Falcon
> constantly needs to play catchup with Oozie changes.
> On the thread, most people did seem to be in favor of a native scheduler in
> Falcon. If you all think this is useful, I'll volunteer to start work on
> this and we can build out a scheduler/orchestrator in Falcon that can open
> up a whole lot of possibilities for Falcon users.
> Thanks,
> Pallavi