falcon-dev mailing list archives

From Seetharam Venkatesh <venkat...@innerzeal.com>
Subject Re: [DISCUSS] Orchestration in Falcon
Date Mon, 22 Dec 2014 19:43:40 GMT
What is the purpose of this decoupling? Why build this into Falcon?
Scheduling is so common that schedulers are a dime a dozen today, and
they are all extensible with custom triggers. Making it part of Falcon will
bring the same issues that Oozie has today.

I'm sorry, but I'm a HUGE -1 to this being built into the Falcon codebase.

However, I'm +1 to reusing the Quartz scheduler, which already exists: stand
it up outside or embed it like we do for ActiveMQ.

Phase 2 - I'd like to see us write a simple DAG execution layer in YARN as
an app master, without a DB, that keeps its state on HDFS, as an alternative
to Oozie.

Then we will have a nimble falcon which can kick ass.

On Sun, Dec 21, 2014 at 6:13 AM, Srikanth Sundarrajan <sriksun@hotmail.com> wrote:

> Hello Team,
> Since its inception Falcon has used Oozie for process orchestration as
> well as feed life cycle phase executions. While this has worked reasonably
> well and allowed higher level capabilities to be made available through
> Falcon, we are increasingly seeing scenarios where this is proving to be a
> limiting factor. In its current form, Falcon relies on Oozie for both
> scheduling and workflow execution, due to which the scheduling is limited
> to time based/cron based scheduling with additional gating conditions on
> data availability. This also imposes the restriction that datasets be
> periodic/cyclic in nature.
> From an orchestration stand point, it would help if we can support
> standard gating / scheduling primitives via Falcon:
> 1. Simple periodic scheduling with no gating conditions
> 2. Cron based scheduling (day of week, day of the month, specific hours
> and non-periodic) with no gating conditions
> 3. Availability of new data (assuming monotonically increasing data
> versions, availability of new versions)
> 4. Changes to existing data (reinstatement - similar to late data handling)
> 5. External trigger/notifications
> 6. Availability of specific instances of data declared as a mandatory
> dependency
> 7. Availability of a minimum subset of instances of data declared as a
> mandatory dependency (for example, at least 10 of the 24 hourly instances
> of a day)
> 8. Valid combinations of the above.
> In this context, I would like to propose that we move away from Oozie for
> the orchestration requirements and have them implemented natively within
> Falcon. It will no doubt make the Falcon server bulkier and heavier in both
> code and deployment, but it seems that without it, the orchestration within
> Falcon will be limited by the capabilities available within Oozie.
> Please do note that this suggestion is restricted to the scheduling and
> not to the workflow execution.
> Would like to hear from fellow developers and users on what your thoughts
> are. Please do chime in with your views.
> Regards
> Srikanth Sundarrajan
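
To make items 6 to 8 of the quoted list concrete, here is a rough Python sketch of how such gating conditions might be expressed and combined. It is only an illustration: `Gate`, `all_of` and `at_least` are hypothetical names, not Falcon, Oozie or Quartz APIs, and it assumes data instances are identified by plain strings.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Set

# Hypothetical sketch: none of these names exist in Falcon, Oozie or Quartz.


@dataclass(frozen=True)
class Gate:
    """A gating condition evaluated against the set of available instances."""
    check: Callable[[Set[str]], bool]

    def __and__(self, other: "Gate") -> "Gate":
        # Item 8: both conditions must hold.
        return Gate(lambda avail: self.check(avail) and other.check(avail))

    def __or__(self, other: "Gate") -> "Gate":
        # Item 8: either condition is enough.
        return Gate(lambda avail: self.check(avail) or other.check(avail))


def all_of(required: Iterable[str]) -> Gate:
    """Item 6: every listed instance is a mandatory dependency."""
    needed = frozenset(required)
    return Gate(lambda avail: needed <= avail)


def at_least(minimum: int, candidates: Iterable[str]) -> Gate:
    """Item 7: at least `minimum` of the candidate instances must be present."""
    pool = frozenset(candidates)
    return Gate(lambda avail: len(pool & avail) >= minimum)


# Example: 24 hourly instances of one day; run once at least 10 have landed
# and the midnight instance is among them.
hourly = [f"2014-12-21T{hour:02d}" for hour in range(24)]
gate = at_least(10, hourly) & all_of([hourly[0]])
ready = gate.check(set(hourly[:12]))
```

Whichever scheduler ends up evaluating them, conditions of this shape are just predicates over the available instances, so they could in principle sit behind a custom trigger rather than in a new scheduling engine.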


“Perfection (in design) is achieved not when there is nothing more to add,
but rather when there is nothing more to take away.”
- Antoine de Saint-Exupéry
