falcon-dev mailing list archives

From "Ajay Yadava (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-965) Open up life cycle stage implementation within Falcon for extension
Date Sun, 27 Sep 2015 10:33:04 GMT

    [ https://issues.apache.org/jira/browse/FALCON-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14909680#comment-14909680

Ajay Yadava commented on FALCON-965:

> We don't want users to be building co-ords, bundles. It can be more like, the builder returns
> (or points) to the user-workflow to execute (and properties) for a given policy.

Umm, I think there is some disconnect and you got the wrong impression of the requirements;
let me clarify.
Lifecycle doesn't require users to build bundles.
Scheduling parameters like validity, frequency, etc. will be governed by lifecycle, hence policies
will need to guide the scheduler. This is unavoidable irrespective of how it is built, natively
inside Falcon or by extension.

Also, the native scheduler is not a "workflow engine"; it is just a friendly neighborhood "scheduler"!
A scheduler takes care of the scheduling parameters and the workflow engine does the actual execution.

For the scheduler we will have inbuilt support for the native scheduler and the Oozie scheduler (through

The workflow engine is pluggable, and we have native support for the Oozie workflow engine.
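
To make the split concrete, here is a minimal sketch of the separation described above: the scheduler owns validity and frequency, and the workflow engine only executes. All type and method names here (SchedulingParams, WorkflowEngine, SimpleScheduler) are hypothetical and are not Falcon's actual API.

```java
// Hypothetical sketch; these types are illustrative and are not Falcon's actual API.
final class SchedulerSketch {

    /** Scheduling parameters that a lifecycle policy would supply to the scheduler. */
    static final class SchedulingParams {
        final long validityStartMillis;
        final long validityEndMillis;
        final long frequencyMillis;

        SchedulingParams(long validityStartMillis, long validityEndMillis, long frequencyMillis) {
            if (frequencyMillis <= 0) {
                throw new IllegalArgumentException("frequency must be positive");
            }
            this.validityStartMillis = validityStartMillis;
            this.validityEndMillis = validityEndMillis;
            this.frequencyMillis = frequencyMillis;
        }
    }

    /** Executes a single instance; it knows nothing about when instances are due. */
    interface WorkflowEngine {
        void execute(String workflowPath, long nominalTimeMillis);
    }

    /** Decides when instances are due and delegates the execution to the engine. */
    static final class SimpleScheduler {
        private final WorkflowEngine engine;

        SimpleScheduler(WorkflowEngine engine) {
            this.engine = engine;
        }

        /** Triggers one instance per frequency step in [start, end); returns the count. */
        int runAll(String workflowPath, SchedulingParams params) {
            int count = 0;
            for (long t = params.validityStartMillis;
                 t < params.validityEndMillis;
                 t += params.frequencyMillis) {
                engine.execute(workflowPath, t);
                count++;
            }
            return count;
        }
    }
}
```

In this sketch a different engine can be plugged in without touching the scheduling loop, which is the kind of decoupling being argued for.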

Currently the Falcon code has a very tight coupling between the two, as the EntityBuilder builds
bundles, which in turn build the coordinators, which in turn build the workflow engine.
EntityBuilders deal only at the bundle level. I think this needs to change now, with the advent
of lifecycle and, in future, the native scheduler. I have already created a JIRA, FALCON-1448,
for decoupling this and other cleanup tasks. I am just waiting for native scheduler to be

I have also created a JIRA, FALCON-1478, to enable lifecycle through the native scheduler and assigned
it to myself; I will take care of it on priority after at least the base framework gets committed.

> Will the retention element be used as a fallback in case lifecycle fails validation?

No, there are no fallbacks. It would be very counterintuitive for a user who wanted
to specify retention through lifecycle if we silently fell back to the original retention. I
have documented this behavior in the docs as well.
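
A minimal sketch of the no-fallback rule, assuming retention is reduced to a single limit in hours. The class and method names are hypothetical, and the "must be positive" check merely stands in for whatever the real lifecycle validation does.

```java
// Hypothetical sketch; not Falcon's actual validation code.
final class RetentionSelectionSketch {

    /**
     * Returns the retention limit to build from.
     *
     * @param lifecycleLimitHours limit from the lifecycle section, or null if the
     *                            feed defines no retention through lifecycle
     * @param legacyLimitHours    limit from the old retention tag
     */
    static int effectiveRetentionHours(Integer lifecycleLimitHours, int legacyLimitHours) {
        if (lifecycleLimitHours == null) {
            // No lifecycle retention defined: the old retention tag applies.
            return legacyLimitHours;
        }
        if (lifecycleLimitHours <= 0) {
            // Lifecycle retention was requested but is invalid: fail loudly
            // rather than silently falling back to the old tag.
            throw new IllegalArgumentException(
                    "Invalid lifecycle retention limit: " + lifecycleLimitHours);
        }
        // Lifecycle retention is defined and valid: the old tag is ignored.
        return lifecycleLimitHours;
    }
}
```

The point of throwing in the middle branch is that the user sees the validation error instead of getting a retention period they did not ask for.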

> Since only retention is going in now, replication will be specified in the old way. In this
> case, the OozieWorkflowEngine will process the feed and generate appropriate bundles. If retention
> is specified too, how does it know to ignore it?

If retention is defined through lifecycle, the retention tag is ignored and we build the retention
coordinator through lifecycle. Perhaps you missed the change; please refer to FeedBundleBuilder

> Open up life cycle stage implementation within Falcon for extension
> -------------------------------------------------------------------
>                 Key: FALCON-965
>                 URL: https://issues.apache.org/jira/browse/FALCON-965
>             Project: Falcon
>          Issue Type: New Feature
>    Affects Versions: 0.7
>            Reporter: Srikanth Sundarrajan
>            Assignee: Ajay Yadava
>              Labels: recipes
>             Fix For: 0.8
>         Attachments: FALCON-965-v1.patch, FALCON-965-v2.patch, FALCON-965.patch, FalconLifecycle-Designdoc.pdf,
> As it stands Falcon supports replication, generation and eviction lifecycle stages and
> plans to support more. This however assumes a certain way of implementing a life cycle function,
> and changes to these implementations aren't easy, as they are not open for easy extension.
> This proposed feature is to open this up in Falcon.
> Here is a proposal on how things can possibly be:
> * The list of life cycles that Falcon supports would be well known and not extensible
> * Dependencies between life cycles are coded up in the Falcon server and are not necessarily
>   extensible. (In short, adding a new life cycle still requires changes in Falcon.)
> * Each lifecycle in Falcon advertises an implementation interface and a minimum configuration
>   interface (for example, Eviction should expose a way to retrieve the configured time limit for
>   which data will be available, for other life cycle stages to validate. There is no point in
>   having a process consume the last 24 instances of a feed when the retention will retain only
>   4 instances).
> * Similar to FALCON-634, a life cycle implementation can be dropped in as long as the
>   implementation and configuration interfaces are adhered to.
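
The cross-stage validation in the proposal (a process asking for 24 instances when retention keeps only 4) could look roughly like the sketch below. EvictionConfig and the helper methods are hypothetical names chosen for illustration, and the sketch assumes a fixed feed frequency so that the retention limit divides into a whole number of instances.

```java
// Hypothetical sketch of a minimum configuration interface; not Falcon's actual API.
final class LifecycleValidationSketch {

    /** What an eviction implementation would expose for other stages to validate against. */
    interface EvictionConfig {
        long retentionLimitMillis();
    }

    /** Number of feed instances that survive eviction, assuming a fixed feed frequency. */
    static long retainedInstances(EvictionConfig eviction, long feedFrequencyMillis) {
        if (feedFrequencyMillis <= 0) {
            throw new IllegalArgumentException("feed frequency must be positive");
        }
        return eviction.retentionLimitMillis() / feedFrequencyMillis;
    }

    /** True if a process needing the last N instances can be satisfied by the retained data. */
    static boolean canConsume(EvictionConfig eviction, long feedFrequencyMillis, int instancesNeeded) {
        return instancesNeeded <= retainedInstances(eviction, feedFrequencyMillis);
    }
}
```

For an hourly feed with a four-hour retention limit, `canConsume` rejects a process that needs the last 24 instances and accepts one that needs the last 4.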

This message was sent by Atlassian JIRA