falcon-dev mailing list archives

From "Venkatesh Seetharam (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-369) Refactor workflow builder
Date Wed, 26 Mar 2014 16:13:17 GMT

    [ https://issues.apache.org/jira/browse/FALCON-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948047#comment-13948047 ]

Venkatesh Seetharam commented on FALCON-369:

I agree with everything you have said. 

bq. Else, EntityManager should build the workflow and then call schedule on the workflow engine.

I'm thinking that we decouple Entity and Workflow and introduce another layer of abstraction.
An Entity maps to a set of Lifecycles:

* feed -> retention, replication, import/export in future, etc.
* process -> user process

Each Lifecycle in turn uses the builder to generate its workflow, and all Lifecycles share the same workflow builder.

The advantage of this is that instead of the falcon abstractions being leaked into the workflow
builder, Lifecycle is a first-class abstraction outside the builder, which the workflow code uses.
Extensions become far easier, and writing new workflow builders also becomes straightforward.
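
A rough Java sketch of the layering described above, for illustration only. All type and method names here (LifecycleSketch, Lifecycle, WorkflowBuilder, lifecyclesFor, buildWorkflows) are hypothetical and do not exist in Falcon as written, and a workflow is reduced to a plain string to keep the example small. The point is only that the builder never sees entity types, while each Lifecycle drives the shared builder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout; this is not Falcon's actual API.
class LifecycleSketch {

    enum EntityType { FEED, PROCESS }

    /** Engine-specific builder; it knows nothing about Falcon entities. */
    interface WorkflowBuilder {
        String build(String workflowName);
    }

    /** One unit of entity behaviour, e.g. retention or replication. */
    interface Lifecycle {
        String name();

        // Each lifecycle produces its workflow through the shared builder.
        default String toWorkflow(String entityName, WorkflowBuilder builder) {
            return builder.build(entityName + "-" + name());
        }
    }

    static final class NamedLifecycle implements Lifecycle {
        private final String name;

        NamedLifecycle(String name) {
            this.name = name;
        }

        @Override
        public String name() {
            return name;
        }
    }

    /** Entity type -> set of lifecycles (feed: retention, replication; process: user process). */
    static List<Lifecycle> lifecyclesFor(EntityType type) {
        switch (type) {
            case FEED:
                return Arrays.<Lifecycle>asList(
                        new NamedLifecycle("retention"),
                        new NamedLifecycle("replication"));
            case PROCESS:
                return Arrays.<Lifecycle>asList(new NamedLifecycle("user-process"));
            default:
                throw new IllegalArgumentException("Unknown entity type: " + type);
        }
    }

    /** Builds one workflow per lifecycle, all with the same builder. */
    static List<String> buildWorkflows(EntityType type, String entityName, WorkflowBuilder builder) {
        List<String> workflows = new ArrayList<>();
        for (Lifecycle lifecycle : lifecyclesFor(type)) {
            workflows.add(lifecycle.toWorkflow(entityName, builder));
        }
        return workflows;
    }
}
```

Swapping the workflow engine would then mean passing a different WorkflowBuilder to buildWorkflows, with the entity-to-lifecycle mapping left unchanged.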

How does it affect the existing code? Not much IMO. Instead of keeping the logic that determines
the coordinators for a feed in
org.apache.falcon.workflow.OozieFeedWorkflowBuilder#getCoordinators, org.apache.falcon.workflow.OozieFeedWorkflowBuilder#getRetentionCoordinator
and org.apache.falcon.workflow.OozieFeedWorkflowBuilder#getReplicationCoordinators, I'm suggesting
this be refactored into Lifecycle.

In summary:
* Falcon entities map to Lifecycles - a set of interfaces
* Each Lifecycle will have 3 aspects: 
    - Lifecycle to workflow mapper - uses the workflow engine builder
    - Lifecycle management - both entity and instances
Lifecycle becomes the heart of the core: adding a new lifecycle means adding a new set of
interfaces, with no changes to the existing builders.
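
To make the last point concrete, here is a hedged Java sketch of a lifecycle registry. Everything in it is an assumption made for illustration (LifecycleRegistrySketch, WorkflowMapper, Manager, register, workflowsFor are not Falcon names), and it models only the two aspects listed above, with workflows and management results reduced to strings. It shows a new lifecycle being added by registration alone, without touching the code that generates workflows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the interface and method names are assumptions, not Falcon APIs.
class LifecycleRegistrySketch {

    /** Aspect: maps a lifecycle to a workflow (would delegate to the engine's builder). */
    interface WorkflowMapper {
        String toWorkflow(String entityName);
    }

    /** Aspect: management operations, for the entity and for its instances. */
    interface Manager {
        String suspend(String entityName);
    }

    /** A lifecycle bundles its aspects under one name. */
    static final class Lifecycle {
        final String name;
        final WorkflowMapper mapper;
        final Manager manager;

        Lifecycle(String name, WorkflowMapper mapper, Manager manager) {
            this.name = name;
            this.mapper = mapper;
            this.manager = manager;
        }

        /** Convenience factory with placeholder string-based aspects. */
        static Lifecycle of(String name) {
            return new Lifecycle(
                    name,
                    entity -> "workflow(" + entity + "-" + name + ")",
                    entity -> "suspended " + entity + "-" + name);
        }
    }

    private final Map<String, List<Lifecycle>> byEntityType = new HashMap<>();

    /** Adding a lifecycle is a registration; no existing code changes. */
    void register(String entityType, Lifecycle lifecycle) {
        byEntityType.computeIfAbsent(entityType, k -> new ArrayList<>()).add(lifecycle);
    }

    /** Generates one workflow per registered lifecycle of the entity type. */
    List<String> workflowsFor(String entityType, String entityName) {
        List<String> workflows = new ArrayList<>();
        List<Lifecycle> lifecycles =
                byEntityType.getOrDefault(entityType, Collections.<Lifecycle>emptyList());
        for (Lifecycle lifecycle : lifecycles) {
            workflows.add(lifecycle.mapper.toWorkflow(entityName));
        }
        return workflows;
    }
}
```

Under these assumptions, a future feed lifecycle such as import/export would be one more register call, and workflowsFor would pick it up unchanged.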

Makes sense?

> Refactor workflow builder
> -------------------------
>                 Key: FALCON-369
>                 URL: https://issues.apache.org/jira/browse/FALCON-369
>             Project: Falcon
>          Issue Type: Improvement
>            Reporter: Shwetha G S
>            Assignee: Shwetha G S
>         Attachments: FALCON-369.patch
> Currently, the feed/process workflow builder is a single class that handles all the different
> cases of lifecycles, storage types and workflow engines, and builds all oozie entities (workflow,
> coord and bundle). This is hard to read and difficult to maintain, and needs some refactoring.
> Approach:
> Maintain different builders for
> 1. oozie entities - workflow, coord and bundle. 
> 2. entity types - feed and process
> 3. lifecycle - process, retention and replication
> 4. workflow engines - oozie, pig and hive

This message was sent by Atlassian JIRA
