falcon-dev mailing list archives

From "Srikanth Sundarrajan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-369) Refactor workflow builder
Date Thu, 05 Jun 2014 05:44:02 GMT

    [ https://issues.apache.org/jira/browse/FALCON-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018495#comment-14018495 ]

Srikanth Sundarrajan commented on FALCON-369:

Discussed this with [~svenkat] at length today in person. Here is a summary of the discussion.
We can debate the approach further and converge on the right way forward. The objectives of
this refactoring have been laid out clearly in the issue description and the subsequent
conversation, but I am summarizing them here to avoid going through the whole thread again.

1. Lifecycle-specific code is deeply embedded within the implementation for a specific workflow
engine. Supporting a new engine would mean repeating that logic, which might also introduce
errors.
2. Specifically for Oozie, the creation of bundle and coordinator objects is interspersed, which
makes them very difficult to work with.
3. Since support for multiple application engines (pig, hive, oozie) was added, a level of
indirection is needed to allow for these variations.

Minutes from the discussion:
1. Currently, the Entity & Instance Managers talk directly to the workflow engine. It would be
helpful to introduce a layer that understands the entity's lifecycle instead of passing that
burden directly to the workflow engine.
2. We can introduce the concept of a lifecycle object, built out of the schedulable entity
manager, which a builder can consume.
3. The builder is specific to the workflow engine and accepts a collection of lifecycle objects
to work on. The Oozie builder can be further composed of a bundle builder and a coordinator
builder. These builders can in turn be aware of the app types (pig, hive, oozie) they are
working with.
4. The output of the builder is consumed by the workflow engine, which is specific to the scheduler.
5. Dry run can be a behavior of the workflow engine.
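
The layering in the minutes can be sketched in Java, Falcon's implementation language. This is a minimal, hypothetical sketch, not the agreed design: LifecycleObject, WorkflowBuilder, OozieBuilder, CoordBuilder, BundleBuilder, WorkflowEngine, OozieEngine and LifecycleManager are illustrative names rather than existing Falcon classes, and the builders return placeholder strings where real ones would emit Oozie XML.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lifecycles and app types, mirroring the ones named in this issue.
enum Lifecycle { PROCESS, RETENTION, REPLICATION }

enum AppType { PIG, HIVE, OOZIE }

// Hypothetical lifecycle object: one schedulable unit derived from an entity.
record LifecycleObject(String entityName, Lifecycle lifecycle, AppType appType) {}

// Engine-specific builder: consumes lifecycle objects, produces an artifact of type T.
interface WorkflowBuilder<T> {
    T build(List<LifecycleObject> lifecycles);
}

// Hypothetical Oozie sub-builders; real ones would emit XML, these return labels.
class CoordBuilder {
    String build(LifecycleObject lc) {
        return "coord:" + lc.entityName() + ":" + lc.lifecycle() + ":" + lc.appType();
    }
}

class BundleBuilder {
    String build(List<String> coords) {
        return "bundle" + coords;
    }
}

// The Oozie builder is composed of a coordinator builder and a bundle builder.
class OozieBuilder implements WorkflowBuilder<String> {
    private final CoordBuilder coordBuilder = new CoordBuilder();
    private final BundleBuilder bundleBuilder = new BundleBuilder();

    @Override
    public String build(List<LifecycleObject> lifecycles) {
        List<String> coords = new ArrayList<>();
        for (LifecycleObject lc : lifecycles) {
            coords.add(coordBuilder.build(lc));
        }
        return bundleBuilder.build(coords);
    }
}

// Scheduler-specific engine; dry run is one of its behaviors.
interface WorkflowEngine<T> {
    String schedule(T artifact);

    String dryRun(T artifact);
}

class OozieEngine implements WorkflowEngine<String> {
    @Override
    public String schedule(String bundle) {
        return "submitted " + bundle; // placeholder for a real Oozie submission
    }

    @Override
    public String dryRun(String bundle) {
        return "validated " + bundle; // placeholder for validating without submitting
    }
}

// Hypothetical lifecycle layer between the entity/instance managers and the engine.
class LifecycleManager<T> {
    private final WorkflowBuilder<T> builder;
    private final WorkflowEngine<T> engine;

    LifecycleManager(WorkflowBuilder<T> builder, WorkflowEngine<T> engine) {
        this.builder = builder;
        this.engine = engine;
    }

    String schedule(List<LifecycleObject> lifecycles, boolean dryRun) {
        T artifact = builder.build(lifecycles);
        return dryRun ? engine.dryRun(artifact) : engine.schedule(artifact);
    }
}

public class LifecycleSketch {
    public static void main(String[] args) {
        LifecycleManager<String> manager =
                new LifecycleManager<>(new OozieBuilder(), new OozieEngine());
        List<LifecycleObject> lifecycles = List.of(
                new LifecycleObject("clicks", Lifecycle.RETENTION, AppType.OOZIE),
                new LifecycleObject("clicks", Lifecycle.REPLICATION, AppType.OOZIE));
        System.out.println(manager.schedule(lifecycles, true));
    }
}
```

Keeping the builder and the engine behind separate interfaces means a new scheduler would only need its own builder/engine pair, while the lifecycle layer stays unchanged.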

Let us debate this and agree on the approach. Once we have consensus, we can break this into
smaller JIRAs that go after specific aspects of the refactoring.

> Refactor workflow builder
> -------------------------
>                 Key: FALCON-369
>                 URL: https://issues.apache.org/jira/browse/FALCON-369
>             Project: Falcon
>          Issue Type: Improvement
>            Reporter: Shwetha G S
>            Assignee: Shwetha G S
>         Attachments: FALCON-369.patch, FalconWorkflowBuilder.png
> Currently, the feed/process workflow builder is a single class that handles all the different
> cases of lifecycles, storage types and workflow engines, and builds all the Oozie entities
> (workflow, coord and bundle). This is not readable and is difficult to maintain. It needs some
> refactoring.
> Approach:
> Maintain different builders for
> 1. oozie entities - workflow, coord and bundle. 
> 2. entity types - feed and process
> 3. lifecycle - process, retention and replication
> 4. workflow engines - oozie, pig and hive
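
The decomposition proposed in the description can likewise be sketched as a registry of builders keyed by lifecycle. Everything here is hypothetical: the class and enum names are illustrative, the mapping of feeds to retention and replication and of processes to the process lifecycle is an assumption, and so is the choice to let only the process lifecycle vary with the workflow engine.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical enums for the dimensions listed in the issue description.
enum EntityType { FEED, PROCESS }

enum Lifecycle { PROCESS, RETENTION, REPLICATION }

enum Engine { OOZIE, PIG, HIVE }

// Hypothetical per-lifecycle builder: produces one coordinator for an entity.
interface LifecycleBuilder {
    String buildCoord(String entityName, Engine engine);
}

public class BuilderRegistrySketch {
    private final Map<Lifecycle, LifecycleBuilder> builders = new EnumMap<>(Lifecycle.class);

    BuilderRegistrySketch() {
        // Assumption: retention and replication do not depend on the user workflow engine.
        builders.put(Lifecycle.RETENTION, (name, engine) -> "retention-coord:" + name);
        builders.put(Lifecycle.REPLICATION, (name, engine) -> "replication-coord:" + name);
        // Assumption: only the process lifecycle varies with the engine (oozie, pig, hive).
        builders.put(Lifecycle.PROCESS,
                (name, engine) -> "process-coord:" + name + ":" + engine.name().toLowerCase(Locale.ROOT));
    }

    // Assumed mapping of entity types to the lifecycles they carry.
    static List<Lifecycle> lifecyclesOf(EntityType type) {
        return switch (type) {
            case FEED -> List.of(Lifecycle.RETENTION, Lifecycle.REPLICATION);
            case PROCESS -> List.of(Lifecycle.PROCESS);
        };
    }

    // Builds every coordinator for one entity, then wraps them in a single bundle.
    String buildBundle(EntityType type, String entityName, Engine engine) {
        List<String> coords = new ArrayList<>();
        for (Lifecycle lifecycle : lifecyclesOf(type)) {
            coords.add(builders.get(lifecycle).buildCoord(entityName, engine));
        }
        return "bundle:" + entityName + coords;
    }

    public static void main(String[] args) {
        BuilderRegistrySketch registry = new BuilderRegistrySketch();
        System.out.println(registry.buildBundle(EntityType.FEED, "clicks", Engine.OOZIE));
        System.out.println(registry.buildBundle(EntityType.PROCESS, "agg", Engine.PIG));
    }
}
```

Under this layout, supporting another lifecycle would mean registering one more builder rather than adding another branch to a single class that handles every case.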

This message was sent by Atlassian JIRA
