falcon-dev mailing list archives

From "Sowmya Ramesh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-634) Add recipes in Falcon
Date Fri, 16 Jan 2015 02:31:35 GMT

    [ https://issues.apache.org/jira/browse/FALCON-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279720#comment-14279720 ]

Sowmya Ramesh commented on FALCON-634:

CLI can query the server for recipe artifacts, use them in the client process to build the

I am not convinced that recipe artifacts should be packaged with the server and deployed either
locally on the server or on HDFS.

A recipe is a client-side concept. If we deploy the artifacts on HDFS or on the server's local FS,
every recipe submission will require all the recipe artifacts to be copied from the remote
machine [HDFS, or the server if it is running on a different machine than the client] to the
client machine to build the recipe.

If recipe artifacts are packaged with the client, the list and describe recipe functionality can
still be implemented. client.properties can be updated with the path where the artifacts will be
installed, and a one-time deployment can be done as part of client installation.
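
As a rough illustration of how a client-side "list" could work, the sketch below reads the
install path from client.properties and lists the recipe directories under it. The property key
`falcon.recipe.path` and the one-subdirectory-per-recipe layout are assumptions made for this
example, not something Falcon defines.

```python
import os


def read_properties(path):
    """Parse a simple Java-style .properties file into a dict."""
    props = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith(("#", "!")):
                continue
            key, sep, value = line.partition("=")
            if sep:
                props[key.strip()] = value.strip()
    return props


def list_recipes(client_properties_path, key="falcon.recipe.path"):
    """Return sorted recipe names found under the locally installed artifact directory.

    `key` is a hypothetical property name; one subdirectory per recipe is assumed.
    """
    root = read_properties(client_properties_path)[key]
    return sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )
```

Because everything is resolved on the local FS, no call to the server or HDFS is needed to list
the available recipes.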

Below, I am listing pros and cons of packaging artifacts with Client or Server.

*Packaging artifacts with Server:*

h4. Pros:
1. When support for a new recipe is added, the user has to upgrade only the server to the latest version

h4. Cons:
1. Every time a recipe is submitted, all the template files have to be copied from the remote
machine to the client
2. The user has to copy the property template file from the remote machine [HDFS, or the server's
local FS if the server and client are running on different machines] to update it with the required values

*Packaging artifacts with Client:*

h4. Pros:
1. The user copies the property template file from the local FS to update it with the required
values. Better usability, as there is no need to SCP or copy from HDFS
2. Template files do not have to be downloaded from a remote machine on every recipe submission

h4. Cons:
1. When new recipe artifacts are added, all the clients have to be upgraded to use this functionality
2. The size of the client jar will increase, but this will be minimal

Permissions for these artifacts should be read-only, as these are just templates. Every
instance of a recipe will have its own set of properties, so the user is expected to make a copy
of the .properties template, update it with the required values, and pass the path in the CLI
e.g. falcon recipe -name hdfs-replication -propertyFilePath <path>
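
To make the per-instance properties idea concrete, here is a minimal sketch of how a client
could substitute the user's name-value pairs into a read-only template. The `##name##`
placeholder syntax and the `cook_template` function are assumptions for illustration only.

```python
import re

# Placeholder syntax "##name##" is an assumption made for this sketch.
PLACEHOLDER = re.compile(r"##([A-Za-z0-9_.\-]+)##")


def cook_template(template_text, properties):
    """Substitute every placeholder with its value; fail on any missing property."""
    referenced = {m.group(1) for m in PLACEHOLDER.finditer(template_text)}
    missing = sorted(referenced - set(properties))
    if missing:
        raise KeyError("missing recipe properties: " + ", ".join(missing))
    # A function replacement avoids backslash-escape surprises in property values.
    return PLACEHOLDER.sub(lambda m: properties[m.group(1)], template_text)
```

The template itself is never modified; each recipe instance produces its own output from its
own copy of the properties.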

Also, users may have a use case that requires editing the process template or WF template to add
additional elements for a given recipe instance. If the recipe tool uses templates from the shared
location and we allow editing them, then all clients/recipe instances are forced to use the
edited template, which may not be intended. An option should be provided to override the recipe
artifact location. The user can make a copy of the templates, edit it, and pass the path in the CLI

e.g. falcon recipe -name hdfs-replication -propertyFilePath <path> -location <pathToTemplates>
[-location is optional and should be used only for overriding the artifact path]
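
The override rule can be sketched as follows. This assumes the shared install keeps each
recipe's templates under `<default_root>/<recipe_name>`; the function name and that layout are
hypothetical.

```python
import os


def resolve_artifact_dir(recipe_name, default_root, location=None):
    """Pick the directory holding a recipe's templates.

    `location` mirrors the optional -location argument proposed above; when it is
    omitted, the read-only shared install under `default_root` is used.
    """
    base = location if location else os.path.join(default_root, recipe_name)
    if not os.path.isdir(base):
        raise FileNotFoundError("recipe artifacts not found: " + base)
    return base
```

With this rule, edited templates affect only the submissions that pass the override path, and
the shared copy stays untouched for everyone else.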

> Add recipes in Falcon
> ---------------------
>                 Key: FALCON-634
>                 URL: https://issues.apache.org/jira/browse/FALCON-634
>             Project: Falcon
>          Issue Type: Improvement
>    Affects Versions: 0.6
>            Reporter: Venkatesh Seetharam
>              Labels: recipes
> Falcon offers many services OOTB and caters to a wide array of use cases. However, there
> have been many asks that do not fit the functionality offered by Falcon. I'm proposing that
> we add recipes to Falcon, similar to recipes in Whirr and other management solutions
> such as Puppet and Chef.
> Overview:
> A recipe essentially is a static process template with a parameterized workflow to realize
> a specific use case. For example:
> * replicating directories from one HDFS cluster to another (not timed partitions)
> * replicating hive metadata (database, table, views, etc.)
> * replicating between HDFS and Hive - either way
> * anonymization of data based on schema
> * data masking
> * etc.
> Proposal:
> Falcon provides a Process abstraction that encapsulates the configuration
> for a user workflow with scheduling controls. All recipes can be modeled
> as a Process within Falcon which executes the user workflow
> periodically. The process and its associated workflow are parameterized. The user will
> provide a properties file with name-value pairs that are substituted by Falcon before scheduling.
> This is a client-side concept. The server does not know about a recipe but only accepts
> the cooked recipe as a process entity.
> The CLI would look something like this:
> falcon -recipe $recipe_name -properties $properties_file
> Recipes will reside inside addons (contrib) with source code and will have an option
> to package 'em.

This message was sent by Atlassian JIRA