falcon-dev mailing list archives

From "Sowmya Ramesh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-1107) Move trusted recipe processing to server side
Date Wed, 10 Feb 2016 23:03:18 GMT

    [ https://issues.apache.org/jira/browse/FALCON-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141901#comment-15141901 ]

Sowmya Ramesh commented on FALCON-1107:

Opening this up for discussion: where should the recipe artifacts be published?

The recipe artifacts repository is structured as follows:
|-- Recipe1
    |-- README
    |-- META
    |-- libs
        |-- build
        |-- runtime
    |-- resources
        |-- build
        |-- runtime
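
To show how this layout maps onto one centralized location, here is a minimal sketch that resolves each entry of the tree above under a recipe store root URI. The class name `RecipeLayout`, the store URI and the recipe name `hdfs-replication` are illustrative assumptions, not existing Falcon code or configuration.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class RecipeLayout {
    // Entries of the recipe artifacts layout above, relative to the recipe directory.
    static final String[] ENTRIES = {
        "README", "META", "libs/build", "libs/runtime", "resources/build", "resources/runtime"
    };

    /** Maps each layout entry to its URI under <recipeStoreRoot>/<recipeName>/. */
    public static Map<String, URI> artifactPaths(URI recipeStoreRoot, String recipeName) {
        String root = recipeStoreRoot.toString();
        if (!root.endsWith("/")) {
            root += "/";
        }
        // Trailing slash so that resolve() appends entries instead of replacing the last segment.
        URI recipeDir = URI.create(root + recipeName + "/");
        Map<String, URI> paths = new LinkedHashMap<>();
        for (String entry : ENTRIES) {
            paths.put(entry, recipeDir.resolve(entry));
        }
        return paths;
    }

    public static void main(String[] args) {
        // Hypothetical store location and recipe name, for illustration only.
        URI store = URI.create("hdfs://nn1.example.com:8020/apps/falcon/recipes");
        artifactPaths(store, "hdfs-replication")
            .forEach((entry, uri) -> System.out.println(entry + " -> " + uri));
    }
}
```

Because every client resolves the same entries against the same root, any client (or the server) cooking a recipe finds identical artifacts.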

The Falcon server needs to know where the recipe artifacts are hosted, because it needs them to cook
the recipe and to provide the recipe repository management feature. Since multiple clients can
use the same recipe, the artifacts should be hosted in one centralized location.

For trusted recipes, which Falcon provides out of the box (OOTB), Falcon publishes the artifacts;
for custom recipes, users are given instructions on where to publish them.
The design should also work for both unsecured and secure cluster setups.

Today, during process entity validation, Falcon assumes that the workflow (WF) and WF
libs are present on the cluster where the process instance runs. This has to change, because
for recipes the WF and libs can reside on a different cluster. In addition, on a secure cluster
the required NameNode (NN) principals must be passed to access the files, and the configuration
"mapreduce.job.hdfs-servers" must be updated for the job to execute successfully.
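
As a minimal sketch of the "mapreduce.job.hdfs-servers" part: the property holds a comma-separated list of NameNode URIs, and Hadoop MapReduce obtains delegation tokens for the NameNodes listed there. Assuming the workflow path is a fully qualified URI, the hypothetical helper below adds the workflow's NameNode only when it differs from the execution cluster's NameNode and is not already listed. Passing the NN principal is not shown; `HdfsServersUpdater` and `withWorkflowNameNode` are illustrative names.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class HdfsServersUpdater {
    // Hadoop MapReduce obtains delegation tokens for the NameNodes listed in this property.
    public static final String HDFS_SERVERS = "mapreduce.job.hdfs-servers";

    /** scheme://authority of a fully qualified URI, e.g. hdfs://nn2.example.com:8020. */
    static String nameNodeOf(URI uri) {
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    /**
     * Hypothetical helper: returns the new comma-separated value for HDFS_SERVERS, adding the
     * workflow's NameNode when it differs from the cluster's and is not already listed.
     */
    public static String withWorkflowNameNode(String currentValue, URI clusterNameNode,
                                              URI workflowPath) {
        List<String> servers = new ArrayList<>();
        if (currentValue != null) {
            for (String s : currentValue.split(",")) {
                if (!s.trim().isEmpty()) {
                    servers.add(s.trim());
                }
            }
        }
        // A path without a scheme is resolved against the execution cluster itself.
        if (workflowPath.getScheme() != null) {
            String wfNameNode = nameNodeOf(workflowPath);
            if (!wfNameNode.equals(nameNodeOf(clusterNameNode)) && !servers.contains(wfNameNode)) {
                servers.add(wfNameNode);
            }
        }
        return String.join(",", servers);
    }

    public static void main(String[] args) {
        String value = withWorkflowNameNode("", URI.create("hdfs://nn1.example.com:8020"),
                URI.create("hdfs://nn2.example.com:8020/apps/falcon/recipes/hdfs-replication/workflow.xml"));
        System.out.println(HDFS_SERVERS + "=" + value);
    }
}
```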

One approach: similar to the config store URI, introduce another config for a recipe store URI
in the startup properties, plus another config to set the NN principal. For trusted recipes, Ambari
can be used to copy the required artifacts to this location; for custom recipes, the user
has to copy them manually. One issue with this approach is that, because the location is configurable,
if the user later changes this config (say, when implementing custom recipes), the trusted
recipes will break unless their artifacts are also copied to the new recipe location.
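
To make the configurable-location issue concrete, here is a sketch assuming hypothetical startup property keys `*.recipe.store.uri` and `*.recipe.store.nn.principal`, modeled on the existing `*.config.store.uri` entry. It reads the recipe store settings and lists the trusted recipes missing from the configured location, which is what happens if the URI is changed without copying the artifacts. The existence check is passed in as a `Predicate` so the sketch does not depend on the Hadoop FileSystem API; all names here are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.function.Predicate;

public class RecipeStoreConfig {
    // Hypothetical keys, modeled on the existing *.config.store.uri startup property.
    public static final String RECIPE_STORE_URI = "*.recipe.store.uri";
    public static final String RECIPE_STORE_NN_PRINCIPAL = "*.recipe.store.nn.principal";

    public final URI recipeStoreUri;
    public final String nnPrincipal; // null on an unsecured cluster

    public RecipeStoreConfig(Properties startup) {
        String uri = startup.getProperty(RECIPE_STORE_URI);
        if (uri == null || uri.trim().isEmpty()) {
            throw new IllegalStateException(RECIPE_STORE_URI + " is not set");
        }
        this.recipeStoreUri = URI.create(uri.trim());
        this.nnPrincipal = startup.getProperty(RECIPE_STORE_NN_PRINCIPAL);
    }

    /** Trusted recipes whose directory does not exist under the configured store. */
    public List<String> missingTrustedRecipes(List<String> trustedRecipes, Predicate<URI> exists) {
        String root = recipeStoreUri.toString();
        if (!root.endsWith("/")) {
            root += "/";
        }
        List<String> missing = new ArrayList<>();
        for (String name : trustedRecipes) {
            if (!exists.test(URI.create(root + name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        Properties startup = new Properties();
        startup.load(new StringReader(
                "*.recipe.store.uri=hdfs://nn1.example.com:8020/apps/falcon/recipes-v2\n"));
        RecipeStoreConfig config = new RecipeStoreConfig(startup);
        // Simulate a freshly changed location into which nothing has been copied yet.
        List<String> missing =
                config.missingTrustedRecipes(Arrays.asList("hdfs-replication"), uri -> false);
        System.out.println("Trusted recipes missing from " + config.recipeStoreUri + ": " + missing);
    }
}
```

Running a check like this when the server starts would surface the breakage immediately rather than at the next recipe submission.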

Please let me know if there are any suggestions or better approaches, thanks!
cc: [~sriksun], [~venkatnrangan]

> Move trusted recipe processing to server side
> ---------------------------------------------
>                 Key: FALCON-1107
>                 URL: https://issues.apache.org/jira/browse/FALCON-1107
>             Project: Falcon
>          Issue Type: Sub-task
>            Reporter: Sowmya Ramesh
>            Assignee: Sowmya Ramesh
>              Labels: Recipe
>             Fix For: trunk
>         Attachments: ApacheFalcon-RecipeDesignDocument.V1.pdf, ApacheFalcon-RecipeDesignDocument.pdf
> Today, recipe cooking is client-side logic. Recipes also support extensions, i.e. users can cook their own custom recipes.
> The decision to make it client-side logic was made for the following reasons:
>   *   Keep it isolated from the Falcon server
>   *   Because custom recipe cooking is supported, user recipes can introduce security vulnerabilities and can also bring down the Falcon server
> Today, Falcon provides an HDFS DR recipe out of the box. There is a plan to add UI support for DR in Falcon.
> REST API support cannot be added for recipes because recipe processing happens on the client side.
> If the UI is pure JavaScript (JS), all the recipe cooking logic has to be repeated in JS. This is not a feasible solution: if more recipes are added, say DR for Hive, HBase and others, the UI won't be extensible.
> For the reasons above, recipe cooking should be made server-side logic.
> Provided/trusted recipes (recipes provided out of the box) can run in the Falcon process. If it is a custom recipe (user code), recipe cooking will be done in a new process.
> For cooking custom recipes, the proposed design should handle the security implications, the cases where custom user code can bring down the Falcon server (trapping System.exit), and classpath isolation.
> It also shouldn't destabilize the Falcon system in any way.
> A couple of approaches were discussed:
> *Approach 1:*
> Custom recipe cooking can be carried out separately in another Oozie WF, which ensures isolation. Oozie already has the ability to schedule jobs as a user and handles all the security aspects of doing so.
> Pros:
> - Provides isolation
> - Piggyback on Oozie as it already provides the required functionality
> Cons:
> - Because recipe processing is done in a different WF, from an operations point of view the user cannot figure out the recipe processing status, which adds to the operational pain. The operational issue with this approach is said to be the overall apparatus needed to monitor and manage the recipe-cooking workflows.
> - Oozie scheduling can introduce arbitrary delays. Granted, we can design around these limitations and make use of the strengths of the approach, but it seems like something we should avoid if we can.
> - There have been a few discussions about moving away from Oozie as the scheduling engine for Falcon. If this is the plan going forward, it's better not to add new functionality using Oozie.
> *Approach 2:*
> Custom recipe cooking is done on the server side in an independent process separate from the Falcon process, i.e. it runs in a different JVM. Throttling should be added to limit how many recipe cooking processes can be launched, keeping the machine configuration in mind.
> Pros:
> - Provides isolation, as recipe cooking is done in an independent process
> Cons:
> - Performance overhead, as a new process is launched for custom recipe cooking
> - Adds more complexity to the system
> This bug will be used to move recipe processing for trusted recipes to the server side.
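
Approach 2 can be sketched with a `java.util.concurrent.Semaphore` as the throttle and `ProcessBuilder` to launch each custom recipe cooking in its own JVM. Because the child gets only the recipe's classpath, user code cannot see Falcon server classes, and a `System.exit()` in user code ends only the child process. The class `CustomRecipeCooker`, the cooker main class, and the timeout are assumptions for illustration; the proposal does not specify them.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CustomRecipeCooker {
    // One permit per custom recipe cooking process allowed to run concurrently.
    private final Semaphore slots;

    public CustomRecipeCooker(int maxConcurrentCookings) {
        this.slots = new Semaphore(maxConcurrentCookings);
    }

    /** Command line for a child JVM that sees only the recipe's classpath. */
    public static List<String> childJvmCommand(String recipeClasspath, String cookerMainClass,
                                               List<String> recipeArgs) {
        String javaBin = System.getProperty("java.home") + File.separator + "bin"
                + File.separator + "java";
        List<String> cmd = new ArrayList<>(
                Arrays.asList(javaBin, "-cp", recipeClasspath, cookerMainClass));
        cmd.addAll(recipeArgs);
        return cmd;
    }

    /**
     * Cooks one custom recipe in a separate JVM and returns its exit code.
     * Throws RejectedExecutionException when all slots are busy, and
     * TimeoutException (after killing the child) when it runs too long.
     */
    public int cook(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException, TimeoutException {
        if (!slots.tryAcquire()) {
            throw new RejectedExecutionException("too many recipe cooking processes running");
        }
        try {
            Process child = new ProcessBuilder(command).inheritIO().start();
            if (!child.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                child.destroyForcibly();
                throw new TimeoutException("recipe cooking exceeded " + timeoutSeconds + "s");
            }
            // A System.exit() in user code only ends the child; its status shows up here.
            return child.exitValue();
        } finally {
            slots.release();
        }
    }
}
```

A caller might size the throttle from the machine configuration and run `cooker.cook(CustomRecipeCooker.childJvmCommand("/path/to/recipe/libs/runtime/*", "com.example.RecipeCookerMain", args), 300)`, where the classpath and main class are placeholders. Rejecting instead of queueing keeps the server from piling up cooking requests; a bounded queue would be an equally valid choice.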

This message was sent by Atlassian JIRA
