falcon-dev mailing list archives

From "Ajay Yadava (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-1108) Custom recipe processing
Date Sat, 21 Mar 2015 11:24:38 GMT

    [ https://issues.apache.org/jira/browse/FALCON-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372684#comment-14372684 ]

Ajay Yadava commented on FALCON-1108:

[~sriksun] I haven't completely understood your concern about resources, and I think there
is a gap in our understanding of recipes. A design doc is missing here (there have been
several discussions but no final design doc), and we should finalise it before starting
the implementation.

It will help me (and others) if you can share these concerns with concrete examples by
explaining the flow for a recipe. My rough understanding (based on FALCON-634 & FALCON-636)
is:

* The Falcon server has a repository of recipes on HDFS.
* A recipe is a template of a Falcon process. Delimiters identify the variables to be
processed by the Recipe Cooker, e.g. ##valueToBeReplaced##.
* The recipe client asks the Recipe Cooker to cook a recipe by giving a unique identifier
(name/path) and a properties file.
* The Recipe Cooker cooks the recipe. Cooking is essentially replacing these variables in the
template file using the properties file provided, and returning a Falcon-readable process
definition in XML format.
* The client takes that XML and asks Falcon to submit and/or schedule the resulting process.
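
As a rough illustration of the cooking step described above, here is a minimal Python sketch.
It assumes the ##name## delimiter syntax from the example and a simple key=value properties
format. The function names (parse_properties, cook), the property names and the XML fragment
are hypothetical; this is not Falcon's actual Recipe Cooker, and the fragment is not a
complete Falcon process definition.

```python
import re

# Assumed delimiter syntax, based on the ##valueToBeReplaced## example.
PLACEHOLDER = re.compile(r"##([A-Za-z0-9_.]+)##")


def parse_properties(text):
    """Parse simple key=value lines; blank lines and # comments are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def cook(template, props):
    """Replace every ##name## with props[name]; raise KeyError on unknown names."""
    def substitute(match):
        name = match.group(1)
        if name not in props:
            raise KeyError(f"no value supplied for ##{name}##")
        return props[name]
    return PLACEHOLDER.sub(substitute, template)


# Hypothetical template fragment and properties, for illustration only.
template = '<process name="##processName##"><frequency>##frequency##</frequency></process>'
props = parse_properties("processName=daily-copy\nfrequency=days(1)\n")
cooked = cook(template, props)
print(cooked)
```

Under these assumptions the whole step is string substitution over one template, so it loads
no recipe-specific libraries and its memory use is bounded by the size of the template.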

Let me know if this understanding is not correct. Based on it, I don't see any library or
dependency conflicts arising in the Recipe Cooker, since execution of the final process
doesn't happen in Falcon. Can you please provide an example where the Recipe Cooker
will need to deal with conflicting libraries?

Similarly, I fail to see the need for launching a separate JVM to maintain memory / CPU
isolation for the Recipe Cooker. Can you please provide an example that illustrates high
memory consumption / CPU utilization?

Even assuming there are serious memory and CPU constraints, I fail to see how launching separate
JVMs and communicating between them is resource efficient in terms of CPU, memory and network
usage. On the other hand, take the 95% use case of sane templates and no memory leaks: how
many JVMs will you be able to launch on a shared host that also runs Falcon? With 256MB for
each JVM, a machine with 2GB of memory can hold at most 8 JVMs (2048 / 256). Each JVM will
also take some extra resources for itself apart from just processing the recipe. How is this
more efficient and maintainable? Will it not affect the Falcon server?
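
As a rough check of the numbers above, the following sketch computes how many 256MB JVMs fit
on a 2GB host. The 1GB reserved for the Falcon server and the OS in the second calculation is
a hypothetical figure chosen only for illustration, and max_jvms is a made-up helper name.

```python
HOST_MEMORY_MB = 2 * 1024  # 2GB host, as in the comment
JVM_MEMORY_MB = 256        # memory per recipe JVM, as in the comment


def max_jvms(host_mb, per_jvm_mb, reserved_mb=0):
    """Number of whole JVMs that fit after setting aside reserved_mb."""
    return max(0, (host_mb - reserved_mb) // per_jvm_mb)


# Upper bound: every megabyte on the host goes to recipe JVMs.
upper_bound = max_jvms(HOST_MEMORY_MB, JVM_MEMORY_MB)

# Hypothetical: 1GB is set aside for the Falcon server and the OS.
with_reserve = max_jvms(HOST_MEMORY_MB, JVM_MEMORY_MB, reserved_mb=1024)

print(upper_bound, with_reserve)
```

The first figure is a ceiling that ignores per-JVM overhead beyond the 256MB allocation, so
the practical number is lower still.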

Also, just to clarify, I am proposing a separate Recipe Server. It can be deployed on a separate
box and will not affect the Falcon server under any circumstances. We can't guarantee this
when launching separate JVMs from the Falcon server, as long as they are on the same machine.

> Custom recipe processing
> ------------------------
>                 Key: FALCON-1108
>                 URL: https://issues.apache.org/jira/browse/FALCON-1108
>             Project: Falcon
>          Issue Type: Sub-task
>    Affects Versions: 0.6
>            Reporter: Sowmya Ramesh
>              Labels: Recipe
>             Fix For: 0.7
> Custom recipe cooking is to be done on the server side in a separate process, independent
> of the Falcon process, i.e. it runs in a different JVM. For more details refer to
> [FALCON-1107|https://issues.apache.org/jira/browse/FALCON-1107]
