falcon-dev mailing list archives

From "Venkatesh Seetharam (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-30) Enable embedding pig scripts directly in a process
Date Fri, 12 Jul 2013 20:11:49 GMT

    [ https://issues.apache.org/jira/browse/FALCON-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707319#comment-13707319 ]

Venkatesh Seetharam commented on FALCON-30:
-------------------------------------------

Again, thanks [~shwethags] for taking the time to review the patch. My comments are below:

bq. 1. In OozieProcessMapper, we shouldn't do prepare delete, as there can be use cases where
users don't want to do prepare delete (maybe for incremental processing or some other
use case).
For simple use cases, the Pig launcher fails if the output directory exists, so I thought
this should be a precondition for this engine. If users want different behavior, they could
use an Oozie workflow. Does that make sense?

bq. 2. In OozieProcessMapper.addInputOutputFeedsAsParams(), use ${wf:conf('<param>')}
instead of ${<param>}. The latter doesn't work if the param name contains a '.'
Agreed. I did not know about this limitation.
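
For reference, a minimal sketch of that substitution in Java. The class name, method names, and the
"input.path" parameter are hypothetical and are not taken from the patch. The identifier pattern
follows the Oozie rule that only properties whose names are valid Java identifiers are available
as plain ${NAME} variables.

```java
final class WfConfParams {
    private WfConfParams() {}

    /** True when the property name can be referenced as a plain ${name} variable. */
    static boolean isPlainElName(String name) {
        return name.matches("[A-Za-z_][0-9A-Za-z_]*");
    }

    /** Builds an EL reference that also resolves dotted names such as "input.path". */
    static String elReference(String name) {
        // Assumption: the name contains no single quote, so no escaping is attempted.
        return "${wf:conf('" + name + "')}";
    }
}
```

With these helpers, elReference("input.path") yields ${wf:conf('input.path')}, while
isPlainElName("input.path") is false because of the '.'.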

bq. 3. Feed properties go into just replication and retention wfs. Process properties go into
the process parent wf. This is because we didn't see any use case where feed properties are required
in a process. The Pig wf doesn't follow this. If you see a use case for having feed properties in
a process, please add them in the parent wf for the oozie action as well so that it's consistent.
I do not foresee any specific use case. I will follow the recommendation of not propagating
feed properties.

bq. 4. Feed/process properties are added to conf and will not be available to pig scripts.
Should these be added as params as well? For example, if you want to pass currentHour as a param
to a pig script, how will you do it?
Good question. What if the user wants to send overrides to the JT for priority, queue, etc.? I
thought it'd be best to leave these in the config, as is done for feeds as well. I'm hoping Pig can
see these from the conf. I had an offline conversation with [~sriksun] and he agreed as well.

bq. 5. Process lib path is a directory. Will adding the directory as an archive add the files
in it to the hadoop distributed cache?
I was under the impression that DistributedCache adds the files under a directory, but I verified
by looking at the code that it does not. I will enumerate the files under the directory instead. Good catch!
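
A rough sketch of the enumeration in Java, shown against the local filesystem with java.nio so it
runs on its own. This is an assumption for illustration only: the actual change would list the
lib path on HDFS (for example through Hadoop's FileSystem.listStatus) and add each file
individually. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LibDirLister {
    private LibDirLister() {}

    /** Returns the regular files directly under libDir, sorted by path; subdirectories are skipped. */
    static List<String> listFiles(Path libDir) {
        // Files.list is not recursive and must be closed, hence try-with-resources.
        try (Stream<Path> entries = Files.list(libDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .map(p -> p.toAbsolutePath().toString())
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each returned path would then be registered as a separate file entry rather than passing the
directory itself as an archive.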

Thanks!
                
> Enable embedding pig scripts directly in a process
> --------------------------------------------------
>
>                 Key: FALCON-30
>                 URL: https://issues.apache.org/jira/browse/FALCON-30
>             Project: Falcon
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 0.3
>            Reporter: Venkatesh Seetharam
>            Assignee: Venkatesh Seetharam
>             Fix For: 0.3
>
>         Attachments: FALCON-30.patch, FALCON-30.r2.patch, FALCON-30.rev.patch
>
>
> Falcon allows users to express processing as an Oozie workflow. This will enable users
> to embed pig or hive scripts without having to express them in an Oozie workflow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
