falcon-dev mailing list archives

From "Shwetha G S (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (FALCON-384) Support oozie coord reruns based on argument in Falcon
Date Wed, 01 Oct 2014 04:04:33 GMT

    [ https://issues.apache.org/jira/browse/FALCON-384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154314#comment-14154314 ]

Shwetha G S edited comment on FALCON-384 at 10/1/14 4:03 AM:
-------------------------------------------------------------

With coord action re-runs, Oozie always launches a new workflow, so there can be multiple workflows
for the same coord instance. Each of these workflows is a separate point from which a re-run can be
started. If someone re-runs these workflows in parallel, we will end up with multiple hadoop jobs
for the same instance, causing data discrepancies. We are trying to address this with OOZIE-1536.


was (Author: shwethags):
With coord action re-runs, oozie always launches new workflow. So, there can be multiple workflows
for the same coord instance. This provides multiple points for re-runs. If someones re-runs
these workflows in parallel, we will end up with multiple hadoop jobs for the same instance
and cause data discrepancies. We are trying to address this with OOZIE-1536 

> Support oozie coord reruns based on argument in Falcon
> ------------------------------------------------------
>
>                 Key: FALCON-384
>                 URL: https://issues.apache.org/jira/browse/FALCON-384
>             Project: Falcon
>          Issue Type: Bug
>          Components: oozie
>            Reporter: Shaik Idris Ali
>            Assignee: Shaik Idris Ali
>
> Currently, when a Falcon instance is rerun, we check whether the workflow for the given instance
> is present and just use Oozie's workflow rerun.
> However, users might need coord reruns instead of workflow reruns:
> 1. Users may want to change the job conf (hadoop conf) etc. With a workflow rerun, exactly
> the same hadoop job is retried, but with a coord rerun a new workflow instance is created with
> the changed confs.
> 2. Users might use EL expressions like latest(0), which should be resolved again against
> the new set of datasets.
> 3. Users might also need to re-evaluate dependencies in cases where we have a chain of jobs.
> With just a workflow rerun, the job simply kicks off without checking for data.
> Provide an option like refresh and invoke the coord rerun with refresh.
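
The choice described above could be sketched roughly as follows. This is only an illustration, not Falcon's implementation: the function name, its parameters, and the rerun.properties file are hypothetical, and falling back to a plain coord rerun when no workflow exists is an assumption the issue does not state. The function only builds an Oozie CLI argument list and does not run anything.

```python
from typing import List, Optional


def build_rerun_command(coord_id: str, action: int,
                        workflow_id: Optional[str], refresh: bool) -> List[str]:
    """Build a (hypothetical) Oozie CLI argument list for rerunning one instance."""
    if refresh:
        # Coord rerun with refresh: Oozie launches a new workflow, so job conf,
        # EL expressions such as latest(0), and dependencies are evaluated again.
        return ["oozie", "job", "-rerun", coord_id, "-refresh", "-action", str(action)]
    if workflow_id is not None:
        # Workflow rerun: the existing workflow is retried as-is.
        # rerun.properties is a hypothetical file that would carry a setting
        # such as oozie.wf.rerun.failnodes=true.
        return ["oozie", "job", "-rerun", workflow_id, "-config", "rerun.properties"]
    # Assumption: with no workflow to rerun, fall back to a plain coord rerun.
    return ["oozie", "job", "-rerun", coord_id, "-action", str(action)]
```

For example, build_rerun_command("coord-1", 3, "wf-7", refresh=True) selects the coord rerun even though a workflow exists. Note that, as the comment above points out, every coord rerun adds another workflow for the same instance, so a caller would still need to avoid rerunning those workflows in parallel.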



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
