falcon-dev mailing list archives

From "Pallavi Rao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-1385) Rerun for pipeline
Date Mon, 10 Aug 2015 10:50:45 GMT

    [ https://issues.apache.org/jira/browse/FALCON-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14679935#comment-14679935 ]

Pallavi Rao commented on FALCON-1385:

[~nperiwal], I haven't gone through the patch in detail, but I noticed a couple of things.

1. One important consideration when we support pipeline rerun is ordering. Say there is a
pipeline P1 -> F1 -> P2 -> F3, where P2 depends on the output of P1. In that case, rerun
should run P1 first and run P2 only once P1 completes (for a given instance graph). Are we
considering such dependencies at all? If we blindly run P1 and P2 in parallel, P2 may read
stale or missing output from P1, and the user might not see the results they expect.

2. I see we are using coordinator action rerun. The Oozie team generally encourages workflow
rerun because it reuses the workflow id and hence preserves lineage. Can you check with
[~jaydeepvishwakarma] for details and a recommendation?
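
To make point 1 concrete, here is a minimal sketch of dependency-ordered rerun, written in
Python for brevity. It is not Falcon's implementation or API: the function name `rerun_waves`
and the dict layout (process name mapped to its input and output feeds) are assumptions made
for illustration. It groups processes into "waves" so that a process is only rerun after every
process in the rerun set that produces one of its input feeds has finished.

```python
from collections import defaultdict


def rerun_waves(processes):
    """Group processes into ordered waves for rerun.

    `processes` maps a process name to {"inputs": [...], "outputs": [...]}
    (hypothetical layout). Processes in the same wave do not depend on each
    other and could be rerun in parallel; each wave must complete before the
    next one starts.
    """
    # Map each feed to the processes (within the rerun set) that produce it.
    producers = defaultdict(set)
    for name, spec in processes.items():
        for feed in spec["outputs"]:
            producers[feed].add(name)

    # upstream[p] = processes whose output feeds p consumes.
    upstream = {name: set() for name in processes}
    for name, spec in processes.items():
        for feed in spec["inputs"]:
            upstream[name] |= producers.get(feed, set()) - {name}

    waves = []
    done = set()
    remaining = set(processes)
    while remaining:
        # A process is ready once all of its upstream processes are done.
        ready = sorted(p for p in remaining if upstream[p] <= done)
        if not ready:
            raise ValueError(
                "cyclic dependency among: " + ", ".join(sorted(remaining))
            )
        waves.append(ready)
        done.update(ready)
        remaining.difference_update(ready)
    return waves


# The pipeline from point 1: P1 -> F1 -> P2 -> F3.
pipeline = {
    "P1": {"inputs": [], "outputs": ["F1"]},
    "P2": {"inputs": ["F1"], "outputs": ["F3"]},
}
print(rerun_waves(pipeline))  # [['P1'], ['P2']]
```

In a real rerun this ordering would have to be applied per instance (the "instance graph"
mentioned above), and the next wave would be triggered only after the previous wave's reruns
succeed.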

> Rerun for pipeline
> ------------------
>                 Key: FALCON-1385
>                 URL: https://issues.apache.org/jira/browse/FALCON-1385
>             Project: Falcon
>          Issue Type: Sub-task
>            Reporter: Narayan Periwal
>            Assignee: Narayan Periwal
>         Attachments: FALCON-1385-v0.patch
> In case of backlogs or cluster issues, we need to rerun all failed instances in an entire
pipeline. Currently Falcon supports rerun only on one entity or on instances of one entity.
The ability to select several entities, or instances from several entities, and rerun them
in one command will add a lot of value to Falcon.
> Users should be able to use the following criteria to filter the instances which need
to be rerun.
> 1) start date and end date
> 2) name 
> 3) tag key=value
> 4) pipeline (valid only in case of processes)
> 5) Status (should accept multiple values for status)
> Need to pay attention to partial success / recovery of such operations.
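
The filter criteria listed in the description could combine along these lines. This is only
a sketch under stated assumptions, not Falcon code: the `Instance` record, its field names,
the `select_for_rerun` function, and the choice to treat the start and end dates as inclusive
are all hypothetical. Criteria that are left unset are ignored, and the ones that are set are
ANDed together.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Instance:
    # Hypothetical record for one entity instance; not a Falcon class.
    entity: str
    nominal_time: datetime
    status: str
    pipeline: Optional[str] = None  # only meaningful for processes
    tags: dict = field(default_factory=dict)


def select_for_rerun(instances, start=None, end=None, name=None,
                     tag=None, pipeline=None, statuses=None):
    """Return the instances matching every criterion that was supplied.

    `tag` is a (key, value) pair; `statuses` is a collection of accepted
    status strings. Start and end are treated as inclusive (an assumption).
    """
    selected = []
    for inst in instances:
        if start is not None and inst.nominal_time < start:
            continue
        if end is not None and inst.nominal_time > end:
            continue
        if name is not None and inst.entity != name:
            continue
        if tag is not None and inst.tags.get(tag[0]) != tag[1]:
            continue
        if pipeline is not None and inst.pipeline != pipeline:
            continue
        if statuses is not None and inst.status not in statuses:
            continue
        selected.append(inst)
    return selected


instances = [
    Instance("P1", datetime(2015, 8, 1), "FAILED", "etl", {"owner": "ads"}),
    Instance("P1", datetime(2015, 8, 2), "SUCCEEDED", "etl", {"owner": "ads"}),
    Instance("P2", datetime(2015, 8, 2), "KILLED", "etl", {"owner": "ads"}),
    Instance("P3", datetime(2015, 8, 2), "FAILED", "reports", {"owner": "bi"}),
]
failed_in_etl = select_for_rerun(
    instances, pipeline="etl", statuses={"FAILED", "KILLED"}
)
print([(i.entity, i.nominal_time.day) for i in failed_in_etl])
# [('P1', 1), ('P2', 2)]
```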

This message was sent by Atlassian JIRA
