airflow-commits mailing list archives

From "Isaac Steele (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-323) Should be able to prevent tasks from overlapping across multiple DAG Runs
Date Mon, 11 Jul 2016 18:38:10 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371376#comment-15371376
] 

Isaac Steele commented on AIRFLOW-323:
--------------------------------------

Hey [~artwr]. Thanks for the reply. My use case is replicating data. For the most part the
replication tasks are very short, but a schema change makes them take longer. We want all
the data to keep rolling in, regardless of whether 1 table takes longer because of extra
data or schema changes. The depends_on_past option doesn't work as-is because it queues up
all of the runs. If one task takes an hour, but the schedule is set to 15 minutes, then there
will be 4 tasks that have to run before it catches up, and then more, should those 4 take
more than 15 minutes total. What we want is for the overlapping task to just be marked as
State.SKIPPED, so that the longer-running task can complete.
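
To make the difference concrete, here is a minimal standalone sketch in plain Python. It is
not Airflow code and not the parameter from the fork; the function name `simulate` and the
`skip_if_running` flag are hypothetical, and times are plain minutes. It compares queuing
every run behind the previous one (roughly what depends_on_past does here) with skipping a
run whose predecessor is still executing.

```python
def simulate(scheduled_minutes, durations, skip_if_running):
    """Simulate one task across consecutive DAG Runs.

    scheduled_minutes: when each run is scheduled to start the task.
    durations: how long the task would take in each run.
    skip_if_running: hypothetical flag; True skips a run whose predecessor
        is still executing, False queues it behind the predecessor.
    Returns a list of (state, start, end) tuples, one per run.
    """
    results = []
    busy_until = None  # end time of the most recently executed instance
    for scheduled, duration in zip(scheduled_minutes, durations):
        overlapping = busy_until is not None and scheduled < busy_until
        if overlapping and skip_if_running:
            # Previous instance is still running: mark this one skipped.
            results.append(("skipped", None, None))
            continue
        # Queued behaviour: wait until the previous instance has finished.
        start = busy_until if overlapping else scheduled
        end = start + duration
        busy_until = end
        results.append(("success", start, end))
    return results


if __name__ == "__main__":
    schedule = [0, 15, 30, 45, 60, 75]   # 15-minute interval
    durations = [60, 5, 5, 5, 5, 5]      # first run hits a slow schema change
    for flag in (False, True):
        print("skip_if_running =", flag)
        for row in simulate(schedule, durations, flag):
            print("  ", row)
```

With queuing, every later run starts late until the backlog drains; with skipping, the runs
that fall inside the slow hour are dropped and the schedule is back on time afterwards.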

I've added a parameter to do this in my fork, and it works very well, but I need to write unit
tests around it before I can submit a PR.

We want multiple DAGs to be running at the same time; that's not the issue. We just want
individual tasks not to overlap should they happen not to finish in time. (Also, we don't
want an entire DAG Run to be held up if 1 task is delayed.)

The "resources" comment in my original post wasn't really something we were running into; I
was just thinking it could be an issue for someone else. It was only the overlapping and
queuing of tasks that we were having problems with in our project.

(Note that I've also parameterized allowing failed states to work with depends_on_past for
our own use case [AIRFLOW-324], but again I need to complete my tests before submitting
the PR.)

> Should be able to prevent tasks from overlapping across multiple DAG Runs
> -------------------------------------------------------------------------
>
>                 Key: AIRFLOW-323
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-323
>             Project: Apache Airflow
>          Issue Type: Bug
>    Affects Versions: Airflow 1.7.1.2
>         Environment: 1.7.1.2
>            Reporter: Isaac Steele
>            Assignee: Isaac Steele
>
> As the Airflow administrator,
> if a task from a previous DAG Run is still running when the next scheduled run triggers
> the same task, there should be a way to prevent the tasks from overlapping.
> Otherwise the same code could end up running multiple times simultaneously.
> To reproduce:
> 1) Create a DAG with a short schedule interval
> 2) Create a task in that DAG that runs longer than the interval
> Result: Both tasks end up running at the same time.
> This can cause tasks to compete for resources, as well as duplicate or overwrite what
> the other task is doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
