airflow-commits mailing list archives

From "Alexander Panzhin (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (AIRFLOW-764) max_active_runs_per_dag not respected for DAGs triggered manually within a few seconds of one another
Date Fri, 03 Feb 2017 19:17:51 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851975#comment-15851975 ]

Alexander Panzhin edited comment on AIRFLOW-764 at 2/3/17 7:17 PM:
-------------------------------------------------------------------

This is a major problem when migrating to Airflow. It's impossible to backfill DAGs manually
without also having to manage the resulting runs by hand.
This bug reduces Airflow to cron with a nice UI. What's the point of
a workflow management tool then?


was (Author: jalexoid):
This is a major problem when migrating to Airflow. It's impossible to backfill manually DAGs
and not manage them manually - what's the point of a workflow management tool then?

> max_active_runs_per_dag not respected for DAGs triggered manually within a few seconds of one another
> -----------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-764
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-764
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: core, executor
>    Affects Versions: Airflow 1.7.1.3
>         Environment: debian linux, mysql with localexecutor
>            Reporter: Jeffrey Enns
>         Attachments: test_dag.py, test_dag_screen.png, test_job.sh, trigger_two.sh
>
>
> Given the following configuration:
> ```
> [core]
> executor = LocalExecutor
> max_active_runs_per_dag = 1
> parallelism = 20
> dag_concurrency = 1
> ```
> Even with `max_active_runs_per_dag=1`, it is possible to cause two (or more) DAG runs
> to run in parallel by triggering the runs manually within a few seconds/milliseconds of one
> another. Task Instances from the distinct DAG runs will show as active in the “Task Instances”
> web view at the same time.
> I only looked at the scheduler code briefly, but it looked as if a race condition would
> be possible for manually triggered DAGs that could lead to this behaviour.
> I’ve attached a test DAG and two shell scripts I used to reliably reproduce this behaviour.
> Put `test_dag.py` and `test_job.sh` in the DAGs folder, and then run `trigger_two.sh` to reproduce
> the bug.
> Also attached is a screenshot showing DAG runs (for the dag ‘race_dag’) running in
> parallel after following the steps described immediately above (note the execution date, start
> date, and end date for each TI).
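
The kind of race suspected in the description can be sketched with a toy model. This is
not Airflow's scheduler code, and the names below (`racy_trigger`, `atomic_trigger`,
`run_interleaved`, the run ids) are hypothetical. It only assumes that the limit is
enforced as a check ("how many runs are active?") followed by a separate act ("start the
run"), and uses generators to force two triggers to interleave between those two steps.

```python
# Toy model only: this is NOT Airflow's scheduler code.
MAX_ACTIVE_RUNS = 1  # mirrors max_active_runs_per_dag = 1


def racy_trigger(active_runs, run_id):
    """Check the limit, pause, then act on the now possibly stale result."""
    allowed = len(active_runs) < MAX_ACTIVE_RUNS  # check
    yield  # a second trigger may run its own check here
    if allowed:
        active_runs.append(run_id)  # act


def atomic_trigger(active_runs, run_id):
    """Check and act in one step, with no pause in between."""
    if len(active_runs) < MAX_ACTIVE_RUNS:
        active_runs.append(run_id)
    yield


def run_interleaved(trigger):
    """Start two triggers, advance both to their pause, then finish both."""
    active_runs = []
    first = trigger(active_runs, "manual_1")
    second = trigger(active_runs, "manual_2")
    next(first)
    next(second)
    for gen in (first, second):
        for _ in gen:
            pass
    return active_runs


if __name__ == "__main__":
    print("racy:  ", run_interleaved(racy_trigger))
    print("atomic:", run_interleaved(atomic_trigger))
```

In the racy variant both triggers see zero active runs before either one records its run,
so both runs start despite the limit of 1. In the atomic variant the second trigger sees
the first run and is refused. Whether the real scheduler has this exact shape is an
assumption; the reporter only describes it as a possibility.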



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
