airflow-commits mailing list archives

From "Matti Remes (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-137) Airflow does not respect 'max_active_runs' when task from multiple dag runs cleared
Date Fri, 26 May 2017 09:39:05 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16026063#comment-16026063 ]

Matti Remes commented on AIRFLOW-137:
-------------------------------------

Backfill checks airflow.cfg's `max_active_runs` only once, at startup. If the dag run count
is initially below `max_active_runs`, it spawns all the dag runs requested by the backfill
at once. Otherwise backfill exits with the log message `Dag dag has reached maximum amount
of 10 dag runs`.

I think the expected behaviour should be to schedule new dag runs until the `max_active_runs`
limit is reached, then poll the active run count continuously and start another dag run
whenever the active count drops below the maximum.
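The polling behaviour proposed above could be sketched roughly as follows. This is a
pure-Python illustration, not Airflow code: `poll_active` and `start_run` are hypothetical
stand-ins for the scheduler's real database queries and run creation.

```python
import time

def backfill_schedule(run_dates, max_active_runs, poll_active, start_run,
                      poll_interval=0.01):
    """Start one dag run per date, never exceeding max_active_runs.

    poll_active() returns the current number of active dag runs;
    start_run(date) kicks off a new dag run. Both are assumed stand-ins
    for the scheduler's real calls, not actual Airflow API.
    """
    started = []
    for date in run_dates:
        # Wait until an active-run slot frees up before starting the next run.
        while poll_active() >= max_active_runs:
            time.sleep(poll_interval)
        start_run(date)
        started.append(date)
    return started
```

With this loop the number of active runs can never exceed `max_active_runs`, because a new
run is only started after a fresh poll shows a free slot.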

> Airflow does not respect 'max_active_runs' when task from multiple dag runs cleared
> -----------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-137
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-137
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Tomasz Bartczak
>            Priority: Minor
>
> Also requested at https://github.com/apache/incubator-airflow/issues/1442
> Dear Airflow Maintainers,
> Environment
> Before I tell you about my issue, let me describe my Airflow environment:
>     Airflow version: 1.7.0
>     Airflow components: webserver, mysql, scheduler with celery executor
>     Python Version: 2.7.6
>     Operating System: Linux Ubuntu 3.19.0-26-generic
>     Scheduler runs with --num-runs and gets restarted around every minute or so
> Description of Issue
> Now that you know a little about me, let me tell you about the issue I am having:
>     What did you expect to happen?
>     After running 'airflow clear -t spark_final_observations2csv -s 2016-04-07T01:00:00
-e 2016-04-11T01:00:00 MODELLING_V6' I expected that this task gets executed in all dag runs
in the given time range - respecting 'max_active_runs'
>     Dag configuration:
>     concurrency= 3,
>     max_active_runs = 2,
>     What happened instead?
>     Airflow at first started executing 3 of those tasks, which already violates 'max_active_runs';
it looks like 'concurrency' was the limit applied here.
>     [screenshot: 3_running_2_pending]
> After the first task was done, airflow scheduled all the other tasks, making it 5 dags
running at the same time, which violates every specified limit.
> In the GUI we saw a red warning (5/2 Dags running ;-) )
> Reproducing the Issue
> max_active_runs is respected on a day-to-day basis - when one of the tasks was stuck, airflow
didn't start more than 2 dags concurrently.
> [screenshots in the original issue: https://github.com/apache/incubator-airflow/issues/1442]
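For reference, the dag configuration quoted in the issue corresponds to a definition along
these lines (a minimal sketch in Airflow-1.7-era style; `dag_id` is taken from the clear
command above, while `start_date` and `schedule_interval` are assumptions not stated in the
issue):

```python
from datetime import datetime
from airflow import DAG

dag = DAG(
    dag_id='MODELLING_V6',            # dag id from the clear command in the issue
    start_date=datetime(2016, 4, 1),  # assumed; not stated in the issue
    schedule_interval='@daily',       # assumed; not stated in the issue
    concurrency=3,                    # max running task instances for this dag
    max_active_runs=2,                # max concurrently active dag runs
)
```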



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
