airflow-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rolf Schroeder (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (AIRFLOW-2683) priority_weights honored by LocalExecutor but not by CeleryExecutor
Date Wed, 05 Dec 2018 11:41:00 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Rolf Schroeder closed AIRFLOW-2683.
-----------------------------------
    Resolution: Invalid

> priority_weights honored by LocalExecutor but not by CeleryExecutor
> -------------------------------------------------------------------
>
>                 Key: AIRFLOW-2683
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2683
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: celery, scheduler
>    Affects Versions: 1.7.1.3
>         Environment: Linux
>            Reporter: Rolf Schroeder
>            Priority: Major
>         Attachments: airflow_priorities_celery.png, airflow_priorities_dag.png, airflow_priorities_localexecutor.png,
airflow_priorities_purge_in_the_middle.png, test_priority.py
>
>
> Hi,
> I am running Airflow 1.7.1.3 with Celery 4.0.2 (and rabbitmq). I have come across the
following issue: It seems to me that task priorities are not honored as expected when scheduling
tasks via Celery. However, when using the LocalExecutor, the priorities seem to work fine.
This issue arises when low prio tasks get scheduled/queued before high prio tasks. Celery
will finish all previously scheduled/queued low prio tasks before tackling the high prio ones.
In contrast, the LocalExecutor does not further process remaining low prio tasks and starts
the high prio tasks. I fee like once Airflow has sent the tasks to Celery, it "looses" control
of the execution order. Details below. I searched in Jira and possibly the following issues
are linked (although I am not convinced)
> (AIRFLOW-1510)https://issues.apache.org/jira/browse/AIRFLOW-1510
> (AIRFLOW-584)https://issues.apache.org/jira/browse/AIRFLOW-584
> I have the following test setup: Two independent initial tasks, each with 3 downstream
tasks. One of the initial tasks takes longer but has high prio downstream tasks. The other
initial task has a short duration and low prio downstream tasks. Both initial tasks start
at the same time. My expectation is that some of the low prio tasks get executed first (since
their initial upstream task is faster) but the moment the slower initial taks is done, its
high prio downstream tasks should get executed before the low prio downstream tasks.
> Here is a picture of the setup (forget about init0, this is just to make the DAG look
correct):
>  !airflow_priorities_dag.png!
> I've been trying this with a LocalExecutor (AIRFLOW__CORE__PARALLELISM=2) and two celery
workers. When running the LocalExecutor, the high prio tasks get indeed executed once their
init task is done (as expected, the 'prio100' tasks finish before the remaining 'prio10' tasks):
> !airflow_priorities_localexecutor.png!
> However, when using Celery, it seems that that the priorities are not honored anymore.
Once can see clearly that the low prio tasks (prio10) are all executed before the high prio
ones (prio100)
> !airflow_priorities_celery.png!
> Here is the corresponding code
> [^test_priority.py]
> I do not know whether this expected behavior or a bug? Is there any way to "fix" this?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message