airflow-commits mailing list archives

From "Pratap Naik (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (AIRFLOW-192) Implement priority_weight aggregation using ancestors (rather than successors)
Date Wed, 03 Jan 2018 06:34:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309206#comment-16309206 ]

Pratap Naik edited comment on AIRFLOW-192 at 1/3/18 6:33 AM:
-------------------------------------------------------------

I believe setting the priority to -1 for all tasks does the trick for the first requirement.


was (Author: pratapnaik):
I think setting the priority to -1 for all tasks does the trick...

> Implement priority_weight aggregation using ancestors (rather than successors)
> ------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-192
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-192
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: operators
>    Affects Versions: Airflow 1.7.1.2
>            Reporter: Sergei Iakhnin
>
> Currently, tasks are scheduled based on their priority_weight. The effective priority
of a task is its own priority plus the priorities of all tasks that follow it in a dag. This
results in undesirable scheduling behaviour in my use case.
> My use case involves running scientific workflows where a number of operations are
carried out on a set of samples. Each sample is handled by a separate dag run that
is manually triggered. It is common for several thousand dag instances to be in flight at
a given time. The dag reserves a sample, operates on it, and then releases it. I would like
each sample to be reserved for as short a time as possible, so that other programs
have an opportunity to operate on it and dag runs can complete as fast as possible. However,
because of the current priority logic, if I were to schedule several thousand dags at a given
time, they would first all execute their first state, then all execute their second state,
etc. Thus, no dag can complete fully until all dags complete their second-to-last state. This
results in unnecessarily long dag run times and simultaneous completion of all dags.
> Ideally, Airflow would support the reverse of the current logic used for priorities, i.e.
a task's priority is the sum of priorities of all its ancestors. This way, the further along
a dag is in its processing, the more likely its tasks are to get scheduled (thus leading to a
shorter completion time and earlier release of its resources).
> Also, a nominal priority mode would be useful, where a task's priority is exactly the
number given to it by the author, in order to allow more scheduling flexibility.
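
The three aggregation modes discussed in the description can be compared with a small standalone sketch. It does not use Airflow's API: the function names, the mode strings ("successors", "ancestors", "nominal") and the three-task dag are hypothetical, chosen only to mirror the reserve/operate/release example. The "ancestors" mode here is assumed to include the task's own weight, by analogy with the current rule.

```python
from collections import defaultdict


def reachable(graph, start):
    """Return every node reachable from start by following edges (start excluded)."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def effective_priorities(downstream, weights, mode):
    """Compute an effective priority per task (illustrative, not Airflow code).

    downstream: task -> list of direct successor tasks
    weights:    task -> weight assigned by the dag author
    mode:       "successors" (rule described as current in the issue),
                "ancestors"  (requested reverse rule; own weight included by assumption),
                "nominal"    (exactly the author's weight)
    """
    # Invert the edges so ancestors can be found with the same traversal.
    upstream = defaultdict(list)
    for parent, children in downstream.items():
        for child in children:
            upstream[child].append(parent)

    result = {}
    for task, own in weights.items():
        if mode == "successors":
            related = reachable(downstream, task)
        elif mode == "ancestors":
            related = reachable(upstream, task)
        elif mode == "nominal":
            related = set()
        else:
            raise ValueError("unknown mode: %r" % mode)
        result[task] = own + sum(weights[t] for t in related)
    return result


# Hypothetical dag: reserve -> process -> release, every task with weight 1.
downstream = {"reserve": ["process"], "process": ["release"], "release": []}
weights = {"reserve": 1, "process": 1, "release": 1}

for mode in ("successors", "ancestors", "nominal"):
    print(mode, effective_priorities(downstream, weights, mode))
```

With "successors", reserve gets 3, process 2 and release 1, so across many dag runs the first tasks outrank everything else, which matches the breadth-first behaviour described above. With "ancestors" the order is reversed (1, 2, 3), so runs that are further along are favoured. With "nominal" every task keeps its weight of 1.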



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
