airflow-commits mailing list archives

From "Gerard Toonstra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-41) SubdagOperators can oversubscribe to pools due to race condition
Date Mon, 07 Nov 2016 20:15:58 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15645294#comment-15645294 ]

Gerard Toonstra commented on AIRFLOW-41:
----------------------------------------

Whoops. 

The reason I didn't go for a separate pool handler for now is that it's a lot more
complicated and would add more state to the scheduler, and therefore more complexity.
What we need to decide is whether we want queued task instances in a pool, relying on
a (parallel) check of whether the task really should run, or whether we prefer proper
pooling constraints that are never violated. The first option is open to race conditions.
It is also susceptible to task instances bouncing between scheduler and executor under
heavy worker congestion, and those bouncing tasks query the database a lot, which
degrades performance.

> SubdagOperators can oversubscribe to pools due to race condition
> ----------------------------------------------------------------
>
>                 Key: AIRFLOW-41
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-41
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler, subdag
>    Affects Versions: Airflow 1.7.1
>            Reporter: Bolke de Bruin
>
> SubDagOperators essentially create their own mini-scheduler, which can interfere with
> the main scheduler.
> SubDagOperators check whether a slot is available in a Pool. However, this slot is not
> claimed at the same time, leaving room for the main scheduler to also check for the slot.
> Both can then obtain a slot and thus oversubscribe the pool.
> A solution could be a centralized PoolHandler that gives out slots.
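
To make the check-then-claim race in the description concrete, here is a minimal sketch
of what a centralized handler could look like. It is not Airflow code: the PoolHandler
class, its try_claim/release/used methods, the "etl" pool name and the demo function are
all hypothetical, and an in-process threading.Lock stands in for whatever shared
mechanism a real fix would need. Since the scheduler and a SubDagOperator usually run in
different processes, a real handler would presumably have to rely on a database
transaction or row lock instead.

```python
import threading


class PoolHandler:
    """Hypothetical centralized slot allocator (illustration only, not Airflow's API)."""

    def __init__(self, slots):
        self._slots = dict(slots)                  # pool name -> total slots
        self._used = {name: 0 for name in slots}   # pool name -> claimed slots
        self._lock = threading.Lock()

    def try_claim(self, pool):
        # The availability check and the claim happen under one lock,
        # so two callers can never both be granted the same free slot.
        with self._lock:
            if self._used[pool] < self._slots[pool]:
                self._used[pool] += 1
                return True
            return False

    def release(self, pool):
        # Give a slot back; never drop below zero.
        with self._lock:
            if self._used[pool] > 0:
                self._used[pool] -= 1

    def used(self, pool):
        with self._lock:
            return self._used[pool]


def demo(callers=8, total=3):
    """Let several competing callers ask for slots in a pool of `total` slots."""
    handler = PoolHandler({"etl": total})
    granted = []

    def worker():
        if handler.try_claim("etl"):
            granted.append(1)

    threads = [threading.Thread(target=worker) for _ in range(callers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(granted), handler.used("etl")
```

The point of the sketch is only that "is a slot free?" and "take the slot" form one
atomic step. If the two are separate, as in the behaviour described above, both the main
scheduler and the subdag can pass the check before either one records its claim.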



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
