airflow-commits mailing list archives

From "Gaurav Sehgal (Jira)" <j...@apache.org>
Subject [jira] [Assigned] (AIRFLOW-6227) Ability to assign multiple pool names to a single task
Date Mon, 16 Dec 2019 20:24:00 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gaurav Sehgal reassigned AIRFLOW-6227:
--------------------------------------

    Assignee: Gaurav Sehgal

> Ability to assign multiple pool names to a single task
> ------------------------------------------------------
>
>                 Key: AIRFLOW-6227
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6227
>             Project: Apache Airflow
>          Issue Type: New Feature
>          Components: scheduler
>    Affects Versions: 1.10.6
>            Reporter: t oo
>            Assignee: Gaurav Sehgal
>            Priority: Major
>
> Right now only a single pool name can be assigned to each task instance.
> Ideally, 2 different pool names could be assigned to a task_instance.
> Use case:
> I have 300 Spark tasks writing to 60 different tables (i.e. multiple tasks write to the same table).
> I want both:
>  1. A maximum of 30 Spark tasks running in parallel
>  2. Never more than 1 Spark task writing to the same table in parallel
> If I have a 'spark' pool of 30 and assign the 'spark' pool to those tasks, then I risk having 2 tasks writing to the same table.
> But if I instead have a 'tableA' pool of 1, a 'tableB' pool of 1, a 'tableC' pool of 1, etc. and assign the relevant table-name pool to each task, then I risk having more than 30 Spark tasks running in parallel.
> I can't use 'parallelism' or other settings because I have other non-Spark tasks that I don't want to limit.
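
The requested behaviour can be illustrated with a small stand-alone sketch. This is not Airflow code and not an existing Airflow API: the function `admit`, the `POOL_SLOTS` mapping, and the task ids are hypothetical names chosen for illustration. It only shows the rule the issue asks for, assuming a task may name several pools and starts only when every one of them has a free slot.

```python
from collections import Counter

# Hypothetical pool sizes: one shared 'spark' pool plus a single-slot pool per table.
POOL_SLOTS = {"spark": 30, "tableA": 1, "tableB": 1}


def admit(queued, pool_slots):
    """Greedily choose tasks to start.

    A task is admitted only if every pool it names still has a free slot;
    an admitted task then occupies one slot in each of its pools.
    """
    used = Counter()
    running = []
    for task_id, pools in queued:
        if all(used[p] < pool_slots[p] for p in pools):
            for p in pools:
                used[p] += 1
            running.append(task_id)
    return running


# Two tasks write to tableA, one to tableB; all three are Spark tasks.
queued = [
    ("load_a_1", ["spark", "tableA"]),
    ("load_a_2", ["spark", "tableA"]),
    ("load_b_1", ["spark", "tableB"]),
]
print(admit(queued, POOL_SLOTS))  # ['load_a_1', 'load_b_1']
```

In this sketch the second tableA writer waits because the 'tableA' pool is full, while the 'spark' pool still caps the total number of Spark tasks. Tasks that name neither pool are unaffected.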



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
