airflow-commits mailing list archives

From "Gaurav Sehgal (Jira)" <>
Subject [jira] [Commented] (AIRFLOW-6227) Ability to assign multiple pool names to a single task
Date Sat, 21 Dec 2019 09:52:00 GMT


Gaurav Sehgal commented on AIRFLOW-6227:

[~ash] As far as I can see, we currently store the pool information in the task instance (ti)
itself. But what if someone changes the task's pool? In that case, all the old task instances
will run in the old pool (while retrying), whereas new task instances will run in the new pool.
Is that the right behavior? I'm not sure, but shouldn't all task instances map to the new pool?
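
To make the question concrete, here is a minimal sketch of the two behaviors. The `Task` and
`TaskInstance` classes and both helper functions are hypothetical stand-ins written for this
illustration; they are not Airflow's actual models or API.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical stand-in for a task definition."""
    task_id: str
    pool: str


@dataclass
class TaskInstance:
    """Hypothetical stand-in for a task instance that copies the pool at creation."""
    task: Task
    pool: str


def pool_from_snapshot(ti: TaskInstance) -> str:
    # Behavior described above: the ti keeps the pool it was created with.
    return ti.pool


def pool_from_task(ti: TaskInstance) -> str:
    # Alternative raised in the question: always resolve the task's current pool.
    return ti.task.pool


task = Task("load_table", pool="old_pool")
old_ti = TaskInstance(task, pool=task.pool)  # created before the change

task.pool = "new_pool"                       # someone edits the task's pool
new_ti = TaskInstance(task, pool=task.pool)  # created after the change

print(pool_from_snapshot(old_ti), pool_from_task(old_ti))
```

With the snapshot approach, a retry of `old_ti` still resolves to `old_pool`; with the lookup
approach, it would follow the task to `new_pool`.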

> Ability to assign multiple pool names to a single task
> ------------------------------------------------------
>                 Key: AIRFLOW-6227
>                 URL:
>             Project: Apache Airflow
>          Issue Type: New Feature
>          Components: scheduler
>    Affects Versions: 1.10.6
>            Reporter: t oo
>            Assignee: Gaurav Sehgal
>            Priority: Major
> Right now only a single pool name can be assigned to each task instance.
> Ideally 2 different pool names can be assigned to a task_instance.
> Use case:
> I have 300 Spark tasks writing to 60 different tables (i.e. there are multiple tasks writing
> to the same table).
> I want both:
>  # Maximum of 30 Spark tasks running in parallel
>  # Never more than 1 Spark task writing to the same table in parallel
> If I have a 'spark' pool of 30 and assign the 'spark' pool to those tasks then I risk having
> 2 tasks writing to the same table.
> But instead if I have a 'tableA' pool of 1, 'tableB' pool of 1, 'tableC' pool of 1...etc and
> assign the relevant table name pool to each task then I risk having more than 30 Spark tasks
> running in parallel.
> I can't use 'parallelism' or other settings because I have other non-Spark tasks that
> I don't want to limit
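
The requested behavior can be illustrated with a small, self-contained simulation. This is only
a sketch of the idea, assuming a task may start when every pool it names has a free slot; the
`admit` function, the task names, and the reduced pool sizes (a 'spark' pool of 2 instead of 30)
are hypothetical and do not reflect how the Airflow scheduler is implemented.

```python
from collections import Counter


def admit(tasks, pool_slots):
    """Greedily start tasks; a task starts only if ALL of its pools have a free slot."""
    used = Counter()
    running = []
    for name, pools in tasks:
        if all(used[p] < pool_slots[p] for p in pools):
            for p in pools:
                used[p] += 1  # the task occupies one slot in each of its pools
            running.append(name)
    return running


# Scaled-down version of the use case: a shared 'spark' pool plus one pool per table.
pool_slots = {"spark": 2, "tableA": 1, "tableB": 1, "tableC": 1}
tasks = [
    ("load_a1", ["spark", "tableA"]),
    ("load_a2", ["spark", "tableA"]),  # blocked: 'tableA' is already full
    ("load_b1", ["spark", "tableB"]),
    ("load_c1", ["spark", "tableC"]),  # blocked: 'spark' is already full
]

running = admit(tasks, pool_slots)
print(running)  # ['load_a1', 'load_b1']
```

Both limits hold at once here: no two running tasks share a table pool, and the number of
running Spark tasks never exceeds the size of the 'spark' pool.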

This message was sent by Atlassian Jira
