airflow-commits mailing list archives

From "t oo (Jira)" <>
Subject [jira] [Created] (AIRFLOW-6228) Ability for a single task to consume more than 1 slot of a pool
Date Wed, 11 Dec 2019 09:31:00 GMT
t oo created AIRFLOW-6228:

             Summary: Ability for a single task to consume more than 1 slot of a pool
                 Key: AIRFLOW-6228
             Project: Apache Airflow
          Issue Type: New Feature
          Components: scheduler
    Affects Versions: 1.10.6
            Reporter: t oo

Right now, only a single pool name can be assigned to each task instance.

Ideally, two different pool names could be assigned to a task instance.

Use case:

I have 300 Spark tasks writing to 60 different tables (i.e. multiple tasks write
to the same table).

I want both:
 1. Maximum of 30 Spark tasks running in parallel
 2. Never more than 1 Spark task writing to the same table in parallel

If I have a 'spark' pool of 30 and assign the 'spark' pool to those tasks, then I risk having 2
tasks writing to the same table at once.

If I instead have a 'tableA' pool of 1, a 'tableB' pool of 1, a 'tableC' pool of 1, etc., and
assign the relevant table pool to each task, then I risk having more than 30 Spark tasks running
in parallel.

I can't use 'parallelism' or other settings because I have other non-Spark tasks that I don't
want to limit.
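
To make the requested behaviour concrete, here is a minimal Python sketch of the admission rule I have in mind. It is a standalone simulation, not Airflow code: the function name admit, the pool_slots mapping and the pool names are all hypothetical, and 5 tasks per table is just one way to spread 300 tasks over 60 tables. A task starts only if every pool it names still has a free slot.

```python
from collections import Counter

def admit(queued, pool_slots):
    """Greedily start queued tasks; a task needs a free slot in ALL of its pools.

    queued: list of (task_id, tuple_of_pool_names), in scheduling order.
    pool_slots: dict mapping pool name -> total number of slots.
    (Hypothetical helper for illustration; not part of Airflow.)
    """
    used = Counter()  # slots currently taken, per pool
    started = []
    for task_id, pools in queued:
        if all(used[p] < pool_slots[p] for p in pools):
            used.update(pools)  # take one slot in each pool the task names
            started.append(task_id)
    return started

# One shared 'spark' pool of 30 slots plus a 1-slot pool per table.
pool_slots = {"spark": 30}
pool_slots.update({"table_%d" % t: 1 for t in range(60)})

# 300 Spark tasks, 5 per table; each task names both pools.
queued = [("spark_%d" % i, ("spark", "table_%d" % (i // 5))) for i in range(300)]

started = admit(queued, pool_slots)
print(len(started), started[:3])
```

In this sketch the four other writers to each table stay queued behind the first one, and admission stops altogether once the 'spark' pool is full, so both limits hold at once. Non-Spark tasks would simply not name either pool and would be unaffected.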



This message was sent by Atlassian Jira
