airflow-commits mailing list archives

From "David (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AIRFLOW-584) Airflow Pool does not limit running tasks
Date Thu, 20 Oct 2016 20:24:58 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David updated AIRFLOW-584:
--------------------------
    Description: 
Airflow pools do not limit the number of running task instances for the following DAG in 1.7.1.3.

Steps to reproduce:
Create a pool named 'pools_bug' with 5 slots through the UI.

The following DAG has 52 tasks: 'start', 'end', and 50 BashOperator tasks in the pool, each with a priority_weight equal to its task number.
There should never be more than 5 tasks running at a time, yet I observed 29 'used slots' in a pool with 5 slots.
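For reference, as far as I can tell from the 1.7 code, the 'used slots' figure in the Pools view counts task instances assigned to the pool whose state is running (queued ones are counted separately). A minimal sketch of that count, using an in-memory SQLite table that is a simplified, hypothetical stand-in for Airflow's task_instance table (only task_id, pool and state are modelled), with illustrative rows that merely mirror the 29 running tasks reported above:

```python
import sqlite3

# Simplified, hypothetical stand-in for Airflow's task_instance table;
# the real table has many more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (task_id TEXT, pool TEXT, state TEXT)")

# Illustrative rows only: 29 running pooled tasks (task-21 .. task-49),
# two queued pooled tasks and the un-pooled 'start' task.
rows = [("task-{}".format(i), "pools_bug", "running") for i in range(21, 50)]
rows += [("task-19", "pools_bug", "queued"), ("task-20", "pools_bug", "queued")]
rows.append(("start", None, "success"))
conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?)", rows)

POOL_SLOTS = 5  # size of the pool created in the UI


def used_slots(conn, pool):
    """Count task instances in `pool` whose state is 'running'."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM task_instance WHERE pool = ? AND state = 'running'",
        (pool,),
    ).fetchone()
    return count


used = used_slots(conn, "pools_bug")
print(used, "used of", POOL_SLOTS, "slots")  # 29 used of 5 slots
```

Queued rows and tasks outside the pool are excluded by the WHERE clause, so a healthy pool should never report more used slots than it has.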

{code}
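from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.dummy_operator import DummyOperator

# The pool created in the UI must use this name: every task-N below sets pool=dag_name
dag_name = 'pools_bug'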

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2016, 10, 20),
    'email_on_failure': False,
    'retries': 1
}

dag = DAG(dag_name, default_args=default_args, schedule_interval="0 8 * * *")
start = DummyOperator(task_id='start', dag=dag)
end = DummyOperator(task_id='end', dag=dag)

for i in range(50):
    sleep_command = 'sleep 10'
    task_name = 'task-{}'.format(i)
    op = BashOperator(
        task_id=task_name,
        bash_command=sleep_command,
        execution_timeout=timedelta(hours=4),
        priority_weight=i,
        pool=dag_name,
        dag=dag)

    start.set_downstream(op)
    end.set_upstream(op)
{code}
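The expected behaviour can be sketched as a toy scheduler. This is a hypothetical model, not Airflow's scheduler code: it assumes every task takes one tick, frees its slot when it finishes, and that open slots are filled with the highest priority_weight tasks first. Under those assumptions the pool never holds more than 5 running tasks, and task-49 through task-45 start first:

```python
import heapq


def simulate(num_tasks=50, slots=5, duration=1):
    """Toy pool-respecting scheduler (hypothetical model, not Airflow's code).

    Task i has priority_weight i. Each tick, finished tasks release their
    slots, then open slots are filled by the highest-priority waiting tasks.
    Returns (max tasks running at once, order in which tasks started).
    """
    # heapq is a min-heap, so store negated weights to pop the highest first.
    waiting = [(-i, "task-{}".format(i)) for i in range(num_tasks)]
    heapq.heapify(waiting)
    running = {}  # task name -> tick at which it finishes
    started = []
    max_running = 0
    tick = 0
    while waiting or running:
        # Release slots held by tasks that have finished by this tick.
        for name in [n for n, end in running.items() if end <= tick]:
            del running[name]
        # Start tasks only while the pool still has an open slot.
        while waiting and len(running) < slots:
            _, name = heapq.heappop(waiting)
            running[name] = tick + duration
            started.append(name)
        max_running = max(max_running, len(running))
        tick += 1
    return max_running, started


max_running, started = simulate()
print(max_running)  # 5
print(started[:5])  # ['task-49', 'task-48', 'task-47', 'task-46', 'task-45']
```

The only guard on concurrency is the `len(running) < slots` check; 29 used slots in a 5-slot pool means that kind of check is not being enforced somewhere.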

Relevant configuration from airflow.cfg:
{code}
# The amount of parallelism as a setting to the executor. This defines
# the max number of task instances that should run simultaneously
# on this airflow installation
parallelism = 64

# The number of task instances allowed to run concurrently by the scheduler
dag_concurrency = 64

# The maximum number of active DAG runs per DAG
max_active_runs_per_dag = 1
{code}
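With parallelism and dag_concurrency both at 64 and max_active_runs_per_dag = 1, the pool should be the binding limit for the 50 pooled tasks. The arithmetic, as a hypothetical helper that simply takes the tightest of the three limits:

```python
def effective_limit(parallelism, dag_concurrency, pool_slots):
    """Upper bound on simultaneously running pooled tasks of one DAG run.

    Hypothetical helper: all three limits apply at once, so the tightest wins.
    """
    return min(parallelism, dag_concurrency, pool_slots)


limit = effective_limit(parallelism=64, dag_concurrency=64, pool_slots=5)
observed = 29  # 'used slots' reported in this issue
print(limit, observed > limit)  # 5 True
```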


> Airflow Pool does not limit running tasks
> -----------------------------------------
>
>                 Key: AIRFLOW-584
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-584
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: pools
>    Affects Versions: Airflow 1.7.1.3
>         Environment: Ubuntu 14.04
>            Reporter: David
>         Attachments: Pasted image at 2016_10_20 09_49 AM.png, Screen Shot 2016-10-20 at 4.24.48 PM.png



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
