airflow-commits mailing list archives

From "Joao Trindade (Jira)" <j...@apache.org>
Subject [jira] [Comment Edited] (AIRFLOW-5506) Airflow scheduler stuck
Date Mon, 07 Oct 2019 09:49:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16945741#comment-16945741 ]

Joao Trindade edited comment on AIRFLOW-5506 at 10/7/19 9:48 AM:
-----------------------------------------------------------------

We've seen this issue on 1.10.4 after the ETL had been stopped for a couple of hours. When it
was restarted, around 30 DAGs created a total of 120 DAG runs at the same time.

When we turned off some of the DAGs in the UI, the other DAGs started getting tasks scheduled,
even though their dag_runs were still in "running" status.
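
The workaround above was done through the UI. A minimal sketch of doing the same from
a script follows, assuming the Airflow 1.10 CLI command `airflow pause <dag_id>` is on the
PATH. The helper name `pause_dags` and the DAG ids in the usage note are hypothetical, and
this has not been verified against the bug itself.

```python
import subprocess


def pause_dags(dag_ids, run=subprocess.run):
    """Pause each DAG with `airflow pause` and return the commands issued.

    `run` is injectable so the function can be dry-run without Airflow installed.
    """
    commands = []
    for dag_id in dag_ids:
        cmd = ["airflow", "pause", dag_id]
        # check=True raises CalledProcessError if the CLI exits non-zero.
        run(cmd, check=True)
        commands.append(cmd)
    return commands
```

For example, `pause_dags(["etl_orders", "etl_users"])` would pause two (hypothetical) DAGs;
`airflow unpause <dag_id>` turns them back on.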


was (Author: jftrindade):
We've seen this issue on 1.10.4

We managed to get it running by turning off some of the DAGs in the UI.

> Airflow scheduler stuck
> -----------------------
>
>                 Key: AIRFLOW-5506
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5506
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 1.10.4, 1.10.5
>            Reporter: t oo
>            Priority: Major
>
> Re-post of [https://stackoverflow.com/questions/57713394/airflow-scheduler-stuck] and a
> Slack discussion.
>  
>  
> I'm testing the use of Airflow, and after triggering a (seemingly) large number of DAGs
> at the same time, it seems to just fail to schedule anything and starts killing processes.
> These are the logs the scheduler prints:
> {{[2019-08-29 11:17:13,542] \{scheduler_job.py:214} WARNING - Killing PID 199809
> [2019-08-29 11:17:13,544] \{scheduler_job.py:214} WARNING - Killing PID 199809
> [2019-08-29 11:17:44,614] \{scheduler_job.py:214} WARNING - Killing PID 2992
> [2019-08-29 11:17:44,614] \{scheduler_job.py:214} WARNING - Killing PID 2992
> [2019-08-29 11:18:15,692] \{scheduler_job.py:214} WARNING - Killing PID 5174
> [2019-08-29 11:18:15,693] \{scheduler_job.py:214} WARNING - Killing PID 5174
> [2019-08-29 11:18:46,765] \{scheduler_job.py:214} WARNING - Killing PID 22410
> [2019-08-29 11:18:46,766] \{scheduler_job.py:214} WARNING - Killing PID 22410
> [2019-08-29 11:19:17,845] \{scheduler_job.py:214} WARNING - Killing PID 42177
> [2019-08-29 11:19:17,846] \{scheduler_job.py:214} WARNING - Killing PID 42177
> ...}}
> I'm using a LocalExecutor with a PostgreSQL backend DB. It seems to happen only
> after I trigger a large number (>100) of DAGs at about the same time using external
> triggering, as in:
> {{airflow trigger_dag DAG_NAME}}
> After I wait for it to finish killing whatever processes it is killing, it starts executing
> all of the tasks properly. I don't even know what these processes were, as I can't really
> see them after they are killed...
> Did anyone encounter this kind of behavior? Any idea why that would happen?
>  
>  
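
In the quoted log excerpt, each PID is reported twice and a new PID appears roughly every
31 seconds. A small sketch for pulling that pattern out of a scheduler log follows; the
sample lines are copied from the report above, while the helper names and the regular
expression are illustrative assumptions, not part of Airflow.

```python
import re
from datetime import datetime

# Sample lines copied from the report (one per distinct PID, plus one duplicate).
LOG = """\
[2019-08-29 11:17:13,542] {scheduler_job.py:214} WARNING - Killing PID 199809
[2019-08-29 11:17:13,544] {scheduler_job.py:214} WARNING - Killing PID 199809
[2019-08-29 11:17:44,614] {scheduler_job.py:214} WARNING - Killing PID 2992
[2019-08-29 11:18:15,692] {scheduler_job.py:214} WARNING - Killing PID 5174
[2019-08-29 11:18:46,765] {scheduler_job.py:214} WARNING - Killing PID 22410
[2019-08-29 11:19:17,845] {scheduler_job.py:214} WARNING - Killing PID 42177
"""

# Tolerates the Jira-escaped form "\{scheduler_job.py:214}" as well.
KILL_LINE = re.compile(
    r"\[(?P<ts>[\d\- :,]+)\] \\?\{scheduler_job\.py:\d+\} "
    r"WARNING - Killing PID (?P<pid>\d+)"
)


def first_kill_per_pid(text):
    """Map each killed PID to the timestamp of its first 'Killing PID' line."""
    first = {}
    for match in KILL_LINE.finditer(text):
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        first.setdefault(int(match.group("pid")), ts)
    return first


def gaps_in_seconds(first):
    """Seconds between consecutive first-kill timestamps, in time order."""
    times = sorted(first.values())
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    kills = first_kill_per_pid(LOG)
    print("distinct PIDs killed:", list(kills))
    print("gaps (s):", [round(gap, 1) for gap in gaps_in_seconds(kills)])
```

Running the same helpers over a full scheduler log would show whether the kills keep a
steady cadence for the whole stall, which may help when comparing reports.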



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
