airflow-commits mailing list archives

From Maciej Bryński (JIRA) <j...@apache.org>
Subject [jira] [Comment Edited] (AIRFLOW-401) scheduler gets stuck without a trace
Date Thu, 15 Sep 2016 18:29:20 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15477516#comment-15477516 ]

Maciej Bryński edited comment on AIRFLOW-401 at 9/15/16 6:29 PM:
-----------------------------------------------------------------

I will try this.
In the meantime, I found the undocumented min_file_process_interval option.
UPDATE: the same option was in the patch.

That solved many of my problems but triggered new ones.

How can I set up an HA scheduler? Running more than one instance creates duplicate DagRuns.
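
For reference, a minimal sketch of where the min_file_process_interval option would sit in airflow.cfg. This assumes the option belongs to the [scheduler] section and is expressed in seconds, as in Airflow releases that document it. The value 30 is purely illustrative and is not taken from this thread or from the patch.

```ini
[scheduler]
# Values quoted in the issue description below
job_heartbeat_sec = 5
scheduler_heartbeat_sec = 5

# Assumed placement and unit (seconds); 30 is an illustrative value only.
# A larger interval makes the scheduler re-parse each DAG file less often.
min_file_process_interval = 30
```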


was (Author: maver1ck):
I will try this.
In the meantime, I found the undocumented min_file_process_interval option.

That solved many of my problems but triggered new ones.

How can I set up an HA scheduler? Running more than one instance creates duplicate DagRuns.

> scheduler gets stuck without a trace
> ------------------------------------
>
>                 Key: AIRFLOW-401
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-401
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: executor, scheduler
>    Affects Versions: Airflow 1.7.1.3
>            Reporter: Nadeem Ahmed Nazeer
>            Assignee: Bolke de Bruin
>            Priority: Minor
>         Attachments: Dag_code.txt, schduler_cpu100%.png, scheduler_stuck.png, scheduler_stuck_7hours.png
>
>
> The scheduler gets stuck without a trace or error. When this happens, the CPU usage of
> the scheduler service is at 100%. No jobs get submitted and everything comes to a halt.
> It looks like it goes into some kind of infinite loop.
> The only way I could make it run again is by manually restarting the scheduler service.
> But after running some tasks, it gets stuck again. I've tried both the Celery and Local
> executors, but the same issue occurs. I am using the -n 3 parameter when starting the scheduler.
> Scheduler configs:
> job_heartbeat_sec = 5
> scheduler_heartbeat_sec = 5
> executor = LocalExecutor
> parallelism = 32
> Please help. I would be happy to provide any other information needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
