airflow-commits mailing list archives

From "Greg Neiheisel (JIRA)" <>
Subject [jira] [Created] (AIRFLOW-366) SchedulerJob gets locked up when child processes attempt to log to single file
Date Tue, 26 Jul 2016 19:59:20 GMT
Greg Neiheisel created AIRFLOW-366:

             Summary: SchedulerJob gets locked up when child processes attempt to log to single file
                 Key: AIRFLOW-366
             Project: Apache Airflow
          Issue Type: Bug
          Components: scheduler
            Reporter: Greg Neiheisel

After running the scheduler for a while (usually after 1 - 5 hours) it will eventually lock
up, and nothing will get scheduled.

A `SchedulerJob` will end up getting stuck in the `while` loop around line 730 of `airflow/`.

From what I can tell, this is related to logging from within a forked process using Python's
multiprocessing module.

The job will fork off some child processes to process the DAGs, but one (or more) will end
up getting stuck and never terminating, so the `while` loop hangs waiting for it.  You can
`kill -9 PID` the child process manually, and the loop will end and the scheduler will go
on its way, until it happens again.

The issue is due to usage of the logging module from within the child processes.  From what
I can tell, logging to a single file from multiple processes is not supported, but logging
from multiple threads is, because the logging module serializes access with locks.
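
One common way around this limitation, which is not what this report did, is to let only the parent write to the file and have the children send their records over a queue. The sketch below uses the standard library's `QueueHandler` and `QueueListener`; the logger name `child_demo` and the functions `child_task` and `run_with_queue_logging` are hypothetical, and the `fork` start method is assumed.

```python
import logging
import logging.handlers
import multiprocessing as mp


def child_task(queue, n):
    # In the child, log only through the queue, never directly to a file.
    logger = logging.getLogger("child_demo")
    logger.handlers.clear()
    logger.addHandler(logging.handlers.QueueHandler(queue))
    logger.propagate = False
    logger.setLevel(logging.INFO)
    logger.info("processed item %d", n)


def run_with_queue_logging(log_path, n_children=3):
    ctx = mp.get_context("fork")  # assumption: a Unix host where fork is available
    queue = ctx.Queue()
    children = [ctx.Process(target=child_task, args=(queue, i))
                for i in range(n_children)]
    for child in children:
        child.start()

    # Start the listener thread only after forking, so the children are
    # not forked from a multi-threaded parent.
    file_handler = logging.FileHandler(log_path)
    listener = logging.handlers.QueueListener(queue, file_handler)
    listener.start()

    for child in children:
        child.join()
    listener.stop()       # drains the queue, then stops the listener thread
    file_handler.close()


if __name__ == "__main__":
    run_with_queue_logging("children.log")
```

Only the listener thread in the parent ever touches the file handler, so no file handler lock is shared with, or inherited by, a child.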

I think a child process will somehow inherit a logger that is locked, right when it is forked,
resulting in the process completely locking up.
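
The suspected mechanism can be illustrated in isolation. The following is a minimal sketch, not Airflow code: a plain `threading.Lock` stands in for a logging handler's lock, and `child_sees_lock_held` is a hypothetical name. For simplicity the parent itself holds the lock at fork time; in a real program the holder would more likely be another thread. The child uses a timeout so the demonstration ends instead of hanging.

```python
import os
import threading


def child_sees_lock_held(timeout=0.5):
    """Fork while a lock is held; return True if the child could not acquire it."""
    lock = threading.Lock()
    lock.acquire()  # stand-in for a handler lock that is held at fork time
    pid = os.fork()
    if pid == 0:
        # Child: its copy of the lock is still marked as held, and nothing
        # in this process will ever release it. Without the timeout this
        # acquire would block forever.
        acquired = lock.acquire(timeout=timeout)
        os._exit(0 if acquired else 1)
    # Parent: releasing here does not affect the child's copy of the lock.
    lock.release()
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 1


if __name__ == "__main__":
    print("child blocked on inherited lock:", child_sees_lock_held())
```

A child that blocks like this without a timeout never exits, which would match the stuck processes described above.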

I went in and commented out all the logging statements that could possibly be hit by the child
process (,, and was able to keep the scheduler alive.

