airflow-dev mailing list archives

From Bolke de Bruin <bdbr...@gmail.com>
Subject Re: Airflow stops reading stdout of forked process with BashOperator
Date Tue, 03 Oct 2017 05:55:13 GMT
Probably a buffer is full or not emptied in time (as you mentioned). I.e. if we're reading
from stderr but the stdout pipe is full, it gets stuck. This was fixed for the SparkOperators,
and we might need to do the same here.
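
Roughly, the idea is to make sure both pipes are always drained. A minimal sketch of
one way to do that (illustrative only, not the actual SparkOperator change) merges
stderr into stdout so a single reader keeps the one remaining pipe empty:

    import subprocess

    def run_and_stream(cmd):
        # Merge stderr into stdout so one reader drains both streams.
        # With two separate pipes, reading only one of them lets the
        # other fill its OS buffer and block the child process.
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        for line in iter(proc.stdout.readline, ''):
            print(line.rstrip())  # in Airflow this would go to the task log
        proc.stdout.close()
        return proc.wait()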

Bolke

Sent from my iPad

> On 3 Oct 2017, at 03:02, David Capwell <dcapwell@gmail.com> wrote:
> 
> We use the BashOperator to call a Java command line tool. We notice that
> sometimes the task stays running for a long time (it never stops) and that
> the logs in Airflow stop getting updated for the task. After debugging a bit,
> it turns out that the JVM is blocked on the stdout FD because the buffer is
> full. I manually drained the buffer (just ran cat against it) and saw the JVM
> halt cleanly, but the task stayed stuck in Airflow; airflow run is still
> running but the forked process is not.
> 
> Walking the code in bash_operator, I see that Airflow creates a shell script
> and then has bash run it. I see the location of the script in the logs, but I
> don't see it on the file system. I didn't check while the process was hung,
> so I don't know whether bash was running or not.
> 
> We have seen this a few times. Any idea what's going on? I'm new to debugging
> Python, and ptrace is disabled in our environment, so I can't find a way to
> get the state of the airflow run command.
> 
> Thanks for any help!
> 
> Airflow version: 1.8.0 and 1.8.2 (the above was on 1.8.2, but we see this on
> the 1.8.0 cluster as well)
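
As a rough illustration of the failure mode described above (a sketch, not code from
the thread): the child writes more than the OS pipe buffer (commonly 64 KB on Linux)
to stdout, the parent never reads it, and the child blocks in write() until something
drains the pipe, as cat did in David's test:

    import subprocess
    import sys
    import time

    # The child writes ~1 MB to its stdout. The parent attaches a pipe but
    # never reads from it, so once the kernel pipe buffer fills the child
    # blocks inside write() and never exits.
    child = subprocess.Popen(
        [sys.executable, "-c",
         "import sys; [sys.stdout.write('x' * 1024) for _ in range(1024)]"],
        stdout=subprocess.PIPE,
    )

    time.sleep(5)
    print("child still running:", child.poll() is None)  # expected: True (stuck)

    # Draining the pipe (what cat did above) lets the child finish.
    out, _ = child.communicate()
    print("child exited", child.returncode, "after draining", len(out), "bytes")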
