airflow-commits mailing list archives

From "Fokko Driesprong (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AIRFLOW-1562) Spark-sql deadlock in logging
Date Mon, 04 Sep 2017 19:04:00 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fokko Driesprong updated AIRFLOW-1562:
--------------------------------------
    Description: 
Related to Issue 1255

Logging in SparkSqlOperator does not work as intended (continuous logging, as output is received
from the subprocess). This is because spark-sql internally redirects all logs to stdout (including
stderr), which causes the current two-iterator logging to get stuck on an empty stderr pipe.
This situation can also lead to a deadlock: the stderr pipe buffer can fill up, at which point
writes to it block until it is consumed. That only happens when the process ends, so the
process stalls.
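
One common way to avoid the kind of pipe stall described above is to merge stderr into
stdout and consume a single stream line by line while the process runs. The following is a
minimal sketch under that assumption, not the hook's actual code and not necessarily the fix
applied for this issue; the names run_and_stream and log are hypothetical.

```python
import subprocess
import sys


def run_and_stream(cmd, log=print):
    """Run cmd and pass each output line to log() as soon as it arrives.

    Hypothetical helper for illustration only. stderr is merged into stdout,
    so there is just one pipe to drain and no second pipe that can fill up
    unread while the first one is being consumed.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the stdout pipe
        universal_newlines=True,   # decode bytes to text
        bufsize=1,                 # line-buffered reads on our side
    )
    lines = []
    # readline returns "" only at end of stream, which ends the loop.
    for line in iter(proc.stdout.readline, ""):
        line = line.rstrip("\n")
        log(line)
        lines.append(line)
    proc.stdout.close()
    returncode = proc.wait()
    return returncode, lines


if __name__ == "__main__":
    # Stand-in child process that writes to both streams.
    child = "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"
    code, _ = run_and_stream([sys.executable, "-c", child])
    print("exit code:", code)
```

Because only one pipe exists, the child can never block on a full stderr buffer while the
parent is waiting on stdout. The relative order of lines from the two streams depends on the
child's own buffering.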

  was:
Related to Issue 1255

Logging in SparkSubmitOperator does not work as intended (continuous logging, as output is
received from the subprocess). This is because spark-submit internally redirects all logs to stdout
(including stderr), which causes the current two-iterator logging to get stuck on an empty stderr pipe.
The logs are written only when the subprocess finishes. As a result, yarn_application_id
is not available until the application ends.



> Spark-sql deadlock in logging
> -----------------------------
>
>                 Key: AIRFLOW-1562
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1562
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: hooks
>    Affects Versions: Airflow 1.8
>            Reporter: Fokko Driesprong
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
