airflow-commits mailing list archives

From "Allison Wang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (AIRFLOW-1667) Remote log handlers don't upload logs
Date Sat, 07 Oct 2017 05:26:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195581#comment-16195581 ]

Allison Wang edited comment on AIRFLOW-1667 at 10/7/17 5:25 AM:
----------------------------------------------------------------

I agree that we shouldn't rely on the logging module's close to upload the log, since we have
no control over when it's called. Instead of relying on close, we could explicitly add a post_task_run
method to the handler that performs any additional cleanup or other operations upon task completion.
This change requires modifying only a small amount of current code. I am not exactly sure
how to upload the log to remote storage like S3/GCS periodically during task execution,
but it's possible to use a log collector (e.g. Filebeat) to ship the log to centralized storage
(e.g. Elasticsearch) in real time.
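
To make the post_task_run idea concrete, here is a minimal sketch, not Airflow's actual implementation. The class name `RemoteTaskHandler`, the `upload` callable, and the `run_task` wrapper are all hypothetical names chosen for illustration; the only assumption is that whatever runs the task can call the hook itself once the task finishes.

```python
import logging


class RemoteTaskHandler(logging.FileHandler):
    """Hypothetical handler: logs to a local file, uploads only when asked."""

    def __init__(self, filename, upload):
        super().__init__(filename)
        # `upload` is an assumed callable taking the local log path,
        # e.g. a thin wrapper around an S3 or GCS client.
        self._upload = upload

    def post_task_run(self):
        """Explicit hook invoked by the task runner after the task completes."""
        self.flush()                      # push buffered records to the local file
        self._upload(self.baseFilename)   # then ship the file to remote storage


def run_task(task, handler, logger_name="demo.task"):
    """Hypothetical runner: attaches the handler, runs the task, calls the hook."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    try:
        task(logger)
    finally:
        # Upload happens here, whether or not the worker process ever exits
        # and triggers logging.shutdown().
        handler.post_task_run()
        logger.removeHandler(handler)
```

The handler stays open after the hook runs, so a long-lived worker could reuse it or close it later without affecting the upload.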


was (Author: allisonwang):
I agree that we shouldn't rely on the logging module's close to upload the log, since we have
no control over when it's called. Instead of relying on close, we could explicitly invoke a post_task_run
method in handlers that performs any additional cleanup or other operations upon task completion. This
change requires modifying only a small amount of current code. I am not exactly sure how
to upload the log to remote storage like S3/GCS periodically during task execution, but it's
possible to use a log collector (e.g. Filebeat) to ship the log to centralized storage (e.g.
Elasticsearch) in real time.

> Remote log handlers don't upload logs
> -------------------------------------
>
>                 Key: AIRFLOW-1667
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1667
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: logging
>    Affects Versions: 1.9.0, 1.10.0
>            Reporter: Arthur Vigil
>
> AIRFLOW-1385 revised logging for configurability, but the provided remote log handlers
> (S3TaskHandler and GCSTaskHandler) only upload on close (flush is left at the default implementation
> provided by `logging.FileHandler`). A handler will be closed on process exit by `logging.shutdown()`,
> but depending on the Executor used, worker processes may not regularly shut down, and can very
> likely persist between tasks. This means that during normal execution log files are never uploaded.
> We need to find a way to flush remote log handlers in a timely manner, but without hitting
> the target resources unnecessarily.
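
One possible way to balance timely uploads against unnecessary remote calls is to rate-limit uploads inside `flush`. The sketch below is only an illustration under stated assumptions: `ThrottledUploadHandler`, the `upload` callable, `min_interval`, and the injectable `clock` are hypothetical names, not part of Airflow or of the handlers named in the issue. It relies on the standard-library behavior that `logging.StreamHandler.emit` calls `flush` after each record.

```python
import logging
import time


class ThrottledUploadHandler(logging.FileHandler):
    """Hypothetical handler: uploads on flush, at most once per `min_interval`."""

    def __init__(self, filename, upload, min_interval=30.0, clock=time.monotonic):
        super().__init__(filename)
        self._upload = upload            # assumed callable taking the local log path
        self._min_interval = min_interval
        self._clock = clock              # injectable so the behavior can be tested
        self._last_upload = clock()      # start the interval at construction time
        self._dirty = False              # True when records were written since last upload

    def emit(self, record):
        self._dirty = True
        super().emit(record)             # writes the record, then calls self.flush()

    def flush(self):
        super().flush()
        now = self._clock()
        if self._dirty and now - self._last_upload >= self._min_interval:
            self._upload(self.baseFilename)
            self._last_upload = now
            self._dirty = False

    def close(self):
        try:
            # Final upload on close, regardless of the interval, if anything is pending.
            if self._dirty:
                super().flush()
                self._upload(self.baseFilename)
                self._dirty = False
        finally:
            super().close()
```

Note that in this sketch the upload runs while the handler lock is held during `emit`, so a slow remote call would block other threads logging through the same handler; a real implementation might hand the upload to a background thread instead.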



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
