airflow-commits mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-1916) S3 Task logs end up duplicated in the file.
Date Tue, 19 Dec 2017 20:05:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297347#comment-16297347 ]

ASF subversion and git services commented on AIRFLOW-1916:
----------------------------------------------------------

Commit 4ce4faaeae7a76d97defcf9a9d3304ac9d78b9bd in incubator-airflow's branch refs/heads/master
from [~ashb]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=4ce4faa ]

[AIRFLOW-1916] Don't upload logs to remote from `run --raw`

In a previous change we removed the airflow.task.raw handler (which printed to stdout directly) and replaced it with one that wrote to the log file itself. The problem is that Python automatically calls `logging.shutdown()` on clean process exit. This ended up uploading the log file twice: once at the end of `airflow run --raw`, and then again from the explicit shutdown() call at the end of cli's `run()`.

Since logging is shut down automatically, this change adds an explicit flag to control whether the GCS and S3 handlers should upload the file or not, and we tell them not to when running with `--raw`.

Closes #2880 from ashb/AIRFLOW-1916-dont-upload-logs-twice
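The flag described above can be sketched as a handler whose close() performs the remote upload unless told otherwise. This is a minimal illustration, not Airflow's real S3TaskHandler API; the class and attribute names here are hypothetical, and a plain list stands in for the S3 bucket:

```python
import logging
import os
import tempfile

class RemoteUploadHandler(logging.FileHandler):
    """Hypothetical sketch of the fix: close() normally uploads the
    local log file to remote storage, but an explicit flag lets the
    caller (e.g. `airflow run --raw`) opt out, so the file is not
    uploaded again when logging.shutdown() runs at interpreter exit.
    Names are illustrative, not Airflow's actual API."""

    def __init__(self, filename, uploads):
        super().__init__(filename)
        self.upload_on_close = True   # the flag the fix introduces
        self._uploaded = False
        self._uploads = uploads       # stand-in for the S3 bucket

    def close(self):
        # close() can run twice: once from an explicit shutdown() and
        # once more from the atexit-registered logging.shutdown().
        # Guard so the "upload" happens at most once, and only if asked.
        if self.upload_on_close and not self._uploaded:
            self._uploads.append(self.baseFilename)
            self._uploaded = True
        super().close()

# Demo: the local/explicit path uploads exactly once ...
bucket = []
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "1.log")
    h = RemoteUploadHandler(path, bucket)
    h.close()          # explicit shutdown() at the end of cli run()
    h.close()          # exit-time logging.shutdown()
    assert bucket == [path]

    # ... while raw mode skips the upload entirely.
    raw_bucket = []
    h2 = RemoteUploadHandler(path, raw_bucket)
    h2.upload_on_close = False
    h2.close()
    assert raw_bucket == []
```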


> S3 Task logs end up duplicated in the file.
> -------------------------------------------
>
>                 Key: AIRFLOW-1916
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1916
>             Project: Apache Airflow
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>            Reporter: Ash Berlin-Taylor
>             Fix For: 1.9.1, 2.0.0
>
>
> If using the S3TaskHandler logger, the contents of the log file in the S3 bucket end up duplicated: once from when `airflow run --raw` finalizes the task, and again from when `airflow run --local` finalizes its logger.
> Log from the UI is included below. The file on disk does not have the repetition.
> There is a comment in `run()` in airflow.bin.cli implying that `--raw` is not meant to upload, but something is.
> {noformat}
> *** Reading remote log from s3://xxx/ash-test/tests/test-logging/2017-12-13T10:45:42.552705/1.log.
> [2017-12-13 10:45:49,764] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:49,895] {models.py:1197} INFO - Dependencies all met for <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,905] {models.py:1197} INFO - Dependencies all met for <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,906] {models.py:1407} INFO - 
> --------------------------------------------------------------------------------
> Starting attempt 1 of 1
> --------------------------------------------------------------------------------
> [2017-12-13 10:45:49,923] {models.py:1428} INFO - Executing <Task(PythonOperator): test-logging> on 2017-12-13 10:45:42.552705
> [2017-12-13 10:45:49,924] {base_task_runner.py:115} INFO - Running: ['bash', '-c', 'airflow run tests test-logging 2017-12-13T10:45:42.552705 --job_id 5 --raw -sd /usr/local/airflow/dags/example/csv_to_parquet.py']
> [2017-12-13 10:45:53,622] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,622] {configuration.py:206} WARNING - section/key [celery/celery_ssl_active] not found in config
> [2017-12-13 10:45:53,625] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,622] {default_celery.py:41} WARNING - Celery Executor will run without SSL
> [2017-12-13 10:45:53,626] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,624] {__init__.py:45} INFO - Using executor CeleryExecutor
> [2017-12-13 10:45:53,755] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,754] {models.py:189} INFO - Filling up the DagBag from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,859] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:53,901] {logging_mixin.py:84} INFO - Hi from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,902] {logging_mixin.py:84} INFO - Hi 2 from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,903] {csv_to_parquet.py:27} ERROR - Hello
> [2017-12-13 10:45:53,905] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,904] {python_operator.py:90} INFO - Done. Returned value was: None
> [2017-12-13 10:45:49,764] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:49,895] {models.py:1197} INFO - Dependencies all met for <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,905] {models.py:1197} INFO - Dependencies all met for <TaskInstance: tests.test-logging 2017-12-13 10:45:42.552705 [queued]>
> [2017-12-13 10:45:49,906] {models.py:1407} INFO - 
> --------------------------------------------------------------------------------
> Starting attempt 1 of 1
> --------------------------------------------------------------------------------
> [2017-12-13 10:45:49,923] {models.py:1428} INFO - Executing <Task(PythonOperator): test-logging> on 2017-12-13 10:45:42.552705
> [2017-12-13 10:45:49,924] {base_task_runner.py:115} INFO - Running: ['bash', '-c', 'airflow run tests test-logging 2017-12-13T10:45:42.552705 --job_id 5 --raw -sd /usr/local/airflow/dags/example/csv_to_parquet.py']
> [2017-12-13 10:45:53,622] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,622] {configuration.py:206} WARNING - section/key [celery/celery_ssl_active] not found in config
> [2017-12-13 10:45:53,625] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,622] {default_celery.py:41} WARNING - Celery Executor will run without SSL
> [2017-12-13 10:45:53,626] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,624] {__init__.py:45} INFO - Using executor CeleryExecutor
> [2017-12-13 10:45:53,755] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,754] {models.py:189} INFO - Filling up the DagBag from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,859] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:53,901] {logging_mixin.py:84} INFO - Hi from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,902] {logging_mixin.py:84} INFO - Hi 2 from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,903] {csv_to_parquet.py:27} ERROR - Hello
> [2017-12-13 10:45:53,905] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,904] {python_operator.py:90} INFO - Done. Returned value was: None
> [2017-12-13 10:45:54,923] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:54,922] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): sts.amazonaws.com
> [2017-12-13 10:45:55,498] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:55,498] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): s3.eu-west-1.amazonaws.com
> [2017-12-13 10:45:55,784] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:55,784] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): s3.eu-west-1.amazonaws.com
> {noformat}
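The root cause described in the issue can be reproduced in isolation: `logging` registers `shutdown()` with `atexit`, so a handler's close() runs on clean interpreter exit even after an explicit shutdown() call, giving two close() invocations. This sketch runs a child interpreter to show it; the handler's print stands in for the S3 upload:

```python
import subprocess
import sys
import textwrap

# Child program: a handler whose close() represents the remote upload.
prog = textwrap.dedent("""
    import logging

    class UploadingHandler(logging.Handler):
        def emit(self, record):
            pass
        def close(self):
            print("uploading log file")   # stand-in for the S3 upload
            super().close()

    logging.getLogger().addHandler(UploadingHandler())
    logging.shutdown()   # explicit shutdown, like cli's run()
    # Interpreter exit now triggers the atexit-registered
    # logging.shutdown() a second time -> a second "upload".
""")

out = subprocess.run([sys.executable, "-c", prog],
                     capture_output=True, text=True).stdout
# close() ran twice: once explicitly, once at interpreter exit.
print(out.count("uploading log file"))
```

This mirrors why the file reached S3 twice: `logging.shutdown()` does not remove handlers from its internal list, so a second shutdown at exit closes (and here, "uploads") them again.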



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
