Date: Tue, 19 Dec 2017 20:05:00 +0000 (UTC)
From: "ASF subversion and git services (JIRA)"
To: commits@airflow.incubator.apache.org
Subject: [jira] [Commented] (AIRFLOW-1916) S3 Task logs end up duplicated in the file.
    [ https://issues.apache.org/jira/browse/AIRFLOW-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297347#comment-16297347 ]

ASF subversion and git services commented on AIRFLOW-1916:
----------------------------------------------------------

Commit 4ce4faaeae7a76d97defcf9a9d3304ac9d78b9bd in incubator-airflow's branch refs/heads/master from [~ashb]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=4ce4faa ]

[AIRFLOW-1916] Don't upload logs to remote from `run --raw`

In a previous change we removed the airflow.task.raw handler (which printed to stdout directly) and replaced it with one that wrote to the log file itself.

The problem is that Python automatically calls `logging.shutdown()` on clean process exit. This ended up uploading the log file twice: once from the end of `airflow run --raw`, and then again from the explicit shutdown() call at the end of the cli's `run()`.

Since logging is shut down automatically, this change adds an explicit flag to control whether the GCS and S3 handlers should upload the file, and we tell them not to when running with `--raw`.

Closes #2880 from ashb/AIRFLOW-1916-dont-upload-logs-twice

(Two short sketches after the log excerpt below illustrate the root cause and the shape of this fix.)

> S3 Task logs end up duplicated in the file.
> -------------------------------------------
>
> Key: AIRFLOW-1916
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1916
> Project: Apache Airflow
> Issue Type: Bug
> Affects Versions: 1.9.0
> Reporter: Ash Berlin-Taylor
> Fix For: 1.9.1, 2.0.0
>
> If using the S3TaskHandler logger, the contents of the log file in the S3 bucket end up duplicated: once from when `airflow run --raw` finalizes the task, and again from when `airflow run --local` finalizes its logger.
> Log from the UI is included below. The file on disk does not have the repetition.
> There is a comment in `run()` in airflow.bin.cli implying that `--raw` is not meant to upload, but something does.
> {noformat}
> *** Reading remote log from s3://xxx/ash-test/tests/test-logging/2017-12-13T10:45:42.552705/1.log.
> [2017-12-13 10:45:49,764] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:49,895] {models.py:1197} INFO - Dependencies all met for
> [2017-12-13 10:45:49,905] {models.py:1197} INFO - Dependencies all met for
> [2017-12-13 10:45:49,906] {models.py:1407} INFO -
> --------------------------------------------------------------------------------
> Starting attempt 1 of 1
> --------------------------------------------------------------------------------
> [2017-12-13 10:45:49,923] {models.py:1428} INFO - Executing on 2017-12-13 10:45:42.552705
> [2017-12-13 10:45:49,924] {base_task_runner.py:115} INFO - Running: ['bash', '-c', 'airflow run tests test-logging 2017-12-13T10:45:42.552705 --job_id 5 --raw -sd /usr/local/airflow/dags/example/csv_to_parquet.py']
> [2017-12-13 10:45:53,622] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,622] {configuration.py:206} WARNING - section/key [celery/celery_ssl_active] not found in config
> [2017-12-13 10:45:53,625] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,622] {default_celery.py:41} WARNING - Celery Executor will run without SSL
> [2017-12-13 10:45:53,626] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,624] {__init__.py:45} INFO - Using executor CeleryExecutor
> [2017-12-13 10:45:53,755] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,754] {models.py:189} INFO - Filling up the DagBag from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,859] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:53,901] {logging_mixin.py:84} INFO - Hi from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,902] {logging_mixin.py:84} INFO - Hi 2 from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,903] {csv_to_parquet.py:27} ERROR - Hello
> [2017-12-13 10:45:53,905] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,904] {python_operator.py:90} INFO - Done. Returned value was: None
> [2017-12-13 10:45:49,764] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:49,895] {models.py:1197} INFO - Dependencies all met for
> [2017-12-13 10:45:49,905] {models.py:1197} INFO - Dependencies all met for
> [2017-12-13 10:45:49,906] {models.py:1407} INFO -
> --------------------------------------------------------------------------------
> Starting attempt 1 of 1
> --------------------------------------------------------------------------------
> [2017-12-13 10:45:49,923] {models.py:1428} INFO - Executing on 2017-12-13 10:45:42.552705
> [2017-12-13 10:45:49,924] {base_task_runner.py:115} INFO - Running: ['bash', '-c', 'airflow run tests test-logging 2017-12-13T10:45:42.552705 --job_id 5 --raw -sd /usr/local/airflow/dags/example/csv_to_parquet.py']
> [2017-12-13 10:45:53,622] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,622] {configuration.py:206} WARNING - section/key [celery/celery_ssl_active] not found in config
> [2017-12-13 10:45:53,625] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,622] {default_celery.py:41} WARNING - Celery Executor will run without SSL
> [2017-12-13 10:45:53,626] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,624] {__init__.py:45} INFO - Using executor CeleryExecutor
> [2017-12-13 10:45:53,755] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,754] {models.py:189} INFO - Filling up the DagBag from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,859] {cli.py:374} INFO - Running on host ac5d0787084d
> [2017-12-13 10:45:53,901] {logging_mixin.py:84} INFO - Hi from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,902] {logging_mixin.py:84} INFO - Hi 2 from /usr/local/airflow/dags/example/csv_to_parquet.py
> [2017-12-13 10:45:53,903] {csv_to_parquet.py:27} ERROR - Hello
> [2017-12-13 10:45:53,905] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:53,904] {python_operator.py:90} INFO - Done. Returned value was: None
> [2017-12-13 10:45:54,923] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:54,922] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): sts.amazonaws.com
> [2017-12-13 10:45:55,498] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:55,498] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): s3.eu-west-1.amazonaws.com
> [2017-12-13 10:45:55,784] {base_task_runner.py:98} INFO - Subtask: [2017-12-13 10:45:55,784] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): s3.eu-west-1.amazonaws.com
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
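

Sketch 1, the root cause: Python's logging module registers logging.shutdown() with atexit, and shutdown() flushes and closes every handler. A minimal, self-contained sketch of why that doubles the upload (the handler class and the print stand-in are illustrative, not Airflow's actual code):

{code:python}
import logging


class UploadingHandler(logging.FileHandler):
    """Toy handler that "uploads" its log file when it is closed."""

    def close(self):
        filename = self.baseFilename
        super().close()
        # Stand-in for copying the finished log file to S3/GCS.
        print("uploading %s" % filename)


log = logging.getLogger("task")
log.addHandler(UploadingHandler("/tmp/task.log"))
log.warning("hello")
# Note there is no explicit shutdown() call here: logging registers
# logging.shutdown() with atexit, and shutdown() closes every handler.
# Any handler that uploads in close() therefore uploads on every clean
# process exit -- and `run --local` plus its `run --raw` child are two
# such exits against the same log file, hence the doubled contents.
{code}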
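Sketch 2, the shape of the fix: an explicit flag decides whether close() uploads, and the `--raw` process switches it off so only the parent uploads. Names here (RemoteTaskHandler, upload_on_close, upload_to_remote) are assumptions for illustration; see commit 4ce4faa for Airflow's real S3/GCS task handler changes.

{code:python}
import logging


class RemoteTaskHandler(logging.FileHandler):
    """Toy remote log handler whose close()-time upload is gated by a flag."""

    def __init__(self, filename):
        super().__init__(filename)
        # Explicit flag: the parent `run --local` process leaves this True,
        # while the `run --raw` child sets it False, so the implicit
        # logging.shutdown() in the child no longer triggers an upload.
        self.upload_on_close = True

    def close(self):
        should_upload = self.upload_on_close
        super().close()
        if should_upload:
            self.upload_to_remote()

    def upload_to_remote(self):
        # Stand-in for the real S3/GCS copy of self.baseFilename.
        print("uploading %s" % self.baseFilename)


# What the `--raw` code path effectively does with its handler:
handler = RemoteTaskHandler("/tmp/task.log")
handler.upload_on_close = False  # suppress the duplicate upload on exit
{code}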