From: davydov@apache.org
To: commits@airflow.incubator.apache.org
Subject: incubator-airflow git commit: [AIRFLOW-1443] Update Airflow configuration documentation
Date: Wed, 9 Aug 2017 21:50:21 +0000 (UTC)

Repository: incubator-airflow
Updated Branches:
  refs/heads/master d9109d645 -> 6825d97b8


[AIRFLOW-1443] Update Airflow configuration documentation

This PR updates the Airflow configuration documentation to include a recent
change to split task logs by try number #2383.
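As a quick illustration of the per-try layout this change documents, the sketch below builds one such log path; the base folder and all values are hypothetical, not taken from this commit:

```
import os

# Illustrative values only -- the real components come from the task instance.
base_log_folder = "/usr/local/airflow/logs"   # assumed base log folder
dag_id = "example_dag"
task_id = "example_task"
execution_date = "2017-08-09T00:00:00"
try_number = 1

# New layout: {dag_id}/{task_id}/{execution_date}/{try_number}.log
log_path = os.path.join(
    base_log_folder, dag_id, task_id, execution_date, "{}.log".format(try_number)
)
print(log_path)
# /usr/local/airflow/logs/example_dag/example_task/2017-08-09T00:00:00/1.log
```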
Closes #2467 from AllisonWang/allison--update-doc


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/6825d97b
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/6825d97b
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/6825d97b

Branch: refs/heads/master
Commit: 6825d97b82a3b235685ea8265380a20eea90c990
Parents: d9109d6
Author: AllisonWang
Authored: Wed Aug 9 14:49:54 2017 -0700
Committer: Dan Davydov
Committed: Wed Aug 9 14:49:56 2017 -0700

----------------------------------------------------------------------
 UPDATING.md            | 29 ++++++++++++++++-------------
 docs/configuration.rst | 15 ++++++++-------
 2 files changed, 24 insertions(+), 20 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/6825d97b/UPDATING.md
----------------------------------------------------------------------
diff --git a/UPDATING.md b/UPDATING.md
index a02ff04..3a880ab 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -9,8 +9,11 @@ assists people when migrating to a new version.
   SSH Hook now uses the Paramiko library to create the ssh client connection, instead of the sub-process based ssh command execution used previously (<1.9.0), so this is backward incompatible.
   - update the SSHHook constructor
   - use the SSHOperator class in place of SSHExecuteOperator, which is now removed. Refer to test_ssh_operator.py for usage info.
-  - SFTPOperator is added to perform secure file transfer from serverA to serverB. Refer to test_sftp_operator.py for usage info.
-  - No updates are required if you are using ftpHook; it will continue to work as is.
+  - SFTPOperator is added to perform secure file transfer from serverA to serverB. Refer to test_sftp_operator.py for usage info.
+  - No updates are required if you are using ftpHook; it will continue to work as is.
+
+### Logging update
+  Logs are now stored in the log folder as ``{dag_id}/{task_id}/{execution_date}/{try_number}.log``.
 
 ### New Features
 
@@ -61,8 +64,8 @@ interfere. Please read through these options, defaults have changed since 1.7.1.
 
 #### child_process_log_directory
-In order to increase the robustness of the scheduler, DAGs are now processed in their own process. Therefore each
-DAG has its own log file for the scheduler. These are placed in `child_process_log_directory`, which defaults to
+In order to increase the robustness of the scheduler, DAGs are now processed in their own process. Therefore each
+DAG has its own log file for the scheduler. These are placed in `child_process_log_directory`, which defaults to
 `<AIRFLOW_HOME>/scheduler/latest`. You will need to make sure these log files are removed.
 
 > DAG logs or processor logs ignore all command line settings for log file locations.
 
@@ -72,7 +75,7 @@ Previously the command line option `num_runs` was used to let the scheduler term
 loops. This is now time bound and defaults to `-1`, which means run continuously. See also num_runs.
 
 #### num_runs
-Previously `num_runs` was used to let the scheduler terminate after a certain number of loops. Now `num_runs` specifies
+Previously `num_runs` was used to let the scheduler terminate after a certain number of loops. Now `num_runs` specifies
 the number of times to try to schedule each DAG file within `run_duration` time. Defaults to `-1`, which means try
 indefinitely. This is only available on the command line.
@@ -85,7 +88,7 @@ dags are not being picked up, have a look at this number and decrease it when ne
 
 #### catchup_by_default
 By default the scheduler will fill any missing interval DAG Runs between the last execution date and the current date.
-This setting changes that behavior to only execute the latest interval. This can also be specified per DAG as
+This setting changes that behavior to only execute the latest interval. This can also be specified per DAG as
 `catchup = False / True`. Command line backfills will still work.
 
 ### Faulty DAGs do not show an error in the Web UI
 
@@ -109,33 +112,33 @@ convenience variables to the config. In case your run a sceure Hadoop setup it m
 required to whitelist these variables by adding the following to your configuration:
 
 ```
-
+
   <property>
     <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
    <value>airflow\.ctx\..*</value>
  </property>
 ```
 
 ### Google Cloud Operator and Hook alignment
 
-All Google Cloud Operators and Hooks are aligned and use the same client library. Now you have a single connection
+All Google Cloud Operators and Hooks are aligned and use the same client library. Now you have a single connection
 type for all kinds of Google Cloud Operators. If you experience problems connecting with your operator, make sure you
 set the connection type "Google Cloud Platform".
 
-Also the old P12 key file type is not supported anymore and only the new JSON key files are supported as a service
+Also the old P12 key file type is not supported anymore and only the new JSON key files are supported as a service
 account.
-
+
 ### Deprecated Features
-These features are marked for deprecation. They may still work (and raise a `DeprecationWarning`), but are no longer
+These features are marked for deprecation. They may still work (and raise a `DeprecationWarning`), but are no longer
 supported and will be removed entirely in Airflow 2.0
 
 - Hooks and operators must be imported from their respective submodules
 
-  `airflow.operators.PigOperator` is no longer supported; `from airflow.operators.pig_operator import PigOperator` is.
+  `airflow.operators.PigOperator` is no longer supported; `from airflow.operators.pig_operator import PigOperator` is.
   (AIRFLOW-31, AIRFLOW-200)
 
 - Operators no longer accept arbitrary arguments
 
-  Previously, `Operator.__init__()` accepted any arguments (either positional `*args` or keyword `**kwargs`) without
+  Previously, `Operator.__init__()` accepted any arguments (either positional `*args` or keyword `**kwargs`) without
   complaint. Now, invalid arguments will be rejected. (https://github.com/apache/incubator-airflow/pull/1285)
 
 ### Known Issues


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/6825d97b/docs/configuration.rst
----------------------------------------------------------------------
diff --git a/docs/configuration.rst b/docs/configuration.rst
index 838bc09..e68a341 100644
--- a/docs/configuration.rst
+++ b/docs/configuration.rst
@@ -83,7 +83,7 @@ within the metadata database.
 The ``crypto`` package is highly recommended during installation. The ``crypto`` package does require that your
 operating system have libffi-dev installed.
 
-If the ``crypto`` package was not installed initially, you can still enable encryption for
+If the ``crypto`` package was not installed initially, you can still enable encryption for
 connections by following the steps below:
 
 1. Install the ``crypto`` package: ``pip install apache-airflow[crypto]``
 
@@ -94,17 +94,17 @@ connections by following steps below:
 
     from cryptography.fernet import Fernet
     fernet_key = Fernet.generate_key()
     print(fernet_key)  # your fernet_key, keep it in a secured place!
-
-3. Replace the ``airflow.cfg`` fernet_key value with the one from step 2.
+
+3. Replace the ``airflow.cfg`` fernet_key value with the one from step 2.
 Alternatively, you can store your fernet_key in an OS environment variable. You
-do not need to change ``airflow.cfg`` in this case, as Airflow will use the environment
+do not need to change ``airflow.cfg`` in this case, as Airflow will use the environment
 variable over the value in ``airflow.cfg``:
 
 .. code-block:: bash
-
+
     # Note the double underscores
     export AIRFLOW__CORE__FERNET_KEY=your_fernet_key
-
+
 4. Restart the Airflow webserver.
 5. For existing connections (the ones that you had defined before installing ``airflow[crypto]`` and creating a Fernet
 key), you need to open each connection in the connection admin UI, re-type the password, and save it.
 
@@ -219,7 +219,8 @@ try to use ``S3Hook('MyS3Conn')``.
 In the Airflow Web UI, local logs take precedence over remote logs. If local logs
 cannot be found or accessed, the remote logs will be displayed. Note that logs
 are only sent to remote storage once a task completes (including failure). In other
-words, remote logs for running tasks are unavailable.
+words, remote logs for running tasks are unavailable. Logs are stored in the log
+folder as ``{dag_id}/{task_id}/{execution_date}/{try_number}.log``.
 
 Scaling Out on Mesos (community contributed)
 ''''''''''''''''''''''''''''''''''''''''''''
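Putting the encryption steps from docs/configuration.rst together, here is a minimal sketch of generating a Fernet key and handing it to Airflow via the environment rather than ``airflow.cfg``; the printed key is an example of step 2, and the export shown in the comments is only one way to set the variable from step 3:

```
from cryptography.fernet import Fernet

# Step 2: generate a Fernet key (a url-safe base64-encoded 32-byte key).
fernet_key = Fernet.generate_key()
print(fernet_key.decode())  # keep this value in a secured place

# Step 3 (alternative): instead of editing airflow.cfg, export the key as an
# environment variable before starting the webserver, e.g.:
#   export AIRFLOW__CORE__FERNET_KEY=<the value printed above>
```

After exporting the variable, restart the webserver (step 4) and re-type and save the passwords of connections that existed before the key was created (step 5).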
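The `catchup_by_default` note in UPDATING.md mentions that catch-up can also be set per DAG. A minimal sketch of what that looks like in a DAG file, assuming an Airflow version where the DAG constructor accepts `catchup`; the DAG name, dates, and schedule below are illustrative only:

```
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

# catchup=False: only the latest interval is scheduled, instead of backfilling
# every interval since start_date (the per-DAG form of catchup_by_default).
dag = DAG(
    dag_id='example_no_catchup',          # hypothetical DAG name
    start_date=datetime(2017, 1, 1),      # illustrative start date
    schedule_interval=timedelta(days=1),
    catchup=False,
)

noop = DummyOperator(task_id='noop', dag=dag)
```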