airflow-commits mailing list archives

From criccom...@apache.org
Subject incubator-airflow git commit: [AIRFLOW-1691] Add better Google cloud logging documentation
Date Mon, 09 Oct 2017 17:37:46 GMT
Repository: incubator-airflow
Updated Branches:
  refs/heads/v1-9-test 5fb5cd10d -> ace2b1d24


[AIRFLOW-1691] Add better Google cloud logging documentation

Closes #2671 from criccomini/fix-log-docs


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/ace2b1d2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/ace2b1d2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/ace2b1d2

Branch: refs/heads/v1-9-test
Commit: ace2b1d2498e8e3464e5597cbb86e69d90fdb897
Parents: 5fb5cd1
Author: Chris Riccomini <criccomini@apache.org>
Authored: Mon Oct 9 10:32:34 2017 -0700
Committer: Chris Riccomini <criccomini@apache.org>
Committed: Mon Oct 9 10:37:33 2017 -0700

----------------------------------------------------------------------
 UPDATING.md          |  6 ++--
 docs/integration.rst | 71 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/ace2b1d2/UPDATING.md
----------------------------------------------------------------------
diff --git a/UPDATING.md b/UPDATING.md
index 329f416..6a0b8bc 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -129,13 +129,13 @@ The `file_task_handler` logger is more flexible. You can change the
default form
 
 #### I'm using S3Log or GCSLogs, what do I do!?
 
-IF you are logging to either S3Log or GCSLogs, you will need a custom logging config. The
`REMOTE_BASE_LOG_FOLDER` configuration key in your airflow config has been removed, therefore
you will need to take the following steps:
+If you are logging to Google cloud storage, please see the [Google cloud platform documentation](https://airflow.incubator.apache.org/integration.html#gcp-google-cloud-platform)
for logging instructions.
+
+If you are using S3, the instructions should be largely the same as the Google cloud platform
instructions above. You will need a custom logging config. The `REMOTE_BASE_LOG_FOLDER` configuration
key in your airflow config has been removed, so you will need to take the following
steps:
 - Copy the logging configuration from [`airflow/config_templates/airflow_local_settings.py`](https://github.com/apache/incubator-airflow/blob/master/airflow/config_templates/airflow_local_settings.py).
 - Place it in a directory on the Python import path `PYTHONPATH`. If you are using Python
2.7, ensure that `__init__.py` files exist in that directory so that it is importable.
 - Update the copied config by setting the remote base log folder path explicitly in it; the
`REMOTE_BASE_LOG_FOLDER` key in `airflow.cfg` is not used anymore.
 - Set `logging_config_class` to the module path and dict name. For example, if you place `custom_logging_config.py`
at the base of your `PYTHONPATH`, you will need to set `logging_config_class = custom_logging_config.LOGGING_CONFIG`
in your config (see the example sketch below).
- 
-ELSE you don't need to change anything. If there is no custom config, the airflow config
loader will still default to the same config. 
 
 ### New Features
 

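For the S3 case above, here is a minimal sketch of what the remote-handler customization in the copied
logging config might look like, mirroring the Google cloud storage example added to `docs/integration.rst`
below. This is illustrative only: `S3_LOG_FOLDER` is a name introduced here, and `LOGGING_CONFIG`,
`BASE_LOG_FOLDER`, and `FILENAME_TEMPLATE` are assumed to come from the copied template; check the
`S3TaskHandler` argument names against your Airflow version.

```python
# Hypothetical additions to the copied log config (e.g. custom_logging_config.py).
import os

# S3 bucket/prefix to persist logs to; note the trailing slash.
S3_LOG_FOLDER = 's3://<bucket where logs should be persisted>/'

# Register an S3 handler alongside the handlers copied from the template.
LOGGING_CONFIG['handlers']['s3.task'] = {
    'class': 'airflow.utils.log.s3_task_handler.S3TaskHandler',
    'formatter': 'airflow.task',
    'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
    's3_log_folder': S3_LOG_FOLDER,
    'filename_template': FILENAME_TEMPLATE,
}

# Point the task loggers at the S3 handler instead of 'file.task'.
LOGGING_CONFIG['loggers']['airflow.task']['handlers'] = ['s3.task']
LOGGING_CONFIG['loggers']['airflow.task_runner']['handlers'] = ['s3.task']
```

With that in place, setting `task_log_reader = s3.task`, `logging_config_class = custom_logging_config.LOGGING_CONFIG`,
and a `remote_log_conn_id` pointing at an S3-capable connection in `airflow.cfg` would round out the setup,
analogous to the GCS steps documented below.
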
http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/ace2b1d2/docs/integration.rst
----------------------------------------------------------------------
diff --git a/docs/integration.rst b/docs/integration.rst
index 3b50586..cd6cc68 100644
--- a/docs/integration.rst
+++ b/docs/integration.rst
@@ -184,6 +184,77 @@ Airflow has extensive support for the Google Cloud Platform. But note
that most
 Operators are in the contrib section. Meaning that they have a *beta* status, meaning that
 they can have breaking changes between minor releases.
 
+Logging
+''''''''
+
+Airflow can be configured to read and write task logs in Google cloud storage.
+Follow the steps below to enable Google cloud storage logging.
+
+#. Airflow's logging system requires a custom .py file to be located in the ``PYTHONPATH``,
so that it's importable from Airflow. Start by creating a directory to store the config file.
``$AIRFLOW_HOME/config`` is recommended.
+#. Set ``PYTHONPATH=$PYTHONPATH:<AIRFLOW_HOME>/config`` in the Airflow environment.
If using Supervisor, you can set this in the ``supervisord.conf`` environment parameter. If
not, you can export ``PYTHONPATH`` using your preferred method.
+#. Create empty files called ``$AIRFLOW_HOME/config/log_config.py`` and ``$AIRFLOW_HOME/config/__init__.py``.
+#. Copy the contents of ``airflow/config_templates/airflow_local_settings.py`` into the ``log_config.py``
file that was just created in the step above.
+#. Customize the following portions of the template (a consolidated sketch of the finished file appears after these steps):
+
+    .. code-block:: python
+
+        # Add this variable to the top of the file. Note the trailing slash.
+        GCS_LOG_FOLDER = 'gs://<bucket where logs should be persisted>/'
+
+        # Rename DEFAULT_LOGGING_CONFIG to LOGGING_CONFIG
+        LOGGING_CONFIG = ...
+
+        # Add a GCSTaskHandler to the 'handlers' block of the LOGGING_CONFIG variable
+        'gcs.task': {
+            'class': 'airflow.utils.log.gcs_task_handler.GCSTaskHandler',
+            'formatter': 'airflow.task',
+            'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
+            'gcs_log_folder': GCS_LOG_FOLDER,
+            'filename_template': FILENAME_TEMPLATE,
+        },
+
+        # Update the airflow.task and airflow.task_runner blocks to be 'gcs.task' instead
of 'file.task'.
+        'loggers': {
+            'airflow.task': {
+                'handlers': ['gcs.task'],
+                ...
+            },
+            'airflow.task_runner': {
+                'handlers': ['gcs.task'],
+                ...
+            },
+            'airflow': {
+                'handlers': ['console'],
+                ...
+            },
+        }
+
+#. Make sure a Google cloud platform connection hook has been defined in Airflow. The hook
should have read and write access to the Google cloud storage bucket defined above in ``GCS_LOG_FOLDER``.
+
+#. Update ``$AIRFLOW_HOME/airflow.cfg`` to contain:
+
+    .. code-block:: bash
+
+        task_log_reader = gcs.task
+        logging_config_class = log_config.LOGGING_CONFIG
+        remote_log_conn_id = <name of the Google cloud platform hook>
+
+#. Restart the Airflow webserver and scheduler, and trigger (or wait for) a new task execution.
+#. Verify that logs are showing up for newly executed tasks in the bucket you've defined.
+#. Verify that the Google cloud storage viewer is working in the UI. Pull up a newly executed
task, and verify that you see something like:
+
+    .. code-block:: bash
+
+        *** Reading remote log from gs://<bucket where logs should be persisted>/example_bash_operator/run_this_last/2017-10-03T00:00:00/16.log.
+        [2017-10-03 21:57:50,056] {cli.py:377} INFO - Running on host chrisr-00532
+        [2017-10-03 21:57:50,093] {base_task_runner.py:115} INFO - Running: ['bash', '-c',
u'airflow run example_bash_operator run_this_last 2017-10-03T00:00:00 --job_id 47 --raw -sd
DAGS_FOLDER/example_dags/example_bash_operator.py']
+        [2017-10-03 21:57:51,264] {base_task_runner.py:98} INFO - Subtask: [2017-10-03 21:57:51,263]
{__init__.py:45} INFO - Using executor SequentialExecutor
+        [2017-10-03 21:57:51,306] {base_task_runner.py:98} INFO - Subtask: [2017-10-03 21:57:51,306]
{models.py:186} INFO - Filling up the DagBag from /airflow/dags/example_dags/example_bash_operator.py
+
+Note the top line that says it's reading from the remote log file.
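
For reference, a consolidated sketch of what the customized ``$AIRFLOW_HOME/config/log_config.py`` might
end up looking like after the steps above. This is illustrative only: the log-level and format lookups,
the formatter, and the ``FILENAME_TEMPLATE`` value are assumed to mirror the copied
``airflow/config_templates/airflow_local_settings.py`` template and may differ between Airflow versions.

    .. code-block:: python

        import os

        from airflow import configuration as conf

        # Assumed to mirror the copied template; verify against your version.
        LOG_LEVEL = conf.get('core', 'LOGGING_LEVEL').upper()
        LOG_FORMAT = conf.get('core', 'log_format')
        BASE_LOG_FOLDER = conf.get('core', 'BASE_LOG_FOLDER')
        FILENAME_TEMPLATE = '{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log'

        # Bucket to persist logs to; note the trailing slash.
        GCS_LOG_FOLDER = 'gs://<bucket where logs should be persisted>/'

        LOGGING_CONFIG = {
            'version': 1,
            'disable_existing_loggers': False,
            'formatters': {
                'airflow.task': {'format': LOG_FORMAT},
            },
            'handlers': {
                'console': {
                    'class': 'logging.StreamHandler',
                    'formatter': 'airflow.task',
                    'stream': 'ext://sys.stdout',
                },
                'gcs.task': {
                    'class': 'airflow.utils.log.gcs_task_handler.GCSTaskHandler',
                    'formatter': 'airflow.task',
                    'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
                    'gcs_log_folder': GCS_LOG_FOLDER,
                    'filename_template': FILENAME_TEMPLATE,
                },
            },
            'loggers': {
                'airflow.task': {
                    'handlers': ['gcs.task'],
                    'level': LOG_LEVEL,
                    'propagate': False,
                },
                'airflow.task_runner': {
                    'handlers': ['gcs.task'],
                    'level': LOG_LEVEL,
                    'propagate': True,
                },
                'airflow': {
                    'handlers': ['console'],
                    'level': LOG_LEVEL,
                    'propagate': False,
                },
            },
        }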
+
+Please be aware that if you were persisting logs to Google cloud storage using the old-style
airflow.cfg configuration method, the old logs will no longer be visible in the Airflow UI,
though they'll still exist in Google cloud storage. This is a backwards incompatible change.
If you are unhappy with it, you can change the ``FILENAME_TEMPLATE`` to reflect the old-style
log filename format.
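
If you do want the old layout back, one possibility is sketched below. This is a guess that assumes the
pre-1.9 remote path was ``<dag_id>/<task_id>/<execution_date>`` with no per-try suffix; verify the exact
layout against the objects already present in your bucket before relying on it.

    .. code-block:: python

        # Hypothetical: approximate the pre-1.9 remote log path in log_config.py.
        FILENAME_TEMPLATE = '{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}'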
+
 BigQuery
 ''''''''
 

