airflow-commits mailing list archives

Subject incubator-airflow git commit: [AIRFLOW-1691] Add better Google cloud logging documentation
Date Mon, 09 Oct 2017 17:37:46 GMT
Repository: incubator-airflow
Updated Branches:
  refs/heads/v1-9-test 5fb5cd10d -> ace2b1d24

[AIRFLOW-1691] Add better Google cloud logging documentation

Closes #2671 from criccomini/fix-log-docs


Branch: refs/heads/v1-9-test
Commit: ace2b1d2498e8e3464e5597cbb86e69d90fdb897
Parents: 5fb5cd1
Author: Chris Riccomini <>
Authored: Mon Oct 9 10:32:34 2017 -0700
Committer: Chris Riccomini <>
Committed: Mon Oct 9 10:37:33 2017 -0700

----------------------------------------------------------------------
 |  6 ++--
 docs/integration.rst | 71 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+), 3 deletions(-)
diff --git a/ b/
index 329f416..6a0b8bc 100644
--- a/
+++ b/
@@ -129,13 +129,13 @@ The `file_task_handler` logger is more flexible. You can change the
default form
 #### I'm using S3Log or GCSLogs, what do I do!?
-IF you are logging to either S3Log or GCSLogs, you will need a custom logging config. The
`REMOTE_BASE_LOG_FOLDER` configuration key in your airflow config has been removed, therefore
you will need to take the following steps:
+If you are logging to Google cloud storage, please see the [Google cloud platform documentation](
for logging instructions.
+If you are using S3, the instructions should be largely the same as the Google cloud platform
instructions above. You will need a custom logging config. The `REMOTE_BASE_LOG_FOLDER` configuration
key in your airflow config has been removed, therefore you will need to take the following steps:
  - Copy the logging configuration from [`airflow/config_templates/`](.
  - Place it in a directory inside the Python import path `PYTHONPATH`. If you are using Python
2.7, ensure that `` files exist so that it is importable.
  - Update the config by setting the path of the remote log folder explicitly in the logging config.
The `REMOTE_BASE_LOG_FOLDER` key is not used anymore.
  - Set the `logging_config_class` to the filename and dict. For example, if you place ``
on the base of your pythonpath, you will need to set `logging_config_class = custom_logging_config.LOGGING_CONFIG`
in your config, as in Airflow 1.8.
-ELSE you don't need to change anything. If there is no custom config, the airflow config
loader will still default to the same config. 
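The steps above can be sketched as a minimal custom logging config module. This is a stdlib-only illustration, not the actual Airflow template: the module name ``, the bucket path, and the handler layout are all hypothetical stand-ins for whatever you copy out of `airflow/config_templates/`.

```python
# - illustrative sketch of a custom logging config module.
# Place it on PYTHONPATH and point Airflow at it with:
#   logging_config_class = custom_logging_config.LOGGING_CONFIG
import logging.config

# Hypothetical remote folder; in the real config this comes from your setup.
REMOTE_BASE_LOG_FOLDER = 's3://my-airflow-logs/'

# A plain stdlib dictConfig schema; the real Airflow template defines many
# more formatters, handlers, and loggers than this sketch.
LOGGING_CONFIG = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'airflow.task': {
            'format': '[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s',
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'airflow.task',
        },
    },
    'loggers': {
        'airflow.task': {
            'handlers': ['console'],
            'level': 'INFO',
            'propagate': False,
        },
    },
}

if __name__ == '__main__':
    # Sanity-check that the dict is a valid dictConfig schema.
    logging.config.dictConfig(LOGGING_CONFIG)
    logging.getLogger('airflow.task').info('config loaded')
```

Loading the module this way keeps the full logging layout under your control, which is why the old `REMOTE_BASE_LOG_FOLDER` airflow.cfg key is no longer consulted.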
 ### New Features
diff --git a/docs/integration.rst b/docs/integration.rst
index 3b50586..cd6cc68 100644
--- a/docs/integration.rst
+++ b/docs/integration.rst
@@ -184,6 +184,77 @@ Airflow has extensive support for the Google Cloud Platform. But note
that most
 Operators are in the contrib section, meaning that they have a *beta* status and can have
 breaking changes between minor releases.
+Airflow can be configured to read and write task logs in Google cloud storage.
+Follow the steps below to enable Google cloud storage logging.
+#. Airflow's logging system requires a custom .py file to be located in the ``PYTHONPATH``,
so that it's importable from Airflow. Start by creating a directory to store the config file.
``$AIRFLOW_HOME/config`` is recommended.
+#. Set ``PYTHONPATH=$PYTHONPATH:<AIRFLOW_HOME>/config`` in the Airflow environment.
If using Supervisor, you can set this in the ``supervisord.conf`` environment parameter. If
not, you can export ``PYTHONPATH`` using your preferred method.
+#. Create empty files called ``$AIRFLOW_HOME/config/`` and ``$AIRFLOW_HOME/config/``.
+#. Copy the contents of ``airflow/config_templates/`` into the ````
file that was just created in the step above.
+#. Customize the following portions of the template:
+    .. code-block:: bash
+        # Add this variable to the top of the file. Note the trailing slash.
+        GCS_LOG_FOLDER = 'gs://<bucket where logs should be persisted>/'
+        LOGGING_CONFIG = ...
+        # Add a GCSTaskHandler to the 'handlers' block of the LOGGING_CONFIG variable
+        'gcs.task': {
+            'class': 'airflow.utils.log.gcs_task_handler.GCSTaskHandler',
+            'formatter': 'airflow.task',
+            'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
+            'gcs_log_folder': GCS_LOG_FOLDER,
+            'filename_template': FILENAME_TEMPLATE,
+        },
+        # Update the airflow.task and airflow.task_runner blocks to be 'gcs.task' instead
of 'file.task'.
+        'loggers': {
+            'airflow.task': {
+                'handlers': ['gcs.task'],
+                ...
+            },
+            'airflow.task_runner': {
+                'handlers': ['gcs.task'],
+                ...
+            },
+            'airflow': {
+                'handlers': ['console'],
+                ...
+            },
+        }
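Putting the edits above together, the customized `` module ends up shaped roughly like this. This is a stdlib-only sketch: the ``gcs.task`` handler entry and the ``GCS_LOG_FOLDER`` variable follow the steps above, while ``BASE_LOG_FOLDER``, ``FILENAME_TEMPLATE``, the formatter, and the bucket name are illustrative stand-ins for values taken from the copied template.

```python
# Sketch of $AIRFLOW_HOME/config/ after the customizations above.
# The 'class' entry names Airflow's GCSTaskHandler by dotted path; the rest
# is a plain stdlib dictConfig structure.
import os

BASE_LOG_FOLDER = os.path.expanduser('~/airflow/logs')  # assumed default
FILENAME_TEMPLATE = '{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log'
GCS_LOG_FOLDER = 'gs://my-logs-bucket/'  # note the trailing slash

LOGGING_CONFIG = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'airflow.task': {
            'format': '[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s',
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'airflow.task',
        },
        # The handler added in the step above: task logs are written locally
        # under base_log_folder, then shipped to the GCS folder.
        'gcs.task': {
            'class': 'airflow.utils.log.gcs_task_handler.GCSTaskHandler',
            'formatter': 'airflow.task',
            'base_log_folder': BASE_LOG_FOLDER,
            'gcs_log_folder': GCS_LOG_FOLDER,
            'filename_template': FILENAME_TEMPLATE,
        },
    },
    'loggers': {
        # airflow.task and airflow.task_runner now point at 'gcs.task'
        # instead of 'file.task'.
        'airflow.task': {'handlers': ['gcs.task'], 'level': 'INFO', 'propagate': False},
        'airflow.task_runner': {'handlers': ['gcs.task'], 'level': 'INFO', 'propagate': False},
        'airflow': {'handlers': ['console'], 'level': 'INFO'},
    },
}
```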
+#. Make sure a Google cloud platform connection hook has been defined in Airflow. The hook
should have read and write access to the Google cloud storage bucket defined above in ``GCS_LOG_FOLDER``.
+#. Update ``$AIRFLOW_HOME/airflow.cfg`` to contain:
+    .. code-block:: bash
+        task_log_reader = gcs.task
+        logging_config_class = log_config.LOGGING_CONFIG
+        remote_log_conn_id = <name of the Google cloud platform hook>
+#. Restart the Airflow webserver and scheduler, and trigger (or wait for) a new task execution.
+#. Verify that logs are showing up for newly executed tasks in the bucket you've defined.
+#. Verify that the Google cloud storage viewer is working in the UI. Pull up a newly executed
task, and verify that you see something like:
+    .. code-block:: bash
+        *** Reading remote log from gs://<bucket where logs should be persisted>/example_bash_operator/run_this_last/2017-10-03T00:00:00/16.log.
+        [2017-10-03 21:57:50,056] {} INFO - Running on host chrisr-00532
+        [2017-10-03 21:57:50,093] {} INFO - Running: ['bash', '-c',
u'airflow run example_bash_operator run_this_last 2017-10-03T00:00:00 --job_id 47 --raw -sd
+        [2017-10-03 21:57:51,264] {} INFO - Subtask: [2017-10-03 21:57:51,263]
{} INFO - Using executor SequentialExecutor
+        [2017-10-03 21:57:51,306] {} INFO - Subtask: [2017-10-03 21:57:51,306]
{} INFO - Filling up the DagBag from /airflow/dags/example_dags/
+Note the top line that says it's reading from the remote log file.
+Please be aware that if you were persisting logs to Google cloud storage using the old-style
airflow.cfg configuration method, the old logs will no longer be visible in the Airflow UI,
though they'll still exist in Google cloud storage. This is a backwards incompatible change.
If you are unhappy with it, you can change the ``FILENAME_TEMPLATE`` to reflect the old-style
log filename format.
