airflow-dev mailing list archives

From David Klosowski <dav...@thinknear.com>
Subject Re: Using s3 logging in Airflow 1.9.x
Date Mon, 09 Oct 2017 17:28:40 GMT
Out of curiosity, has anyone who has tested the latest 1.9 branch of
Airflow actually had the S3 logging feature work?

I'm imagining that anyone running Airflow in a distributed container
environment will want to be able to view the task logs in the UI, and
without this feature that won't work, since the task logs can't be shared
at the host level when the containers are distributed.
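
A rough way to rule one thing out first: check that the worker container
can write to the log bucket at all, independent of Airflow. A minimal
sketch of that check (not from this thread; it assumes boto3 is installed
and reuses the test bucket/prefix from the config quoted further down):

```
# Quick S3 connectivity/permission check from inside a worker container.
# The bucket and key are only examples mirroring the test values in the
# logging config quoted later in this thread.
import boto3

s3 = boto3.client('s3')
s3.put_object(
    Bucket='tn-testing-bucket',         # example bucket
    Key='dk/connectivity-check.txt',    # arbitrary test key
    Body=b'airflow s3 logging connectivity check',
)

# Read it back to confirm both write and read access work with the
# credentials the container actually has.
obj = s3.get_object(Bucket='tn-testing-bucket', Key='dk/connectivity-check.txt')
print(obj['Body'].read())
```

If that fails, the problem is credentials/IAM rather than the logging
handler configuration itself.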

Thanks.

Regards,
David

On Sat, Oct 7, 2017 at 10:06 AM, David Klosowski <davidk@thinknear.com>
wrote:

> Hi Ash,
>
> Thanks for the response.
>
> I neglected to post that I do in fact have that config specified in the
> airflow.cfg
>
> My file logger shows the custom formatting I have set and I see a log line
> in the console from the scheduler:
>
> {logging_config.py:42} INFO - Successfully imported user-defined logging
> config from
>
> Cheers,
> David
>
>
> On Sat, Oct 7, 2017 at 7:42 AM, Ash Berlin-Taylor <
> ash_airflowlist@firemirror.com> wrote:
>
>> It could be that you have created a custom logging file, but you haven't
>> specified it in your airflow.cfg:
>>
>> ```
>> logging_config_class=mymodule.LOGGING_CONFIG
>> ```
>>
>>
>>
>>
>> > On 7 Oct 2017, at 00:11, David Klosowski <davidk@thinknear.com> wrote:
>> >
>> > Hey Airflow Devs:
>> >
>> > How is s3 logging supposed to work in Airflow 1.9.0?
>> >
>> > I've followed the *UPDATING.md* guide for the new setup of logging and
>> > while I can use my custom logging configuration module to format the
>> > files written to the host, the s3 logging doesn't appear to work as I
>> > don't see anything in s3.
>> >
>> > *> airflow.cfg*
>> >
>> > task_log_reader = s3.task
>> >
>> >
>> > *> custom logging module added to PYTHONPATH with __init__.py in directory*
>> > ----
>> > import os
>> >
>> > from airflow import configuration as conf
>> >
>> > LOG_LEVEL = conf.get('core', 'LOGGING_LEVEL').upper()
>> > LOG_FORMAT = conf.get('core', 'log_format')
>> >
>> > BASE_LOG_FOLDER = conf.get('core', 'BASE_LOG_FOLDER')
>> >
>> > FILENAME_TEMPLATE = '{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log'
>> >
>> > STAGE = os.getenv('STAGE')
>> >
>> > S3_LOG_FOLDER = 's3://tn-testing-bucket/dk'
>> >
>> > LOGGING_CONFIG = {
>> >    'version': 1,
>> >    'disable_existing_loggers': False,
>> >    'formatters': {
>> >        'airflow.task': {
>> >            'format': LOG_FORMAT,
>> >        },
>> >    },
>> >    'handlers': {
>> >        'console': {
>> >            'class': 'logging.StreamHandler',
>> >            'formatter': 'airflow.task',
>> >            'stream': 'ext://sys.stdout'
>> >        },
>> >        'file.task': {
>> >            'class': 'airflow.utils.log.file_task_handler.FileTaskHandler',
>> >            'formatter': 'airflow.task',
>> >            'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
>> >            'filename_template': FILENAME_TEMPLATE,
>> >        },
>> >        # When using s3 or gcs, provide a customized LOGGING_CONFIG
>> >        # in airflow_local_settings within your PYTHONPATH, see
>> >        # UPDATING.md for details
>> >        's3.task': {
>> >            'class': 'airflow.utils.log.s3_task_handler.S3TaskHandler',
>> >            'formatter': 'airflow.task',
>> >            'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
>> >            's3_log_folder': S3_LOG_FOLDER,
>> >            'filename_template': FILENAME_TEMPLATE,
>> >        }
>> >    },
>> >    'loggers': {
>> >        'airflow.task': {
>> >            'handlers': ['s3.task'],
>> >            'level': LOG_LEVEL,
>> >            'propagate': False,
>> >        },
>> >        'airflow.task_runner': {
>> >            'handlers': ['s3.task'],
>> >            'level': LOG_LEVEL,
>> >            'propagate': True,
>> >        },
>> >        'airflow': {
>> >            'handlers': ['console'],
>> >            'level': LOG_LEVEL,
>> >            'propagate': False,
>> >        },
>> >    }
>> > }
>> > -----
>> >
>> > I never see any task logs in S3, even after completion of all tasks.
>> >
>> > While running this in docker, since my executors/workers are on
>> > different hosts, when I try to pull up the task logs in the UI I
>> > receive the following error b/c they aren't there in s3:
>> >
>> > Failed to fetch log file from worker.
>> > HTTPConnectionPool(host='f0cf9e596af6', port=8793): Max retries
>> > exceeded with url:
>> >
>> >
>> >
>> > Any additional hints you can provide on what else needs to be done?
>> >
>> > Thanks.
>> >
>> > Regards,
>> > David
>>
>>
>
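
One more check that can narrow this down (a sketch, not something from the
thread above): load the custom LOGGING_CONFIG through the stdlib in the
same environment the scheduler/worker runs in, and confirm the S3 handler
really attaches to the airflow.task logger. The module name my_log_config
below is a placeholder for whatever logging_config_class points at:

```
# Validate the custom logging dict outside of Airflow's own startup path.
# 'my_log_config' is a placeholder module name; substitute the module
# referenced by logging_config_class in airflow.cfg.
import logging
import logging.config

from my_log_config import LOGGING_CONFIG

# dictConfig raises if any handler (including s3.task) cannot be
# constructed, e.g. because of a bad class path or bad arguments.
logging.config.dictConfig(LOGGING_CONFIG)

task_logger = logging.getLogger('airflow.task')
print([type(h).__name__ for h in task_logger.handlers])
# Expect something like ['S3TaskHandler'] here; if the list is empty or only
# shows a console handler, task logs will never be routed to the S3 handler.
```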
