airflow-dev mailing list archives

From Kevin Lam <ke...@fathomhealth.co>
Subject Re: How to best use Google Cloud Storage for logging?
Date Wed, 20 Dec 2017 16:44:24 GMT
I got it to work. It seems I had mixed some code
(airflow/config_templates/airflow_local_settings.py) from the master branch
into the v1-9-stable branch. Thanks for your help, everyone!
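The import failure further down this thread (`Cannot resolve 'airflow.utils.log.logging_mixin.RedirectStdHandler'`) is what `logging.config.dictConfig` reports when a handler's `class` path names something the installed Airflow does not have. That is most likely what happens when a master-branch `airflow_local_settings.py` is used with a v1-9-stable install. Below is a minimal pre-flight check that assumes a plain `dictConfig`-style dict. `find_unresolvable_classes` is a hypothetical helper and is not part of Airflow:

```python
import importlib


def find_unresolvable_classes(logging_config):
    """Return (handler_name, dotted_path, error) for each handler whose
    'class' string cannot be imported in the current environment."""
    problems = []
    for name, handler in logging_config.get("handlers", {}).items():
        dotted = handler.get("class")
        if not isinstance(dotted, str):
            continue  # missing, or already a class object: nothing to resolve
        # Split 'package.module.ClassName' into module path and attribute name,
        # roughly what dictConfig does when it resolves the string.
        module_name, _, attr = dotted.rpartition(".")
        try:
            module = importlib.import_module(module_name)
            getattr(module, attr)
        except (ImportError, AttributeError, ValueError) as exc:
            problems.append((name, dotted, str(exc)))
    return problems


# Example: one resolvable stdlib handler and one that does not exist.
example_config = {
    "version": 1,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
        "broken": {"class": "logging.NoSuchHandler"},
    },
}
for handler_name, path, error in find_unresolvable_classes(example_config):
    print(handler_name, path, error)
```

If you run it against the dict that `logging_config_class` points at, it should flag the `console` handler's `RedirectStdHandler` path before Airflow itself tries to start.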

On Wed, Dec 20, 2017 at 11:01 AM, Kevin Lam <kevin@fathomhealth.co> wrote:

> Hi Ash,
>
> That run was at the head of the master branch on GitHub:
>
> https://github.com/apache/incubator-airflow/blob/master/airflow/utils/log/gcs_task_handler.py#L144
>
>
> On Wed, Dec 20, 2017 at 10:54 AM, Ash Berlin-Taylor
> <ash_airflowlist@firemirror.com> wrote:
>
>> What version are you on? I can't match up the line numbers in this stack
>> trace to either 1.9.0rc8 or 1.9.0rc2; both of them have the
>> 'if old_log else log' line at line 157.
>>
>> -ash
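Ash's approach matches a traceback's line numbers against known releases. You can also work in the other direction: ask the interpreter which copy of a module it actually imports, and what that copy has at the reported line. Here is a minimal sketch that uses only the standard library. `installed_source_line` is a hypothetical helper name:

```python
import importlib
import inspect
import linecache


def installed_source_line(module_name, lineno):
    """Return (path, text) for line `lineno` of the module that
    `import module_name` actually loads in this interpreter.

    inspect.getsourcefile raises TypeError for built-in modules and returns
    None when no .py source can be found; the latter is reported here.
    """
    module = importlib.import_module(module_name)
    path = inspect.getsourcefile(module)
    if path is None:
        raise ValueError("no Python source available for %r" % module_name)
    linecache.checkcache(path)  # drop stale cached copies of the file
    # getline returns '' if the file has fewer than `lineno` lines.
    return path, linecache.getline(path, lineno).rstrip("\n")
```

In the failing environment, `installed_source_line("airflow.utils.log.gcs_task_handler", 144)` would return the path of the copy being imported and its text at line 144. You can then compare that line with the 1.9.0 release candidates or with master.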
>>
>>
>> > On 20 Dec 2017, at 15:25, Kevin Lam <kevin@fathomhealth.co> wrote:
>> >
>> > Thanks Bolke and Feng!
>> >
>> > I seem to have a working connection with GCS, but some error seems to be
>> > occurring in the gcs_task_handler in Airflow:
>> >
>> > Traceback (most recent call last):
>> >   File "/usr/local/bin/airflow", line 27, in <module>
>> >     args.func(args)
>> >   File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line 423, in run
>> >     logging.shutdown()
>> >   File "/usr/lib/python3.5/logging/__init__.py", line 1882, in shutdown
>> >     h.close()
>> >   File "/usr/local/lib/python3.5/dist-packages/airflow/utils/log/gcs_task_handler.py", line 87, in close
>> >     self.gcs_write(log, remote_loc)
>> >   File "/usr/local/lib/python3.5/dist-packages/airflow/utils/log/gcs_task_handler.py", line 144, in gcs_write
>> >     log = '\n'.join([old_log, log]) if old_log else log
>> > UnboundLocalError: local variable 'old_log' referenced before assignment
>> >
>> > I believe the connection is working because the tasks get a 404 instead
>> > of a 403 when trying to read remote logs, but the logs aren't being
>> > written because of the above error.
>> >
>> > E.g.:
>> >
>> > *** Unable to read remote log from gs://<mybucket>/<...>/2017-12-20T15:21:23.704614+00:00/1.log
>> > *** <HttpError 404 when requesting https://www.googleapis.com/storage/v1/b/<mybucket>/o/<...>F2017-12-20T15%3A21%3A23.704614%2B00%3A00%2F1.log?alt=media returned "Not Found">
>> >
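The `UnboundLocalError` above most likely means that reading the existing remote log raised an exception (here, the 404 for a log that did not exist yet) before `old_log` was assigned, and the join line then ran anyway. Below is a minimal sketch of the append step with the variable bound up front. `merge_with_existing_log` and `RemoteLogNotFound` are hypothetical names, not Airflow's API, and this is not the actual upstream patch:

```python
class RemoteLogNotFound(Exception):
    """Hypothetical stand-in for a storage client's 'object not found' error."""


def merge_with_existing_log(read_remote, new_log):
    """Return the text to upload: any existing remote log followed by new_log.

    `read_remote` is a zero-argument callable that returns the current remote
    log text, or raises RemoteLogNotFound if no log exists yet.
    """
    old_log = ""  # bound before the read, so a failed read cannot leave it undefined
    try:
        old_log = read_remote()
    except RemoteLogNotFound:
        pass  # first upload for this task attempt: nothing to append to
    return "\n".join([old_log, new_log]) if old_log else new_log
```

This sketch deliberately treats only "not found" as "no previous log". Other read failures propagate instead of silently discarding the earlier log contents.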
>> >
>> > On Wed, Dec 20, 2017 at 1:48 AM, Bolke de Bruin <bdbruin@gmail.com> wrote:
>> >
>> >> Both will (or should) work; master is just cleaner and more manageable.
>> >>
>> >> B.
>> >>
>> >>> On 19 Dec 2017, at 23:44, Kevin Lam <kevin@fathomhealth.co> wrote:
>> >>>
>> >>> Looks like it might be related to this commit:
>> >>> https://github.com/apache/incubator-airflow/commit/02ff8ae35dd16e6f23d29d7b24a5fb9c09d0b7a4
>> >>> Why isn't this fix on the v1-9 branches? Should I be using master
>> >>> instead?
>> >>>
>> >>> On Tue, Dec 19, 2017 at 5:37 PM, Kevin Lam <kevin@fathomhealth.co> wrote:
>> >>>>
>> >>>> Hi Feng,
>> >>>>
>> >>>> Thanks for your help! Got it, I'll go with the Python-based logging
>> >>>> config.
>> >>>>
>> >>>> I'm trying to set up GCS logging on Airflow v1-9-stable, and my
>> >>>> logging_config.py seems to be causing a Python import error, triggered
>> >>>> by 'from airflow import configuration':
>> >>>>
>> >>>> Initialize database...
>> >>>> Unable to load the config, contains a configuration error.
>> >>>> Traceback (most recent call last):
>> >>>>   File "/usr/lib/python3.5/logging/config.py", line 384, in resolve
>> >>>>     self.importer(used)
>> >>>> ImportError: No module named 'airflow.utils.log.logging_mixin.RedirectStdHandler'; 'airflow.utils.log.logging_mixin' is not a package
>> >>>>
>> >>>> The above exception was the direct cause of the following exception:
>> >>>>
>> >>>> Traceback (most recent call last):
>> >>>>   File "/usr/lib/python3.5/logging/config.py", line 558, in configure
>> >>>>     handler = self.configure_handler(handlers[name])
>> >>>>   File "/usr/lib/python3.5/logging/config.py", line 708, in configure_handler
>> >>>>     klass = self.resolve(cname)
>> >>>>   File "/usr/lib/python3.5/logging/config.py", line 391, in resolve
>> >>>>     raise v
>> >>>>   File "/usr/lib/python3.5/logging/config.py", line 384, in resolve
>> >>>>     self.importer(used)
>> >>>> ValueError: Cannot resolve 'airflow.utils.log.logging_mixin.RedirectStdHandler': No module named 'airflow.utils.log.logging_mixin.RedirectStdHandler'; 'airflow.utils.log.logging_mixin' is not a package
>> >>>>
>> >>>> During handling of the above exception, another exception occurred:
>> >>>>
>> >>>> Traceback (most recent call last):
>> >>>>   File "/usr/local/bin/airflow", line 16, in <module>
>> >>>>     from airflow import configuration
>> >>>>   File "/usr/local/lib/python3.5/dist-packages/airflow/__init__.py", line 31, in <module>
>> >>>>     from airflow import settings
>> >>>>   File "/usr/local/lib/python3.5/dist-packages/airflow/settings.py", line 148, in <module>
>> >>>>     configure_logging()
>> >>>>   File "/usr/local/lib/python3.5/dist-packages/airflow/logging_config.py", line 75, in configure_logging
>> >>>>     raise e
>> >>>>   File "/usr/local/lib/python3.5/dist-packages/airflow/logging_config.py", line 70, in configure_logging
>> >>>>     dictConfig(logging_config)
>> >>>>   File "/usr/lib/python3.5/logging/config.py", line 795, in dictConfig
>> >>>>     dictConfigClass(config).configure()
>> >>>>   File "/usr/lib/python3.5/logging/config.py", line 566, in configure
>> >>>>     '%r: %s' % (name, e))
>> >>>> ValueError: Unable to configure handler 'console': Cannot resolve 'airflow.utils.log.logging_mixin.RedirectStdHandler': No module named 'airflow.utils.log.logging_mixin.RedirectStdHandler'; 'airflow.utils.log.logging_mixin' is not a package
>> >>>>
>> >>>> Have you encountered this before?
>> >>>>
>> >>>> On Mon, Dec 18, 2017 at 8:53 PM, Feng Lu <fenglu@google.com.invalid> wrote:
>> >>>>
>> >>>>> Hi Kevin,
>> >>>>>
>> >>>>> Kindly see my reply inline:
>> >>>>>
>> >>>>> On Mon, Dec 18, 2017 at 3:28 PM, Kevin Lam <kevin@fathomhealth.co> wrote:
>> >>>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> I'm trying to get Airflow to use GCS for logging and had a few
>> >>>>>> questions.
>> >>>>>>
>> >>>>>> We're currently using Airflow 1.9rc2, running in a Kubernetes Airflow
>> >>>>>> deployment (similar to https://github.com/mumoshu/kube-airflow).
>> >>>>>>
>> >>>>>> 1/ It seems the logging code has been going through some changes in
>> >>>>>> recent versions. What's the correct way to set up GCS for logging? Is
>> >>>>>> it just a matter of specifying remote_base_log_folder and
>> >>>>>> remote_log_conn_id in airflow.cfg? Or of following this guide, which
>> >>>>>> uses the Python-based logging config:
>> >>>>>> http://airflow.readthedocs.io/en/latest/integration.html#gcp
>> >>>>>> Is there an Airflow version that we should use to be most stable?
>> >>>>>>
>> >>>>> The Python-based logging config is the right place to make changes. In
>> >>>>> our test setup, we override airflow_local_settings.py, similar to the
>> >>>>> link you pasted.
>> >>>>> You may also want to set: [core] task_log_reader = gcs.task
>> >>>>>
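Feng suggests a Python logging config plus `task_log_reader = gcs.task`, which might look roughly like the module below. This is a minimal sketch loosely based on the GCS section of the 1.9 docs that Kevin linked. The module name `log_config`, the bucket, the local folder, and the format strings are placeholders. The `GCSTaskHandler` keyword arguments are assumed from the 1.9 handler. Given how this thread ended, it is safer to start from the `airflow_local_settings.py` of the exact version you have installed than from master:

```python
# log_config.py (hypothetical module name), a sketch for Airflow 1.9.
import os

# Placeholders: adjust to your deployment.
BASE_LOG_FOLDER = os.path.expanduser("~/airflow/logs")  # local staging directory
GCS_LOG_FOLDER = "gs://my-airflow-logs-bucket/logs"     # hypothetical bucket
LOG_FORMAT = "[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s"
FILENAME_TEMPLATE = "{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log"

LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "airflow.task": {"format": LOG_FORMAT},
    },
    "handlers": {
        # Plain stdlib handler. Per the error in this thread, the installed
        # v1-9-stable logging_mixin had no RedirectStdHandler.
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "airflow.task",
            "stream": "ext://sys.stdout",
        },
        # Writes task logs locally; the upload to GCS happens when the handler
        # is closed (close() -> gcs_write() in the traceback above).
        "gcs.task": {
            "class": "airflow.utils.log.gcs_task_handler.GCSTaskHandler",
            "formatter": "airflow.task",
            "base_log_folder": BASE_LOG_FOLDER,
            "gcs_log_folder": GCS_LOG_FOLDER,
            "filename_template": FILENAME_TEMPLATE,
        },
    },
    "loggers": {
        # task_log_reader = gcs.task refers to this handler by name.
        "airflow.task": {
            "handlers": ["gcs.task"],
            "level": "INFO",
            "propagate": False,
        },
    },
    "root": {"handlers": ["console"], "level": "INFO"},
}
```

With this module on the `PYTHONPATH`, `airflow.cfg` would then set `logging_config_class = log_config.LOGGING_CONFIG`, `task_log_reader = gcs.task`, and `remote_log_conn_id` to the ID of the GCP connection. Check these option names against your version's default config.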
>> >>>>>
>> >>>>>>
>> >>>>>> 2/ Is there a way to encode the GCS connection in a file, so that one
>> >>>>>> doesn't have to open the webserver and create it from the admin panel?
>> >>>>>> It'd be nice if the GCS connection were created automatically.
>> >>>>>>
>> >>>>> Unfortunately, a GCS connection is tied to a specific GCP project, so
>> >>>>> it can't be pre-populated.
>> >>>>> Airflow 1.9 should fix the GCP connection type issue
>> >>>>> (https://github.com/apache/incubator-airflow/commit/2f107d8a30910fd025774004d5c4c95407ed55c5),
>> >>>>> so you can use the airflow connections CLI directly.
>> >>>>>
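With the connection type fix Feng mentions, you can create the connection from a script at container start instead of through the admin panel. The sketch below only builds the command line. The function name is hypothetical. The CLI flags, the `google-cloud-platform://` URI scheme, and the `extra__google_cloud_platform__*` keys are all assumptions; check them against `airflow connections --help` and the GCP hook of your installed version:

```python
import json


def gcp_connection_add_args(conn_id, project, key_path):
    """Build argv for adding a Google Cloud Platform connection via the CLI.

    Assumed (not verified here): the 1.9 CLI accepts --add, --conn_id,
    --conn_uri and --conn_extra, a 'google-cloud-platform://' URI maps to
    the google_cloud_platform connection type, and the GCP hook reads the
    project and key file path from these 'extra' keys.
    """
    extra = {
        "extra__google_cloud_platform__project": project,
        "extra__google_cloud_platform__key_path": key_path,
    }
    return [
        "airflow", "connections", "--add",
        "--conn_id", conn_id,
        "--conn_uri", "google-cloud-platform://",
        "--conn_extra", json.dumps(extra, sort_keys=True),
    ]
```

For example, a container entrypoint could call `subprocess.run(gcp_connection_add_args("gcs_logs", "my-project", "/secrets/gcp-key.json"), check=True)` before starting the scheduler. The connection ID, project, and key path here are placeholders.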
>> >>>>>
>> >>>>>>
>> >>>>>> Thanks in advance for your help!
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>
>>
>>
>
