airflow-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ash Berlin-Taylor <ash_airflowl...@firemirror.com>
Subject Re: celery problem: cannot override celery_broker_transport_options
Date Thu, 24 May 2018 15:39:18 GMT
Kombu (a library Celery uses) 4.1.0 added it back in https://github.com/celery/kombu/blob/master/Changelog#L75-L99
<https://github.com/celery/kombu/blob/master/Changelog#L75-L99> - I _thought_ that means
it's supported in Celery again...?


> On 24 May 2018, at 16:34, Craig Rodrigues <rodrigc@crodrigues.org> wrote:
> 
> Removal of sqla as a backend is mentioned in these release notes for celery
> 4.0:
> 
> http://docs.celeryproject.org/en/latest/whatsnew-4.0.html#features-removed-for-lack-of-funding
> 
> --
> Craig
> 
> 
> On Thu, May 24, 2018 at 8:32 AM Craig Rodrigues <rodrigc@crodrigues.org>
> wrote:
> 
>> It looks like in Celery, the documentation for sqla broker was removed:
>> 
>> 
>> https://github.com/celery/celery/commit/79810a26a116e9881c42a14d856fa94c40fefcd8#diff-29ccf8c96d521253467909a652e6ded2
>> 
>> I cannot find the pull request or release notes which document this.
>> 
>> --
>> Craig
>> 
>> 
>> On Thu, May 24, 2018 at 8:19 AM Ash Berlin-Taylor <
>> ash_airflowlist@firemirror.com> wrote:
>> 
>>> Sounds like
>>> https://github.com/apache/incubator-airflow/blob/v1-10-test/airflow/config_templates/default_celery.py#L31
>>> <
>>> https://github.com/apache/incubator-airflow/blob/v1-10-test/airflow/config_templates/default_celery.py#L31>
>>> should be guarded in some way to only do that for a redis:// and sqs://
>>> backends.
>>> 
>>> 
>>>> On 24 May 2018, at 16:13, Craig Rodrigues <rodrigc@crodrigues.org>
>>> wrote:
>>>> 
>>>> Ash,
>>>> 
>>>> According to this:
>>>> 
>>> http://docs.celeryproject.org/en/latest/userguide/configuration.html#broker-settings
>>>> visibility_timeout is supported by Redis and SQS.
>>>> 
>>>> --
>>>> Craig
>>>> 
>>>> 
>>>> On Thu, May 24, 2018 at 8:07 AM Craig Rodrigues <rodrigc@crodrigues.org
>>>> 
>>>> wrote:
>>>> 
>>>>> Ash,
>>>>> 
>>>>> Thanks again.  You are leading me on the right path!
>>>>> 
>>>>> I can prepare a patch to move the ssl_ options into the celery section.
>>>>> 
>>>>> What about visbility_timeout?  The error I am getting is:
>>>>> 
>>>>> File
>>> "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/strategies.py",
>>>>> line 160, in create
>>>>>   engineclass.__name__))
>>>>> TypeError: Invalid argument(s)
>>>>> 'ssl_key','ssl_cert','ssl_active','visibility_timeout','ssl_cacert'
>>> sent to
>>>>> create_engine(), using configuration
>>>>> MySQLDialect_mysqldb/QueuePool/Engine.  Please check that the keyword
>>>>> arguments are appropriate for this combination of components.
>>>>> 
>>>>> 
>>>>> So it looks like visibility_timeout does not work with sqla as well.
>>>>> --
>>>>> Craig
>>>>> 
>>>>> 
>>>>> On Thu, May 24, 2018 at 2:17 AM Ash Berlin-Taylor <
>>>>> ash_airflowlist@firemirror.com> wrote:
>>>>> 
>>>>>> Yes, you would need to duplicate a chunk of the default_celery in
your
>>>>>> copy right now. But you can just make it have the values you want
-
>>> so it
>>>>>> would be about 10 lines in total.
>>>>>> 
>>>>>> It seems that between AIRFLOW-966 and AIRFLOW-1840 things got a little
>>>>>> bit out of sync in the default .cfg and the celery .py - the .py
is
>>> looking
>>>>>> for celery->ssl_* but the default config puts it in
>>>>>> celery_broker_transport_options->ssl_*. Looking at the celery
config
>>> option
>>>>>> it looks like they aren't actually options for the other transports,
>>> it's
>>>>>> just that they don't complain about the extra options. I think the
>>> fix is
>>>>>> just to move them up to the celery section.
>>>>>> 
>>>>>> -ash
>>>>>> 
>>>>>> 
>>>>>>> On 24 May 2018, at 07:21, Craig Rodrigues <rodrigc@crodrigues.org>
>>>>>> wrote:
>>>>>>> 
>>>>>>> Ash,
>>>>>>> 
>>>>>>> Thanks!   You put me on the right track.
>>>>>>> Unfortunately, there is a lot of logic in
>>>>>>> airflow/config_templates/default_celery.py that I need,
>>>>>>> and if I was to come up with my own class to replace:
>>>>>>> 
>>>>>>> celery_config_options =
>>>>>>> airflow.config_templates.default_celery.DEFAULT_CELERY_CONFIG
>>>>>>> 
>>>>>>> then I would basically have to rewrite my own version of
>>>>>> default_celery.py
>>>>>>> 
>>>>>>> Also, in airflow/config_templates/default_airflow.cfg, there
is this:
>>>>>>> 
>>>>>>> [celery_broker_transport_options]
>>>>>>> # The visibility timeout defines the number of seconds to wait
for
>>> the
>>>>>>> worker
>>>>>>> # to acknowledge the task before the message is redelivered to
>>> another
>>>>>>> worker.
>>>>>>> # Make sure to increase the visibility timeout to match the time
of
>>> the
>>>>>>> longest
>>>>>>> # ETA you're planning to use. Especially important in case of
using
>>>>>> Redis
>>>>>>> or SQS
>>>>>>> visibility_timeout = 21600
>>>>>>> 
>>>>>>> # In case of using SSL
>>>>>>> ssl_active = False
>>>>>>> ssl_key =
>>>>>>> ssl_cert =
>>>>>>> ssl_cacert =
>>>>>>> 
>>>>>>> 
>>>>>>> None of those options work if using an sqla backend for celery
>>>>>> broker_url.
>>>>>>> 
>>>>>>> I need to think about this, but this needs to be cleaned up before
>>>>>>> Airflow 1.10 is released.  In
>>>>>> airflow/config_templates/default_airflow.cfg
>>>>>>> there is this:
>>>>>>> 
>>>>>>> broker_url = sqla+mysql://airflow:airflow@localhost:3306/airflow
>>>>>>> 
>>>>>>> 
>>>>>>> So sqla is specified as the default broker_url for celery.
>>>>>>> A lot of people (including where I work) have used this template
to
>>> set
>>>>>> up
>>>>>>> Airflow + Celery, and even though sqla is an "experimental"
>>> broker_url,
>>>>>>> it actually works pretty well.
>>>>>>> 
>>>>>>> Now in airflow 1.10, something that was very easy to set up is
now
>>>>>> really
>>>>>>> complicated and unintuitive.
>>>>>>> 
>>>>>>> Would it be OK to change the airflow code so that
>>>>>>> in airflow/config_templates/default_airflow.cfg, all the options
in
>>>>>>> [celery_broker_transport_options] are commented out?
>>>>>>> And if someone is running Redis, they would have to add those
>>>>>>> options in their own airflow.cfg file?
>>>>>>> 
>>>>>>> Bolke, do you have any comments?
>>>>>>> 
>>>>>>> --
>>>>>>> Craig
>>>>>>> 
>>>>>>> On Tue, May 22, 2018 at 1:50 AM Ash Berlin-Taylor <
>>>>>>> ash_airflowlist@firemirror.com> wrote:
>>>>>>> 
>>>>>>>> To use with the SQLA backend to celery you need to override
the
>>> options
>>>>>>>> Airflow passes to Celery. Those come from
>>>>>>>> 
>>>>>> 
>>> https://github.com/apache/incubator-airflow/blob/v1-10-test/airflow/config_templates/default_celery.py
>>>>>>>> 
>>>>>>>> Since you don't want most/all of those options (and there
is no way
>>> in
>>>>>> the
>>>>>>>> config file to _remove_ a setting) you will have to point
airflow
>>> to a
>>>>>>>> different file for the celery config:
>>>>>>>> 
>>>>>>>> This line in the config is what you will need to change:
>>>>>>>> 
>>>>>>>>  # Import path for celery configuration options
>>>>>>>>  celery_config_options =
>>>>>>>> airflow.config_templates.default_celery.DEFAULT_CELERY_CONFIG
>>>>>>>> 
>>>>>>>> If you create something like config/celery_config.py containing:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>  CELERY_CONFIG = {
>>>>>>>>      # Just the options you want to set
>>>>>>>>  }
>>>>>>>> 
>>>>>>>> 
>>>>>>>> (config/ should exist along side your dags/ folder, and I
think it
>>>>>> should
>>>>>>>> be added to the python path already). You can then set this
in the
>>>>>> config:
>>>>>>>> 
>>>>>>>>  celery_config_options = celery_config.CELERY_CONFIG
>>>>>>>> 
>>>>>>>> That should give you complete control
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>> 
>>> 


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message