airflow-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-1945) Pass --autoscale to celery workers
Date Mon, 15 Oct 2018 17:04:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16650483#comment-16650483
] 

ASF GitHub Bot commented on AIRFLOW-1945:
-----------------------------------------

msumit closed pull request #3989: [AIRFLOW-1945] Autoscale celery workers for airflow added
URL: https://github.com/apache/incubator-airflow/pull/3989
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py
index 675a88a63c..cfc6c6b8d6 100644
--- a/airflow/bin/cli.py
+++ b/airflow/bin/cli.py
@@ -1038,12 +1038,16 @@ def worker(args):
     from airflow.executors.celery_executor import app as celery_app
     from celery.bin import worker
 
+    autoscale = args.autoscale
+    if autoscale is None and conf.has_option("celery", "worker_autoscale"):
+        autoscale = conf.get("celery", "worker_autoscale")
     worker = worker.worker(app=celery_app)
     options = {
         'optimization': 'fair',
         'O': 'fair',
         'queues': args.queues,
         'concurrency': args.concurrency,
+        'autoscale': autoscale,
         'hostname': args.celery_hostname,
         'loglevel': conf.get('core', 'LOGGING_LEVEL'),
     }
@@ -1916,6 +1920,9 @@ class CLIFactory(object):
             ('-d', '--delete'),
             help='Delete a user',
             action='store_true'),
+        'autoscale': Arg(
+            ('-a', '--autoscale'),
+            help="Minimum and Maximum number of worker to autoscale"),
 
     }
     subparsers = (
@@ -2058,7 +2065,7 @@ class CLIFactory(object):
             'func': worker,
             'help': "Start a Celery worker node",
             'args': ('do_pickle', 'queues', 'concurrency', 'celery_hostname',
-                     'pid', 'daemon', 'stdout', 'stderr', 'log_file'),
+                     'pid', 'daemon', 'stdout', 'stderr', 'log_file', 'autoscale'),
         }, {
             'func': flower,
             'help': "Start a Celery Flower",
diff --git a/airflow/config_templates/default_airflow.cfg b/airflow/config_templates/default_airflow.cfg
index b572dbb2f7..12e5a16f21 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -349,6 +349,13 @@ celery_app_name = airflow.executors.celery_executor
 # your worker box and the nature of your tasks
 worker_concurrency = 16
 
+# The minimum and maximum concurrency that will be used when starting workers with the
+# "airflow worker" command. Pick these numbers based on resources on
+# worker box and the nature of the task. If autoscale option is available worker_concurrency
+# will be ignored.
+# http://docs.celeryproject.org/en/latest/reference/celery.bin.worker.html#cmdoption-celery-worker-autoscale
+# worker_autoscale = 12,16
+
 # When you start an airflow worker, airflow starts a tiny web server
 # subprocess to serve the workers local log files to the airflow main
 # web server, who then builds pages and sends them to users. This defines


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Pass --autoscale to celery workers
> ----------------------------------
>
>                 Key: AIRFLOW-1945
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1945
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: celery, cli
>            Reporter: Michael O.
>            Assignee: Sai Phanindhra
>            Priority: Trivial
>              Labels: easyfix
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Celery supports autoscaling of the worker pool size (number of tasks that can parallelize
within one worker node).  I'd like to propose to support passing the --autoscale parameter
to {{airflow worker}}.
> Since this is a trivial change, I am not sure if there's any reason for not being supported
already.(?)
> For example
> {{airflow worker --concurrency=4}} will set a fixed pool size of 4.
> With minimal changes in [https://github.com/apache/incubator-airflow/blob/4ce4faaeae7a76d97defcf9a9d3304ac9d78b9bd/airflow/bin/cli.py#L855]
it could support
> {{airflow worker --autoscale=2,10}} to set an autoscaled pool size of 2 to 10
> Some references:
> * http://docs.celeryproject.org/en/latest/internals/reference/celery.worker.autoscale.html
> * https://github.com/apache/incubator-airflow/blob/4ce4faaeae7a76d97defcf9a9d3304ac9d78b9bd/airflow/bin/cli.py#L855



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message