Date: Fri, 6 May 2016 13:30:12 +0000 (UTC)
From: "Bence Nagy (JIRA)"
To: commits@airflow.incubator.apache.org
Subject: [jira] [Updated] (AIRFLOW-57) Rename concurrency configuration variables to be more clear

[ https://issues.apache.org/jira/browse/AIRFLOW-57?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bence Nagy updated AIRFLOW-57:
------------------------------
Description:
Currently the following config variables exist for controlling parallel execution limits:

{code}
# The amount of parallelism as a setting to the executor. This defines
# the max number of task instances that should run simultaneously
# on this airflow installation
parallelism = 32

# The number of task instances allowed to run concurrently by the scheduler
dag_concurrency = 16

# When not using pools, tasks are run in the "default pool",
# whose size is guided by this config element
non_pooled_task_slot_count = 128

# The maximum number of active DAG runs per DAG
max_active_runs_per_dag = 16
{code}

Let's go through these one by one:

{{parallelism}}: not a very descriptive name, considering that all the above settings are for parallelism.
The description says it sets the maximum number of task instances for the Airflow installation, which is a bit ambiguous: if I have two hosts running Airflow workers, I'd have Airflow installed on two hosts, so that should be two installations, but based on context 'per installation' here means 'per Airflow state database'. I'd name this {{max_active_tasks}}.

{{dag_concurrency}}: Despite the name, based on the comment this is actually the task concurrency, and it's per worker. I'd name this {{max_active_tasks_for_worker}} ({{per_worker}} would suggest that it's a global setting for workers, but I think you can have workers with different values set for this).

{{non_pooled_task_slot_count}}: This is a weird one. I'm going to pass on suggesting a name for it because I just can't think of any reason this config variable should exist. We already have a global task instance limit, and we have pools to limit access to certain resources; in what case would someone want to limit access to everything other than certain resources? So, yeah, skipping this one. In case this was needed only due to how pools are implemented, I'd suggest setting the limit to {{sys.maxsize}} and just removing the config variable.

{{max_active_runs_per_dag}}: This one's kinda alright, but since it seems to be just a default value for the matching {{DAG}} kwarg, it might be nice to reflect that in the name, something like {{default_max_active_runs_for_dags}}.

So let's move on to the {{DAG}} kwargs:

{{concurrency}}: Again, having a general name like this, coupled with the fact that concurrency is used for something different elsewhere, makes this pretty confusing. I'd call this {{max_active_tasks}}.

{{max_active_runs}}: This one sounds alright to me.
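To illustrate how the renames could stay backwards compatible, here is a minimal Python sketch, not Airflow's actual configuration code. It assumes the options are read from the [core] section of an airflow.cfg-style file using the standard-library {{configparser}}; {{RENAMED_CORE_OPTIONS}}, {{get_core_int}} and {{resolve_max_active_tasks}} are hypothetical helpers, and the new names are only the proposals above. A new name wins when present, the old name is still honored with a {{DeprecationWarning}}, and an option that has been removed from the file (such as {{non_pooled_task_slot_count}}) falls back to {{sys.maxsize}}, i.e. effectively no limit.

```python
import configparser
import sys
import warnings

# Hypothetical mapping from the names proposed in AIRFLOW-57 to the names
# Airflow 1.7.0 uses. The new names are suggestions, not real Airflow options.
RENAMED_CORE_OPTIONS = {
    "max_active_tasks": "parallelism",
    "max_active_tasks_for_worker": "dag_concurrency",
    "default_max_active_runs_for_dags": "max_active_runs_per_dag",
}


def get_core_int(config: configparser.ConfigParser, name: str, fallback: int) -> int:
    """Read an int option from [core], preferring the new name over the old one."""
    if config.has_option("core", name):
        return config.getint("core", name)
    old_name = RENAMED_CORE_OPTIONS.get(name)
    if old_name is not None and config.has_option("core", old_name):
        # The old name still works, but the user is told to migrate.
        warnings.warn(
            f"[core] {old_name} is deprecated, use {name} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return config.getint("core", old_name)
    # Neither name is set, e.g. an option that was removed from the file.
    return fallback


def resolve_max_active_tasks(max_active_tasks=None, concurrency=None, default=16):
    """Resolve the proposed DAG kwarg while still accepting the old `concurrency`.

    `default` is a placeholder value for this sketch.
    """
    if concurrency is not None:
        warnings.warn(
            "'concurrency' is deprecated, use 'max_active_tasks' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # An explicitly passed new kwarg takes precedence over the old one.
        if max_active_tasks is None:
            max_active_tasks = concurrency
    return default if max_active_tasks is None else max_active_tasks


if __name__ == "__main__":
    config = configparser.ConfigParser()
    config.read_string(
        "[core]\n"
        "parallelism = 32\n"
        "dag_concurrency = 16\n"
        "max_active_runs_per_dag = 16\n"
    )
    # Only the old name is set, so 32 is read from "parallelism".
    print(get_core_int(config, "max_active_tasks", fallback=1))
    # non_pooled_task_slot_count is absent, so the limit is sys.maxsize.
    print(get_core_int(config, "non_pooled_task_slot_count", fallback=sys.maxsize))
    # The old kwarg is still honored, so this prints 8.
    print(resolve_max_active_tasks(concurrency=8))
```

Keeping the lookup in one helper means each old name only has to be listed once, and the deprecation warnings can be removed in a later release without touching the call sites.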
If people agree that this is something that should be fixed, I think it'd be nice to get this into the 1.7.1 release, especially considering that it should be really easy to make the change backwards compatible.

was:
Currently the following config variables exist for controlling parallel execution limits:

{code}
# The amount of parallelism as a setting to the executor. This defines
# the max number of task instances that should run simultaneously
# on this airflow installation
parallelism = 32

# The number of task instances allowed to run concurrently by the scheduler
dag_concurrency = 16

# When not using pools, tasks are run in the "default pool",
# whose size is guided by this config element
non_pooled_task_slot_count = 128

# The maximum number of active DAG runs per DAG
max_active_runs_per_dag = 16
{code}

Let's go through these one by one:

{{parallelism}}: not a very descriptive name, considering that all the above settings are for parallelism. The description says it sets the maximum number of task instances for the Airflow installation, which is a bit ambiguous: if I have two hosts running Airflow workers, I'd have Airflow installed on two hosts, so that should be two installations, but based on context 'per installation' here means 'per Airflow state database'. I'd name this {{max_active_tasks}}.

{{dag_concurrency}}: Despite the name, based on the comment this is actually the task concurrency, and it's per worker. I'd name this {{max_active_tasks_for_worker}} ({{per_worker}} would suggest that it's a global setting for workers, but I think you can have workers with different values set for this).

{{non_pooled_task_slot_count}}: This is a weird one. I'm going to pass on suggesting a name for it because I just can't think of any reason this config variable should exist.
We already have a global task instance limit, and we have pools to limit access to certain resources; in what case would someone want to limit access to everything other than certain resources? So, yeah, skipping this one.

{{max_active_runs_per_dag}}: This one's kinda alright, but since it seems to be just a default value for the matching {{DAG}} kwarg, it might be nice to reflect that in the name, something like {{default_max_active_runs_for_dags}}.

So let's move on to the {{DAG}} kwargs:

{{concurrency}}: Again, having a general name like this, coupled with the fact that concurrency is used for something different elsewhere, makes this pretty confusing. I'd call this {{max_active_tasks}}.

{{max_active_runs}}: This one sounds alright to me.

If people agree that this is something that should be fixed, I think it'd be nice to get this into the 1.7.1 release, especially considering that it should be really easy to make the change backwards compatible.

> Rename concurrency configuration variables to be more clear
> -----------------------------------------------------------
>
>                 Key: AIRFLOW-57
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-57
>             Project: Apache Airflow
>          Issue Type: Improvement
>    Affects Versions: Airflow 1.7.0
>            Reporter: Bence Nagy
>            Priority: Minor
>              Labels: newbie
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)