airflow-dev mailing list archives

From "Kiran Gavali (BLOOMBERG/ 120 PARK)" <kgava...@bloomberg.net>
Subject Re: Increased DB connection usage 1.8.0 -> 1.10.1
Date Fri, 08 Mar 2019 16:20:16 GMT
I missed an important fact: connection pooling was not enabled in 1.8.1 (at least in our
config). When upgrading to 1.10.1, we didn't change the following defaults:
sql_alchemy_pool_enabled = True
sql_alchemy_pool_size = 5

After tweaking these, we are able to control the connection usage.
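
The original message below asks whether connection usage can be estimated from the configuration. A rough sketch of one way to get an upper bound follows. It assumes each Airflow process creates its own SQLAlchemy engine, and that a pooled engine can hold up to pool_size + max_overflow connections (SQLAlchemy's QueuePool uses max_overflow = 10 when nothing else is configured). The process counts are one hypothetical reading of the setup described below: "Worker(8)" is taken as 8 worker processes per node and "workers = 4" as 4 webserver processes per webserver. Per-task processes and scheduler subprocesses are not counted, so real usage may differ. The names ProcessGroup, max_connections_per_process and estimate_ceiling are made up for illustration and are not Airflow APIs.

```python
from dataclasses import dataclass

# SQLAlchemy's QueuePool default when max_overflow is not set explicitly.
SQLALCHEMY_DEFAULT_MAX_OVERFLOW = 10


@dataclass
class ProcessGroup:
    """A hypothetical group of processes that each own one SQLAlchemy engine."""
    name: str
    count: int


def max_connections_per_process(pool_enabled, pool_size,
                                max_overflow=SQLALCHEMY_DEFAULT_MAX_OVERFLOW):
    """Upper bound on connections a single process can hold open."""
    if not pool_enabled:
        # Assumption: without pooling, a process opens one connection at a
        # time and closes it when the session is released.
        return 1
    return pool_size + max_overflow


def estimate_ceiling(groups, pool_enabled, pool_size,
                     max_overflow=SQLALCHEMY_DEFAULT_MAX_OVERFLOW):
    """Worst-case total: every process fills its pool at the same moment."""
    per_process = max_connections_per_process(pool_enabled, pool_size, max_overflow)
    return sum(group.count for group in groups) * per_process


# Assumed process counts derived from the setup in the original message.
groups = [
    ProcessGroup("workers", 4 * 8),     # 4 nodes, 8 worker processes each
    ProcessGroup("webservers", 2 * 4),  # 2 webservers, workers = 4 each
    ProcessGroup("scheduler", 1),
]

ROLE_CONNECTION_LIMIT = 100  # limit for the Airflow role, from the message

pooled = estimate_ceiling(groups, pool_enabled=True, pool_size=5)
unpooled = estimate_ceiling(groups, pool_enabled=False, pool_size=5)

print(f"pooled ceiling:   {pooled} (limit {ROLE_CONNECTION_LIMIT})")    # 615
print(f"unpooled ceiling: {unpooled} (limit {ROLE_CONNECTION_LIMIT})")  # 41
```

Under these assumptions the pooled ceiling is well above the role limit of 100 while the unpooled one is below it, which is consistent with connection usage rising once pooling was enabled. It is only a worst-case bound, not a measurement.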

From: Kiran Gavali (BLOOMBERG/ 120 PARK) At: 02/08/19 16:01:37
To: dev@airflow.apache.org
Cc: Atharva Deshmukh (BLOOMBERG/ 120 PARK), Rabita Sarker (BLOOMBERG/ 120 PARK),
Michelle Noronha (BLOOMBERG/ 120 PARK), Tero Paananen (BLOOMBERG/ 120 PARK)
Subject: Increased DB connection usage 1.8.0 -> 1.10.1

Hi there,


Issue:
We would love to get pointers on an issue we have been seeing since we upgraded our Airflow
installation from 1.8.0 to 1.10.1. The configuration we use is the same across these versions,
but we now see task failures because the available DB connections are used up. The failures
occur mainly when the scheduler tries to build a new DAG. The exceptions that we see are
(sample stack trace attached):

- psycopg2.OperationalError: FATAL: too many connections for role xxx
- sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL: remaining connection
slots are reserved for non-replication superuser connections

Info:
Below are the settings that seem relevant to this behavior (also attaching our config file):
--------------------
sql_alchemy_pool_size = 5
sql_alchemy_pool_recycle = 3600
sql_alchemy_reconnect_timeout = 300
parallelism = 32
dag_concurrency = 16
dags_are_paused_at_creation = True
non_pooled_task_slot_count = 128
max_active_runs_per_dag = 16
workers = 4
scheduler_zombie_task_threshold = 300
-----------

Setup:
We use Postgres as the DB backend, and the connection limit for the Airflow user is set to 100.
The Airflow components are set up as follows:

Node 1: Worker(8), webserver, scheduler 
Node 2: Worker(8), webserver
Node 3: Worker(8)
Node 4: Worker(8)

We could not find anything in the commits, JIRA, or the dev mailing list that explains why
Airflow 1.10.1 would use more connections than Airflow 1.8.0. The only commit in 1.10.2 that
seemed related is https://github.com/apache/airflow/commit/959dd619d19223db3709fa4abcf52e8ee98bc079.
Since we don't know the root cause of this behavior, we are not sure whether upgrading to 1.10.2
will help. Is there a way to estimate the number of connections that can be used based on the
configuration and setup? Or, failing that, to identify the settings that affect it most?
Any help is greatly appreciated.
Regards,
Kiran

