airflow-commits mailing list archives

From GitBox <>
Subject [GitHub] [airflow] KevinYang21 commented on issue #5908: Revert "[AIRFLOW-4797] Improve performance and behaviour of zombie de…
Date Fri, 06 Sep 2019 04:08:56 GMT
KevinYang21 commented on issue #5908: Revert "[AIRFLOW-4797] Improve performance and behaviour
of zombie de…
   Thank you guys for reviewing!
   @milton0825 We benchmarked the two approaches during the initial PR 3873 with 4k DAG files
and 30k. With the aggregated query, DB CPU usage stayed under 50%, while with the subprocess
query the DB was killed instantly. In our production cluster at that time, running ~20k
tasks concurrently with 2k DAG files, DB CPU went from 80% to ~40%. In our current production
DB, with >23M rows in the task_instance table and >4M rows in the job table, the query
takes 0.5 seconds on average (we have a powerful DB, but the PR being reverted also showed
an average runtime of 0.5 seconds for that query). So it shouldn't slow down the DAG
processor manager too much.
   @ashb pg_stat won't get flushed until the DB is restarted, so we can't really see the
difference in frequency, but that is pretty important in the evaluation here. Even with the
provided data, the summed query time for 25 DAG files would already exceed that of the joined
query, not to mention the overhead of starting/stopping the transactions.
   In general I believe it is better to use the aggregated query, and thus leverage the query
optimizer, instead of trying to optimize the querying ourselves. Especially with a large-scale
cluster that has a huge number of DAG files to parse, it would be a show stopper if we
distributed the query to the subprocesses.
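
   To make the trade-off concrete, here is a minimal sketch of the two query shapes, using
an in-memory SQLite DB rather than Airflow's real models. The schema (including the `fileloc`
column on `task_instance`), the integer heartbeat values, and the function names are all
assumptions made up for illustration; this is not the code from either PR. It only shows that
one joined query returns the same zombies as N per-file queries while issuing a single
statement.

```python
import sqlite3

# Hypothetical, heavily simplified schema; not Airflow's actual tables.
def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE job (id INTEGER PRIMARY KEY, state TEXT, latest_heartbeat INTEGER);
        CREATE TABLE task_instance (
            task_id TEXT, dag_id TEXT, fileloc TEXT, state TEXT, job_id INTEGER
        );
    """)
    conn.executemany("INSERT INTO job VALUES (?, ?, ?)", [
        (1, "running", 100),  # healthy job
        (2, "running", 10),   # heartbeat is stale
        (3, "failed", 100),   # job is no longer running
    ])
    conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, ?, ?)", [
        ("t1", "dag_a", "a.py", "running", 1),
        ("t2", "dag_a", "a.py", "running", 2),
        ("t3", "dag_b", "b.py", "running", 3),
        ("t4", "dag_c", "c.py", "success", 3),
    ])
    return conn

# A task counts as a zombie here if it claims to be running while its job
# is not running or has not sent a heartbeat since `limit`.
ZOMBIE_SQL = """
    SELECT ti.fileloc, ti.dag_id, ti.task_id
    FROM task_instance ti JOIN job j ON ti.job_id = j.id
    WHERE ti.state = 'running'
      AND (j.state != 'running' OR j.latest_heartbeat < ?)
"""

def zombies_aggregated(conn, limit):
    """One joined query in the manager; results are grouped by file afterwards."""
    by_file = {}
    for fileloc, dag_id, task_id in conn.execute(ZOMBIE_SQL, (limit,)):
        by_file.setdefault(fileloc, []).append((dag_id, task_id))
    return by_file, 1  # (zombies per file, number of queries issued)

def zombies_per_file(conn, limit, filelocs):
    """One query per DAG file, as each parsing subprocess would issue."""
    by_file, queries = {}, 0
    for fileloc in filelocs:
        rows = conn.execute(
            ZOMBIE_SQL + " AND ti.fileloc = ?", (limit, fileloc)
        ).fetchall()
        queries += 1
        if rows:
            by_file[fileloc] = [(dag_id, task_id) for _, dag_id, task_id in rows]
    return by_file, queries

conn = make_db()
agg, agg_queries = zombies_aggregated(conn, limit=50)
per, per_queries = zombies_per_file(conn, 50, ["a.py", "b.py", "c.py"])
print(agg == per, agg_queries, per_queries)
```

   In the sketch the per-file variant issues one statement per DAG file, so its query count
grows with the number of files, while the aggregated variant stays at one regardless.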

