airflow-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daniel Standish <dpstand...@gmail.com>
Subject Re: Airflow Dynamic tasks
Date Thu, 29 Aug 2019 20:13:20 GMT
Eugene

Why don't you create a stack overflow post and give us the link?

That is probably a better way to help you through this.

We will need to see what exactly you are doing in your dag files and
potentially also hooks / operators.

Thanks


On Thu, Aug 29, 2019 at 10:41 AM Bacal, Eugene <ebacal@paypal.com.invalid>
wrote:

> Can someone advise if this is expecting behavior, please?
>
> - DB connections are not being re-used
>         - Connections stay open while active only 5:
>                 mysql>  show global status like 'Thread%';
>                 +-------------------------+---------+
>                 | Variable_name           | Value   |
>                 +-------------------------+---------+
>                 | Threadpool_idle_threads | 0       |
>                 | Threadpool_threads      | 0       |
>                 | Threads_cached          | 775     |
>                 | Threads_connected       | 5323    |
>                 | Threads_created         | 4846609 |
>                 | Threads_running         | 5       |
>                 +-------------------------+---------+
>         - Workers create hungreds of conections that’s remains open until
> DB cleares them (900 sec)
>
>
>
> Eugene Bacal
>
> Workload Automation Engineer
> Scheduling Platform Experience
> Cell: 4802028764
> Slack: @ebacal
> DL: dl-pp-cdp-spx@paypal.com
>
>
> Scottsdale, Arizona
>
>
> On 8/21/19, 2:36 PM, "Bacal, Eugene" <ebacal@paypal.com.INVALID> wrote:
>
>     Hi Max,
>
>     We have ran few testing today from DB side and noticed that:
>
>         - DB connections are not being re-used
>         - Connections stay open while active only 5:
>                 mysql>  show global status like 'Thread%';
>                 +-------------------------+---------+
>                 | Variable_name           | Value   |
>                 +-------------------------+---------+
>                 | Threadpool_idle_threads | 0       |
>                 | Threadpool_threads      | 0       |
>                 | Threads_cached          | 775     |
>                 | Threads_connected       | 5323    |
>                 | Threads_created         | 4846609 |
>                 | Threads_running         | 5       |
>                 +-------------------------+---------+
>         - Workers create hungreds of conections that’s remains open until
> DB cleares them (900 sec)
>
>     Seems that similar cases were reported some time back(2016),
>
> https://nam03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FAIRFLOW-680&amp;data=01%7C01%7Cebacal%40paypal.com%7Cde99ce488b3e479020ca08d7267f9b30%7Cfb00791460204374977e21bac5f3f4c8%7C1&amp;sdata=YXWNRhht76O1djebOfm0oEwAWanKQNLaW0KWLWrrxms%3D&amp;reserved=0
>
>
>     On 8/20/19, 5:19 PM, "Bacal, Eugene" <ebacal@paypal.com.INVALID>
> wrote:
>
>         Celery executor.
>         12 BareMetals: CPU(s):40, MHz 2494.015, RAM 378G - each box
>
>         Worker .cfg:
>         [core]
>         sql_alchemy_pool_size = 5
>         sql_alchemy_pool_recycle = 900
>         sql_alchemy_reconnect_timeout = 300
>         parallelism = 1200
>         dag_concurrency = 800
>         non_pooled_task_slot_count = 1200
>         max_active_runs_per_dag = 10
>         dagbag_import_timeout = 30
>         [celery]
>         worker_concurrency = 100
>
>         Scheduler .cfg:
>         [core]
>         sql_alchemy_pool_size = 30
>         sql_alchemy_pool_recycle = 300
>         sql_alchemy_reconnect_timeout = 300
>         parallelism = 1200
>         dag_concurrency = 800
>         non_pooled_task_slot_count = 1200
>         max_active_runs_per_dag = 10
>         [scheduler]
>         job_heartbeat_sec = 5
>         scheduler_heartbeat_sec = 5
>         run_duration = 1800
>         min_file_process_interval = 10
>         min_file_parsing_loop_time = 1
>         dag_dir_list_interval = 300
>         print_stats_interval = 30
>         scheduler_zombie_task_threshold = 300
>         max_tis_per_query = 1024
>         max_threads = 29
>
>
>         From workers I see 350 + connections at the start time, then it
> drops to 200 and then to 1-10 once tasks complete
>         From Scheduler very low 1-10:
>         MySQL connections:
>             331 worker1
>             215 worker2
>             349 worker53
>             335 worker54
>             347 worker55
>             336 worker56
>             336 worker57
>             354 worker58
>             339 worker59
>             328 worker60
>             333 worker61
>             337 worker62
>              2 scheduler
>
>
>
>         - Eugene
>
>
>         On 8/20/19, 8:51 AM, "Maxime Beauchemin" <
> maximebeauchemin@gmail.com> wrote:
>
>             Delay between tasks could be due to not having enough worker
> slots. What
>             type of executor are you using, how is it configured?
>
>             Max
>
>             On Tue, Aug 20, 2019 at 7:50 AM Bacal, Eugene
> <ebacal@paypal.com.invalid>
>             wrote:
>
>             > Absolutely possible, Daniel,
>             >
>             > We are looking in all directions. Has anyone noticed
> performance
>             > improvements with PostgreSQL vs MySQL ?
>             >
>             > -Eugene
>             >
>             >
>             > On 8/15/19, 2:03 PM, "Daniel Standish" <dpstandish@gmail.com>
> wrote:
>             >
>             >     It's not just webserver and scheduler that will parse
> your dag file.
>             >     During the execution of a dag run, dag file will be
> re-parsed at the
>             > start
>             >     of every task instance.  If you have 1000 tasks running
> in short
>             > period of
>             >     time, that's 1000 queries.  It's possible these queries
> are piling up
>             > in a
>             >     queue on your database.  Dag read time has to be very
> fast for this
>             > reason.
>             >
>             >
>             >
>             >     On Thu, Aug 15, 2019 at 1:45 PM Bacal, Eugene
>             > <ebacal@paypal.com.invalid>
>             >     wrote:
>             >
>             >     >
>             >     > Thank you for your reply, Max
>             >     >
>             >     > Dynamic DAGs query the database for tables and
> generates DAGs and
>             > tasks
>             >     > based on the output.
>             >     > For Python does not take much to execute:
>             >     >
>             >     > Dynamic - 500 tasks:
>             >     > time python PPAD_OIS_MASTER_IDI.py
>             >     > [2019-08-15 12:57:48,522] {settings.py:174} INFO -
>             >     > setting.configure_orm(): Using pool settings.
> pool_size=30,
>             > pool_recycle=300
>             >     > real    0m1.830s
>             >     > user    0m1.622s
>             >     > sys     0m0.188s
>             >     >
>             >     >
>             >     > Static - 100 tasks:
>             >     > time python PPAD_OPS_CANARY_CONNECTIONS_TEST_8.py
>             >     > [2019-08-15 12:59:24,959] {settings.py:174} INFO -
>             >     > setting.configure_orm(): Using pool settings.
> pool_size=30,
>             > pool_recycle=300
>             >     > real    0m1.009s
>             >     > user    0m0.898s
>             >     > sys     0m0.108s
>             >     >
>             >     >
>             >     > We have 44 DAGs with 1003 Dynamic tasks. Parsing in
> quite time:
>             >     > DagBag parsing time: 3.9385959999999995
>             >     >
>             >     > Parsing in time of execution, when scheduler submits
> the DAGs:
>             >     > DagBag parsing time: 99.820316
>             >     >
>             >     > Delay between the task run inside a single DAG grow
> from 30 sec to
>             > 10 min,
>             >     > then it drops back even thou tasks are runnign.
>             >     >
>             >     > Eugene
>             >     >
>             >     >
>             >     >
>             >     >
>             >     >
>             >     > On 8/15/19, 11:52 AM, "Maxime Beauchemin" <
>             > maximebeauchemin@gmail.com>
>             >     > wrote:
>             >     >
>             >     >     What is your dynamic DAG doing? How long does it
> take to execute
>             > it
>             >     > just as
>             >     >     a python script (`time python mydag.py`)?
>             >     >
>             >     >     As an Airflow admin, people may want to lower the
> DAG parsing
>             > timeout
>             >     >     configuration key to force people to not do crazy
> thing in DAG
>             > module
>             >     >     scope. At some point at Airbnb we had someone
> running a Hive
>             > query in
>             >     > DAG
>             >     >     scope, clearly that needs to be prevented.
>             >     >
>             >     >     Loading DAGs by calling a database can bring all
> sorts of
>             > surprises
>             >     > that
>             >     >     can drive everyone crazy. As mentioned in a recent
> post,
>             >     > repo-contained,
>             >     >     deterministic "less dynamic" DAGs are great,
> because they are
>             >     >     self-contained and allow you to use source-control
> properly
>             > (revert a
>             >     > bad
>             >     >     change for instance). That may mean having a
> process or script
>             > that
>             >     >     compiles external things that are dynamic into
> things like yaml
>             > files
>             >     >     checked into the code repo. Things as simple as
> parsing duration
>             > become
>             >     >     more predictable (network latency and database
> load are not part
>             > of
>             >     > that
>             >     >     equation), but more importantly, all changes
> become tracked in
>             > the code
>             >     >     repo.
>             >     >
>             >     >     yaml parsing in python can be pretty slow too, and
> there are
>             > solutions
>             >     > /
>             >     >     alternatives there. Hocon is great. Also
> C-accelerated yaml is
>             >     > possible:
>             >     >
>             >     >
>             >
> https://nam03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstackoverflow.com%2Fquestions%2F27743711%2Fcan-i-speedup-yaml&amp;data=01%7C01%7Cebacal%40paypal.com%7Cde99ce488b3e479020ca08d7267f9b30%7Cfb00791460204374977e21bac5f3f4c8%7C1&amp;sdata=HnqMESbF%2F6DiLbt0fCP%2ByJmF27P4Yovq%2Bx57oeLz0C4%3D&amp;reserved=0
>             >     >
>             >     >     Max
>             >     >
>             >     >     On Wed, Aug 14, 2019 at 9:56 PM Bacal, Eugene
>             >     > <ebacal@paypal.com.invalid>
>             >     >     wrote:
>             >     >
>             >     >     > Hello Airflow team,
>             >     >     >
>             >     >     > Please advise if you can. In our environment, we
> have noticed
>             > that
>             >     > dynamic
>             >     >     > tasks place quite of stress on scheduler,
> webserver and
>             > increase
>             >     > MySQL DB
>             >     >     > connections.
>             >     >     > We are run about 1000 Dynamic Tasks every 30 min
> and parsing
>             > time
>             >     >     > increases from 5 to 65 sec with Runtime from
> 2sec to 350+ .
>             > This
>             >     > happens at
>             >     >     > execution time then it drops to normal while
> still executing
>             > tasks.
>             >     >     > Webserver hangs for few minutes.
>             >     >     >
>             >     >     > Airflow 1.10.1.
>             >     >     > MySQL DB
>             >     >     >
>             >     >     > Example:
>             >     >     >
>             >     >     > Dynamic Tasks:
>             >     >     > Number of DAGs: 44
>             >     >     > Total task number: 950
>             >     >     > DagBag parsing time: 65.879642000000001
>             >     >     >
>             >     >     > Static Tasks:
>             >     >     > Number of DAGs: 73
>             >     >     > Total task number: 1351
>             >     >     > DagBag parsing time: 1.731088
>             >     >     >
>             >     >     > Is this something you aware of? Any advises on
> Dynamic tasks
>             >     >     > optimization/best practices?
>             >     >     >
>             >     >     > Thank you in advance,
>             >     >     > Eugene
>             >     >     >
>             >     >     >
>             >     >     >
>             >     >
>             >     >
>             >     >
>             >
>             >
>             >
>
>
>
>
>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message