aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Reza Motamedi <reza.motam...@gmail.com>
Subject Re: Review Request 60376: Observer task page to load consumption info from history
Date Tue, 18 Jul 2017 06:26:33 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60376/
-----------------------------------------------------------

(Updated July 18, 2017, 6:26 a.m.)


Review request for Aurora, David McLaughlin, Joshua Cohen, Jordan Ly, and Santhosh Kumar Shanmugham.


Changes
-------

Addressed review feedbacks. Added one more test.


Repository: aurora


Description
-------

# Observer task page to load consumption info from history

Resource consumptions of Thermos Processes are periodically calculated by TaskResourceMonitor
threads (one thread per Thermos task). This information is used to display a (semi) fresh
state of the tasks running on a host in the Observer host page, aka landing page. An aggregate
history of the consumptions is kept at the task level, although TaskResourceMonitor needs
to first collect the resource at the Process level and then aggregate them.

On the other hand, when an Observer _task page_ is visited, the resources consumption of Thermos
Processes within that task are calculated again and displayed without being aggregated. This
can become very slow since time to complete resource calculation is affected by the load on
the host.

By applying this patch we take advantage of the periodic work and fulfill information resource
requested in Observer task page from already collected resource consumptions.


Diffs (updated)
-----

  src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2

  src/test/python/apache/aurora/executor/common/test_resource_manager.py a898e4d81d34d1e30e39db1be1a66bc9e0ab1a35

  src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28



Diff: https://reviews.apache.org/r/60376/diff/4/

Changes: https://reviews.apache.org/r/60376/diff/3-4/


Testing
-------

I stress tested this patch on a host that had a slow Observer page. Interestingly, I did not
need to do much to make the Observer slow. There are a few points to be made clear first.
- We at Twitter limit the resources allocated to the Observer using `systemd`. The observer
is allowed to use only 20% of a CPU core. The attached screen shots are from such a setup.
- Having assigned 20% of a cpu core to Observer, starting only 8 `task`s, each with 3 `process`es
is enough to make the Observer slow; 11secs to load `task page`.


File Attachments
----------------

without the patch -- Screen Shot 2017-06-22 at 1.11.12 PM.png
  https://reviews.apache.org/media/uploaded/files/2017/06/22/03968028-a2f5-4a99-ba57-b7a41c471436__without_the_patch_--_Screen_Shot_2017-06-22_at_1.11.12_PM.png
with the patch -- Screen Shot 2017-06-22 at 1.07.41 PM.png
  https://reviews.apache.org/media/uploaded/files/2017/06/22/5962c018-27d3-4463-a277-f6ad48b7f2d7__with_the_patch_--_Screen_Shot_2017-06-22_at_1.07.41_PM.png


Thanks,

Reza Motamedi


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message