aurora-issues mailing list archives

From "Stephan Erb (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AURORA-1939) Thermos landing (host) page reports incorrect CPU rates when it is busy
Date Sun, 23 Jul 2017 20:53:00 GMT

    [ https://issues.apache.org/jira/browse/AURORA-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097798#comment-16097798 ]

Stephan Erb commented on AURORA-1939:
-------------------------------------

This is now on master. Thanks for the patch!

{code}
commit cdc5b8efd5bb86d38f73cca6d91903078b120333
Author: Reza Motamedi reza.motamedi@gmail.com
Date:   Sat Jul 22 20:28:50 2017 +0200

Remove psutil's oneshot

After a lot of testing on busy machines, I realized that psutil's oneshot is
not threadsafe. I contacted the developer but have not received a concrete
fix.

Please read https://issues.apache.org/jira/browse/AURORA-1939 and
https://github.com/giampaolo/psutil/issues/1110 for more information.

These inconsistencies disappear after removing oneshot.

Reviewed at https://reviews.apache.org/r/61016/

src/main/python/apache/thermos/monitoring/process_collector_psutil.py | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)
{code}

> Thermos landing (host) page reports incorrect CPU rates when it is busy
> -----------------------------------------------------------------------
>
>                 Key: AURORA-1939
>                 URL: https://issues.apache.org/jira/browse/AURORA-1939
>             Project: Aurora
>          Issue Type: Bug
>            Reporter: Reza Motamedi
>            Assignee: Reza Motamedi
>            Priority: Minor
>
> Thermos Observer uses `psutil` to monitor the resource consumption of Thermos Processes.
> On a busy machine, I have noticed negative CPU values when visiting the Thermos landing page.
> In my test I reproduced this by starting many processes that constantly create short-lived
> children. This indicates that, between the time `process_collector_psutil` looks up a
> Process's children and the time it calculates their CPU time, the pid of a child is
> reused by another, much younger process, which leads to negative CPU times.
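
The pid-reuse hypothesis in the description above can be illustrated with a small,
self-contained sketch. It is not the patch that landed (that change only removed
psutil's oneshot) and it does not use psutil at all. CpuSample, naive_rate and
CpuRateTracker are hypothetical names made up for this illustration. The sketch
assumes each sample carries the process start time, so that a recycled pid can be
told apart from the process that owned it before.

```python
from typing import Dict, NamedTuple, Optional


class CpuSample(NamedTuple):
    """One observation of a process (hypothetical structure, not Thermos code)."""
    pid: int
    create_time: float  # process start time, seconds since the epoch
    cpu_seconds: float  # cumulative user + system CPU time
    timestamp: float    # when the sample was taken


def naive_rate(previous: CpuSample, current: CpuSample) -> float:
    """CPU rate keyed on pid alone; goes negative if the pid was reused."""
    elapsed = current.timestamp - previous.timestamp
    return (current.cpu_seconds - previous.cpu_seconds) / elapsed


class CpuRateTracker:
    """Computes CPU rates, treating (pid, create_time) as the process identity."""

    def __init__(self) -> None:
        self._last: Dict[int, CpuSample] = {}

    def update(self, sample: CpuSample) -> Optional[float]:
        previous = self._last.get(sample.pid)
        self._last[sample.pid] = sample
        # First sighting, or the pid now belongs to a different process:
        # there is no valid baseline, so report no rate for this interval.
        if previous is None or previous.create_time != sample.create_time:
            return None
        elapsed = sample.timestamp - previous.timestamp
        if elapsed <= 0:
            return None
        return (sample.cpu_seconds - previous.cpu_seconds) / elapsed
```

With an old process that had accumulated 30 CPU seconds and a younger process that
later received the same pid with 0.5 CPU seconds, naive_rate returns a negative value,
while CpuRateTracker.update returns None for that interval and resumes reporting
rates from the next sample onward.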



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
