aurora-issues mailing list archives

From "Reza Motamedi (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (AURORA-1939) Thermos landing (host) page reports incorrect CPU rates when it is busy
Date Fri, 21 Jul 2017 03:44:00 GMT

    [ https://issues.apache.org/jira/browse/AURORA-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095732#comment-16095732 ]

Reza Motamedi edited comment on AURORA-1939 at 7/21/17 3:43 AM:
----------------------------------------------------------------

Following [~StephanErb]'s suggestion, I tried guarding psutil's `oneshot` block as a critical section.
However, it does not seem to work:


{code}
...
from threading import Lock
...

oneshot_lock = Lock()

def process_to_sample(process):
  """ Given a psutil.Process, return a current ProcessSample """
  try:
    with oneshot_lock:
      with process.oneshot():
        # the nonblocking get_cpu_percent call is stateful on a particular Process object, and hence
        # >2 consecutive calls are required before it will return a non-zero value
        rate = process.cpu_percent(0.0) / 100.0
        cpu_times = process.cpu_times()
...
{code}

Still getting odd readings from psutil.


> Thermos landing (host) page reports incorrect CPU rates when it is busy
> -----------------------------------------------------------------------
>
>                 Key: AURORA-1939
>                 URL: https://issues.apache.org/jira/browse/AURORA-1939
>             Project: Aurora
>          Issue Type: Bug
>            Reporter: Reza Motamedi
>            Priority: Minor
>
> Thermos Observer uses `psutil` to monitor resource consumption of Thermos Processes. On a busy machine, I have noticed negative CPU values when visiting the Thermos landing page.
> In my test I reproduced this by starting many processes that constantly create short-lived children. This indicates that, between the time `process_collector_psutil` looks up a Process's children and the time it calculates the CPU time, the pid of a child is reused by another, much younger process, which leads to negative CPU times.
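
The pid-reuse scenario described above can be illustrated with a minimal sketch. This is not the Thermos code: the `Sample` tuple and the `naive_rate` and `guarded_rate` helpers are hypothetical names introduced only for illustration, and the numbers are made up. The sketch assumes that a CPU rate is derived from the difference in cumulative CPU seconds between two samples, and shows how matching samples by pid alone can yield a negative rate, whereas also comparing the process creation time detects the reuse.

```python
from collections import namedtuple

# Hypothetical sample type for illustration only; not the Thermos ProcessSample.
Sample = namedtuple("Sample", ["pid", "create_time", "cpu_seconds", "timestamp"])


def naive_rate(prev, curr):
  """CPU rate between two samples that are matched by pid only."""
  return (curr.cpu_seconds - prev.cpu_seconds) / (curr.timestamp - prev.timestamp)


def guarded_rate(prev, curr):
  """CPU rate, or None when the pid now belongs to a different process."""
  if (prev.pid, prev.create_time) != (curr.pid, curr.create_time):
    return None  # pid was reused; the previous sample is not comparable
  return naive_rate(prev, curr)


# A child process has accumulated 12 s of CPU time when it is sampled, then exits.
old_child = Sample(pid=4242, create_time=1000.0, cpu_seconds=12.0, timestamp=1060.0)
# One second later the same pid belongs to a freshly started process.
new_child = Sample(pid=4242, create_time=1060.5, cpu_seconds=0.25, timestamp=1061.0)

print(naive_rate(old_child, new_child))    # negative: cumulative CPU time appears to shrink
print(guarded_rate(old_child, new_child))  # None: the two samples are different processes
```

If this hypothesis holds, a lock around `oneshot` would not be expected to help, because the mismatch comes from process identity rather than from concurrent access. psutil exposes `Process.create_time()`, and `Process.is_running()` is documented to account for pid reuse, so either might serve as the identity check; whether that fits the collector's structure is an assumption that would need testing.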



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
