aurora-issues mailing list archives

From "Reza Motamedi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AURORA-1939) Thermos landing (host) page reports incorrect CPU rates when it is busy
Date Wed, 19 Jul 2017 20:55:00 GMT

    [ https://issues.apache.org/jira/browse/AURORA-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093761#comment-16093761 ]

Reza Motamedi commented on AURORA-1939:
---------------------------------------

I see this problem when the host is super busy and resource collection is backlogged. In
this case I also see many more errors of this kind in the log:
{noformat}
D0719 20:18:28.064794 24474 process_collector_psutil.py:84] Error during process sampling:
psutil.NoSuchProcess process no longer exists (pid=62193)
D0719 20:18:35.351458 24474 process_collector_psutil.py:84] Error during process sampling:
psutil.NoSuchProcess process no longer exists (pid=14711)
D0719 20:18:35.552953 24474 process_collector_psutil.py:84] Error during process sampling:
psutil.NoSuchProcess process no longer exists (pid=62331)
D0719 20:18:42.857400 24474 task_observer.py:142] TaskObserver: finished checkpoint refresh
in 0.01s
D0719 20:18:43.753732 24474 process_collector_psutil.py:84] Error during process sampling:
psutil.NoSuchProcess process no longer exists (pid=62338)
D0719 20:18:48.454077 24474 mesos_vars.py:384] Metrics collection took 6506.1ms
D0719 20:18:50.253031 24474 process_collector_psutil.py:84] Error during process sampling:
psutil.NoSuchProcess process no longer exists (pid=62345)
D0719 20:18:57.861535 24474 task_observer.py:142] TaskObserver: finished checkpoint refresh
in 0.00s
D0719 20:19:12.955235 24474 task_observer.py:142] TaskObserver: finished checkpoint refresh
in 0.01s
D0719 20:19:14.959180 24474 process_collector_psutil.py:84] Error during process sampling:
psutil.NoSuchProcess process no longer exists (pid=62361)
D0719 20:19:14.960768 24474 process_collector_psutil.py:84] Error during process sampling:
psutil.NoSuchProcess process no longer exists (pid=62232)
D0719 20:19:18.056128 24474 mesos_vars.py:384] Metrics collection took 6008.0ms
D0719 20:19:22.856868 24474 process_collector_psutil.py:84] Error during process sampling:
psutil.NoSuchProcess process no longer exists (pid=62366)
D0719 20:19:28.048165 24474 task_observer.py:142] TaskObserver: finished checkpoint refresh
in 0.09s
D0719 20:19:28.660691 24474 process_collector_psutil.py:84] Error during process sampling:
psutil.NoSuchProcess process no longer exists (pid=62374)
D0719 20:19:43.051047 24474 task_observer.py:142] TaskObserver: finished checkpoint refresh
in 0.00s
D0719 20:19:48.355678 24474 mesos_vars.py:384] Metrics collection took 6299.8ms
D0719 20:19:58.148663 24474 task_observer.py:142] TaskObserver: finished checkpoint refresh
in 0.10s
D0719 20:20:11.449485 24474 process_collector_psutil.py:84] Error during process sampling:
psutil.NoSuchProcess process no longer exists (pid=62271)
D0719 20:20:13.155102 24474 task_observer.py:142] TaskObserver: finished checkpoint refresh
in 0.01s
D0719 20:20:18.249528 24474 mesos_vars.py:384] Metrics collection took 6179.9ms
D0719 20:20:23.354832 24474 process_collector_psutil.py:84] Error during process sampling:
psutil.NoSuchProcess process no longer exists (pid=11317)
D0719 20:20:27.060431 24474 process_collector_psutil.py:84] Error during process sampling:
psutil.NoSuchProcess process no longer exists (pid=62281)
D0719 20:20:28.160298 24474 task_observer.py:142] TaskObserver: finished checkpoint refresh
in 0.00s
D0719 20:20:35.452637 24474 process_collector_psutil.py:84] Error during process sampling:
psutil.NoSuchProcess process no longer exists (pid=62289)
D0719 20:20:43.252589 24474 task_observer.py:142] TaskObserver: finished checkpoint refresh
in 0.09s
D0719 20:20:48.151144 24474 mesos_vars.py:384] Metrics collection took 6058.3ms
D0719 20:20:55.254796 24474 process_collector_psutil.py:84] Error during process sampling:
psutil.NoSuchProcess process no longer exists (pid=62428)
D0719 20:20:58.257311 24474 task_observer.py:142] TaskObserver: finished checkpoint refresh
in 0.00s
D0719 20:21:13.352955 24474 task_observer.py:142] TaskObserver: finished checkpoint refresh
in 0.10s
D0719 20:21:17.555244 24474 process_collector_psutil.py:84] Error during process sampling:
psutil.NoSuchProcess process no longer exists (pid=62124)
{noformat}

> Thermos landing (host) page reports incorrect CPU rates when it is busy
> -----------------------------------------------------------------------
>
>                 Key: AURORA-1939
>                 URL: https://issues.apache.org/jira/browse/AURORA-1939
>             Project: Aurora
>          Issue Type: Bug
>            Reporter: Reza Motamedi
>            Priority: Minor
>
> Thermos Observer uses `psutil` to monitor the resource consumption of Thermos Processes.
> On a busy machine, I have noticed negative CPU values when visiting the Thermos landing page.
> In my test I reproduced this by starting many processes that constantly create short-lived
> children. This indicates that, between the time `process_collector_psutil` looks up a
> Process's children and the time it calculates their CPU time, the pid of a child is
> reused by another, much younger process, which leads to negative CPU times.
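
To illustrate the suspected mechanism, here is a minimal sketch of how a rate computed from per-pid CPU deltas can go negative when a pid is reused, and how keying on the pair (pid, creation time) would avoid it. This is not the Observer's actual code: `Sample`, `naive_cpu_rate`, and `guarded_cpu_rate` are hypothetical names, and the sample values are made up. In a `psutil`-based collector the fields could presumably be filled from `Process.pid`, `Process.create_time()`, and the user plus system fields of `Process.cpu_times()`.

```python
from collections import namedtuple

# Hypothetical sample record, one per observed child process.
Sample = namedtuple("Sample", ["pid", "create_time", "cpu_seconds"])


def naive_cpu_rate(prev, curr, elapsed):
    """CPU rate from deltas keyed on pid only (vulnerable to pid reuse)."""
    prev_by_pid = {s.pid: s.cpu_seconds for s in prev}
    total = 0.0
    for s in curr:
        # A pid not seen before gets a baseline of 0 CPU seconds.
        total += s.cpu_seconds - prev_by_pid.get(s.pid, 0.0)
    return total / elapsed


def guarded_cpu_rate(prev, curr, elapsed):
    """CPU rate from deltas keyed on (pid, create_time).

    A reused pid has a different creation time, so it is treated as a
    new process with a baseline of 0 instead of inheriting the old
    process's accumulated CPU time.
    """
    prev_by_id = {(s.pid, s.create_time): s.cpu_seconds for s in prev}
    total = 0.0
    for s in curr:
        total += s.cpu_seconds - prev_by_id.get((s.pid, s.create_time), 0.0)
    return total / elapsed


# Made-up data: pid 100 belonged to a long-running child with 50 s of CPU,
# then was reused by a much younger process with only 0.2 s of CPU.
previous = [Sample(pid=100, create_time=1000.0, cpu_seconds=50.0)]
current = [Sample(pid=100, create_time=1009.5, cpu_seconds=0.2)]

naive = naive_cpu_rate(previous, current, elapsed=10.0)      # negative
guarded = guarded_cpu_rate(previous, current, elapsed=10.0)  # non-negative
print(naive, guarded)
```

Treating a previously unseen process as starting from zero is itself a simplification; a real collector might prefer to skip such a process for one sampling interval.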



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
