aurora-issues mailing list archives

From "Stephan Erb (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AURORA-1907) Thermos unresponsive on hosts with many active task
Date Sun, 19 Mar 2017 15:03:41 GMT

    [ https://issues.apache.org/jira/browse/AURORA-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15931728#comment-15931728 ]

Stephan Erb commented on AURORA-1907:
-------------------------------------

First patch submitted:

{code}
commit b8f72d1461c2f13f1f73c13211b428f60596c11e
Author: Stephan Erb <serb@apache.org>
Date:   Sun Mar 19 16:01:50 2017 +0100

    Use Process.oneshot() in latest psutils for faster stats retrieval.

    Without the Process.oneshot() decorator stats retrieval can lead to
    multiple reads of the same `/proc` filesystem values. The oneshot
    decorator enables caching to speed this up. It has been added in
    psutils 5.0.

    Oneshot docs: https://pythonhosted.org/psutil/#psutil.Process.oneshot
    Changelog: https://github.com/giampaolo/psutil/blob/master/HISTORY.rst#520

    Bugs closed: AURORA-1907

    Reviewed at https://reviews.apache.org/r/57732/

 3rdparty/python/requirements.txt                                      |  2 +-
 src/main/python/apache/thermos/monitoring/process_collector_psutil.py | 23 ++++++++++++-----------
 2 files changed, 13 insertions(+), 12 deletions(-)
{code}
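
As a rough illustration of the change described in the commit message, the sketch below wraps several stat lookups in `Process.oneshot()`, which psutil 5.0 and later expose as a context manager. The helper name `sample_process` and the choice of fields are assumptions made for illustration; they are not taken from the actual patch to `process_collector_psutil.py`.

```python
import os

import psutil  # Process.oneshot() requires psutil >= 5.0.0


def sample_process(pid):
    """Collect a few stats for one process (hypothetical helper, not the patch)."""
    proc = psutil.Process(pid)
    # Inside oneshot(), psutil caches the underlying /proc reads, so the
    # calls below can share them instead of re-reading the same files.
    with proc.oneshot():
        cpu = proc.cpu_times()
        mem = proc.memory_info()
        return {
            "pid": proc.pid,
            "ppid": proc.ppid(),
            "cpu_seconds": cpu.user + cpu.system,
            "rss_bytes": mem.rss,
        }


if __name__ == "__main__":
    print(sample_process(os.getpid()))
```

The cache only lives for the duration of the `with` block, so each sampling pass should open its own `oneshot()` context instead of holding one open across samples.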

> Thermos unresponsive on hosts with many active task
> ---------------------------------------------------
>
>                 Key: AURORA-1907
>                 URL: https://issues.apache.org/jira/browse/AURORA-1907
>             Project: Aurora
>          Issue Type: Story
>          Components: Observer
>            Reporter: Stephan Erb
>            Assignee: Stephan Erb
>
> We have noticed that on hosts with many active tasks (~100) and many terminated tasks (~1500) the Thermos UI is unusable. Thermos spins at 300% CPU but does not serve any HTTP requests.
> Dumping {{/threads}} indicates that we might be blocked by the roughly one hundred {{TaskResourceMonitor}} threads trying to read values from {{/proc}}:
> {code}
> # Thread (daemon): TaskResourceMonitor (TaskResourceMonitor[mytask-id] [TID=45241], 140682825963264)
>   File: "/usr/lib/python2.7/threading.py", line 525, in __bootstrap
>     self.__bootstrap_inner()
>   File: "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
>     self.run()
>   File: "/.pex/install/twitter.common.decorators-0.3.7-py2-none-any.whl.b23f2874a4392741fca582d9e0528c08e0335c68/twitter.common.decorators-0.3.7-py2-none-any.whl/twitter/common/decorators/threads.py", line 115, in identified
>     return instancemethod(self, *args, **kwargs)
>   File: "/.pex/install/twitter.common.exceptions-0.3.7-py2-none-any.whl.f6376bcca9bfda5eba4396de2676af5dfe36237d/twitter.common.exceptions-0.3.7-py2-none-any.whl/twitter/common/exceptions/__init__.py", line 126, in _excepting_run
>     self.__real_run(*args, **kw)
>   File: "apache/thermos/monitoring/resource.py", line 204, in run
>     collector.sample()
>   File: "apache/thermos/monitoring/process_collector_psutil.py", line 70, in sample
>     for child in parent.children(recursive=True)
>   File: "/.pex/install/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl.f4f23a781c020a8b8cb5cba2da0161d0db6452b1/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl/psutil/__init__.py", line 326, in wrapper
>     return fun(self, *args, **kwargs)
>   File: "/.pex/install/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl.f4f23a781c020a8b8cb5cba2da0161d0db6452b1/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl/psutil/__init__.py", line 861, in children
>     table[p.ppid()].append(p)
>   File: "/.pex/install/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl.f4f23a781c020a8b8cb5cba2da0161d0db6452b1/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl/psutil/__init__.py", line 545, in ppid
>     return self._proc.ppid()
>   File: "/.pex/install/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl.f4f23a781c020a8b8cb5cba2da0161d0db6452b1/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl/psutil/_pslinux.py", line 962, in wrapper
>     return fun(self, *args, **kwargs)
>   File: "/.pex/install/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl.f4f23a781c020a8b8cb5cba2da0161d0db6452b1/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl/psutil/_pslinux.py", line 1459, in ppid
>     return int(self._parse_stat_file()[2])
>   File: "/.pex/install/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl.f4f23a781c020a8b8cb5cba2da0161d0db6452b1/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl/psutil/_pslinux.py", line 1001, in _parse_stat_file
>     return [name] + fields_after_name
> {code}
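
The trace above ends in `children()` filling a table keyed by `p.ppid()`, which suggests that each call looks up the parent PID of every process it iterates over. A simplified sketch of that pattern follows. It uses a made-up `snapshot` of `(pid, ppid)` pairs instead of real `/proc` reads, and the `descendants` helper is hypothetical, not psutil's code.

```python
from collections import defaultdict


def descendants(snapshot, root_pid):
    """Return all descendant PIDs of root_pid from (pid, ppid) pairs."""
    # Build the parent -> children table; this touches every process once,
    # no matter how small the subtree of interest is.
    table = defaultdict(list)
    for pid, ppid in snapshot:
        table[ppid].append(pid)

    # Walk the subtree below root_pid iteratively.
    found = []
    stack = [root_pid]
    while stack:
        for child in table.get(stack.pop(), []):
            found.append(child)
            stack.append(child)
    return found


# Made-up snapshot of (pid, ppid) pairs, for illustration only.
snapshot = [(1, 0), (100, 1), (101, 100), (102, 100), (103, 101), (200, 1)]
print(sorted(descendants(snapshot, 100)))
```

If roughly one hundred monitor threads each repeat such a scan on every sample, the number of per-process stat reads grows with the thread count times the process count, which would be consistent with the CPU usage reported in the issue.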



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)