mesos-issues mailing list archives

From "Jie Yu (JIRA)" <>
Subject [jira] [Created] (MESOS-9307) Libprocess should have a way to detect stuck actor.
Date Thu, 11 Oct 2018 05:09:00 GMT
Jie Yu created MESOS-9307:

             Summary: Libprocess should have a way to detect stuck actor.
                 Key: MESOS-9307
             Project: Mesos
          Issue Type: Improvement
          Components: libprocess
            Reporter: Jie Yu

We spent two days on a bug that turned out to be an infinite loop in an actor, which blocked
that actor from processing any other events.

Currently, the only way to find out that an actor is stuck is to use gdb. We should think about a
way to print error logs when an actor has been stuck for longer than a threshold.

For instance, the Linux kernel prints a warning in the kernel log if a task is stuck for more
than 120 seconds. Something like this would be extremely helpful.
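
To make the idea concrete, here is a rough sketch of threshold-based detection. It does not use
any existing libprocess API: StuckActorDetector, begin(), end() and stuck() are hypothetical names,
and the sketch assumes the event loop can call a hook before and after each event handler runs.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <mutex>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical helper (not part of libprocess): remembers when each actor
// started handling its current event and reports the ones that have been
// busy for longer than a threshold.
class StuckActorDetector {
public:
  explicit StuckActorDetector(Clock::duration threshold)
    : threshold_(threshold) {
    assert(threshold > Clock::duration::zero());
  }

  // Assumed hook: called just before an actor starts handling an event.
  void begin(const std::string& actor, Clock::time_point now) {
    std::lock_guard<std::mutex> lock(mutex_);
    started_[actor] = now;
  }

  // Assumed hook: called once the event handler has returned.
  void end(const std::string& actor) {
    std::lock_guard<std::mutex> lock(mutex_);
    started_.erase(actor);
  }

  // Returns the actors whose current event has been running for strictly
  // longer than the threshold, sorted by name.
  std::vector<std::string> stuck(Clock::time_point now) const {
    std::lock_guard<std::mutex> lock(mutex_);
    std::vector<std::string> result;
    for (const auto& entry : started_) {
      if (now - entry.second > threshold_) {
        result.push_back(entry.first);
      }
    }
    return result;
  }

private:
  const Clock::duration threshold_;
  mutable std::mutex mutex_;
  std::map<std::string, Clock::time_point> started_;
};
```

A periodic timer running outside the actors' own threads could call stuck(Clock::now()) and log one
warning per returned name. The check has to run on a separate thread, because a stuck actor cannot
report on itself.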

Another way is to expose some metrics around this.
