mesos-issues mailing list archives

From "Benjamin Mahler (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MESOS-7748) Slow subscribers of streaming APIs can lead to Mesos master OOM event.
Date Thu, 13 Jul 2017 20:43:00 GMT

     [ https://issues.apache.org/jira/browse/MESOS-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler updated MESOS-7748:
-----------------------------------
        Summary: Slow subscribers of streaming APIs can lead to Mesos master OOM event.
                 (was: Streaming API subscribers can lead to Mesos master OOM event.)
    Description: 
For each active subscriber, Mesos master / slave maintains an event queue, which grows over
time if the subscriber does not read fast enough. As the number of such "slow" subscribers
grows, so does Mesos master / slave memory consumption, which might lead to an OOM event.

Ideas to consider:
* Restrict the number of subscribers for the streaming APIs
* Check (ping) for inactive or "slow" subscribers
* Disconnect the subscriber when there are too many queued events in memory

  was:
For each active subscriber, Mesos master maintains an event queue, which grows over time if
the subscriber does not read fast enough. As the number of such "slow" subscribers grows,
so does Mesos master memory consumption, which might lead to an OOM event.

Ideas to consider:
* Restrict the number of subscribers for the streaming API
* Check (ping) for inactive or "slow" subscribers
* Disconnect the subscriber when there are too many queued events in memory


Edited the ticket to reflect that this problem affects all streaming APIs, including
the master's operator and scheduler APIs as well as the agent's streaming API for fetching
container stdout/stderr. Are there any others I'm missing?
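
To make two of the ideas above concrete (capping the number of subscribers, and disconnecting
a subscriber once too many events are queued for it), here is a minimal sketch in Python. It
is an illustration only and not Mesos code: the names EventHub, Subscriber, max_subscribers
and max_queued_events are all hypothetical, and the actual fix in Mesos may look quite
different. The ping idea is not covered here.

```python
from collections import deque


class Subscriber:
    """Hypothetical subscriber with a bounded in-memory event queue."""

    def __init__(self, name, max_queued_events):
        self.name = name
        self.max_queued_events = max_queued_events
        self.queue = deque()
        self.connected = True

    def enqueue(self, event):
        """Queue an event; return False if the queue is already at its limit."""
        if len(self.queue) >= self.max_queued_events:
            return False
        self.queue.append(event)
        return True

    def read(self):
        """Simulate the subscriber consuming one event (None if nothing is queued)."""
        return self.queue.popleft() if self.queue else None


class EventHub:
    """Hypothetical stand-in for the master/agent side of a streaming API."""

    def __init__(self, max_subscribers, max_queued_events):
        self.max_subscribers = max_subscribers
        self.max_queued_events = max_queued_events
        self.subscribers = {}

    def subscribe(self, name):
        # Idea 1: restrict the number of subscribers.
        if len(self.subscribers) >= self.max_subscribers:
            raise RuntimeError("subscriber limit reached")
        subscriber = Subscriber(name, self.max_queued_events)
        self.subscribers[name] = subscriber
        return subscriber

    def broadcast(self, event):
        """Send an event to every subscriber; return names that were disconnected."""
        dropped = []
        for name, subscriber in list(self.subscribers.items()):
            if not subscriber.enqueue(event):
                # Idea 3: disconnect the slow subscriber and free its queue.
                subscriber.connected = False
                subscriber.queue.clear()
                del self.subscribers[name]
                dropped.append(name)
        return dropped


# Usage: "fast" reads after every event, "slow" never reads.
hub = EventHub(max_subscribers=2, max_queued_events=3)
fast = hub.subscribe("fast")
slow = hub.subscribe("slow")

disconnected = []
for i in range(5):
    disconnected += hub.broadcast(f"event-{i}")
    fast.read()

print(disconnected)
```

With these limits, memory per subscriber is bounded by max_queued_events and total queue
memory by max_subscribers * max_queued_events events. The trade-off is that a disconnected
subscriber has to resubscribe and recover whatever state it missed.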

> Slow subscribers of streaming APIs can lead to Mesos master OOM event.
> ----------------------------------------------------------------------
>
>                 Key: MESOS-7748
>                 URL: https://issues.apache.org/jira/browse/MESOS-7748
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Alexander Rukletsov
>            Assignee: Alexander Rukletsov
>            Priority: Critical
>              Labels: mesosphere, reliability
>
> For each active subscriber, Mesos master / slave maintains an event queue, which grows
> over time if the subscriber does not read fast enough. As the number of such "slow" subscribers
> grows, so does Mesos master / slave memory consumption, which might lead to an OOM event.
> Ideas to consider:
> * Restrict the number of subscribers for the streaming APIs
> * Check (ping) for inactive or "slow" subscribers
> * Disconnect the subscriber when there are too many queued events in memory



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
