mesos-issues mailing list archives

From "Greg Mann (JIRA)" <>
Subject [jira] [Commented] (MESOS-9283) Docker containerizer actor can get backlogged with large number of containers.
Date Thu, 04 Oct 2018 23:32:00 GMT


Greg Mann commented on MESOS-9283:

I tested the backport to 1.5.x, and the conflicts were not bad. I squashed both of the above
patches into one, which I will merge and backport:

> Docker containerizer actor can get backlogged with large number of containers.
> ------------------------------------------------------------------------------
>                 Key: MESOS-9283
>                 URL:
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 1.5.1, 1.6.1, 1.7.0
>            Reporter: Jie Yu
>            Assignee: Greg Mann
>            Priority: Major
>              Labels: perfomance
>         Attachments: Screen Shot 2018-10-01 at 10.54.18 PM.png
> We observed this during some internal scale testing.
> When launching 300+ Docker containers on a single agent box, the Docker containerizer actor can get backlogged. As a result, API calls like `GET_CONTAINERS` become unresponsive. The backlog also blocks the Mesos containerizer from launching containers when `--containers=docker,mesos` is specified, because the composing containerizer invokes the Docker containerizer's launch first (and that launch gets queued).
> Profiling results show that the bottleneck is `os::killtree`, which is invoked when Docker commands are discarded (e.g., on client disconnect).
> For this particular case, `os::killtree` is not really necessary because the docker command does not fork additional subprocesses. If we use the argv version of `subprocess` to launch docker commands, we can simply use `os::kill` instead. We confirmed that, after switching to `os::kill`, the performance issue goes away and the agent can easily scale up to 300+ containers.
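The backlog in the report follows from the fact that a libprocess actor handles the messages in its queue one at a time, so one slow handler delays everything queued behind it. The C++ sketch below simulates that head-of-line blocking with virtual time rather than real work; the `Message` type, the `drain_serially` function, and the per-message costs are hypothetical illustrations, not Mesos code.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// A message in a hypothetical actor's mailbox: a label and how long
// its handler runs, in milliseconds (illustrative values only).
struct Message {
  std::string name;
  int cost_ms;
};

// Simulates one actor draining its mailbox serially, one message at a
// time. Assuming every message arrived at t = 0, returns the time at
// which each message's handler finishes.
std::vector<int> drain_serially(const std::deque<Message>& mailbox) {
  std::vector<int> finish_times;
  int clock_ms = 0;
  for (const Message& m : mailbox) {
    assert(m.cost_ms >= 0);
    // Each message waits for every handler queued ahead of it.
    clock_ms += m.cost_ms;
    finish_times.push_back(clock_ms);
  }
  return finish_times;
}
```

For example, with 300 queued discards at a hypothetical 50 ms each, a `GET_CONTAINERS` request enqueued after them finishes only at t = 15001 ms, even though its own handler takes 1 ms.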
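The report only says that profiling pointed at `os::killtree`; one plausible reason it is costly (our assumption, not stated in the issue) is that a kill-tree operation must first discover a process's descendants, and on Linux that generally means reading the parent pid of every process on the host. The sketch below assumes such a snapshot of (pid, ppid) pairs has already been collected, which is the expensive part and is not shown; `ProcessInfo` and `descendants_of` are hypothetical names, not stout's API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sys/types.h>
#include <vector>

// One entry of a (hypothetical) process-table snapshot.
struct ProcessInfo {
  pid_t pid;
  pid_t ppid;
};

// Collects `root` and all of its descendants. Every call scans the whole
// snapshot, so the cost grows with the number of processes on the host
// (e.g. hundreds of containers), not with the size of the subtree.
std::set<pid_t> descendants_of(pid_t root, const std::vector<ProcessInfo>& table) {
  assert(root > 0);
  std::multimap<pid_t, pid_t> children;  // ppid -> pid
  for (const ProcessInfo& p : table) {
    children.emplace(p.ppid, p.pid);
  }

  std::set<pid_t> tree;
  std::vector<pid_t> stack{root};
  while (!stack.empty()) {
    pid_t current = stack.back();
    stack.pop_back();
    // Skip pids already visited (guards against cycles in a stale snapshot).
    if (!tree.insert(current).second) {
      continue;
    }
    auto range = children.equal_range(current);
    for (auto it = range.first; it != range.second; ++it) {
      stack.push_back(it->second);
    }
  }
  return tree;
}
```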
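The fix relies on launching docker without an intermediate shell. Below is a minimal POSIX sketch of that idea, assuming (as the report does) that the launched command forks no helpers of its own: the command is started with `execvp` on an argv vector, so the returned pid is the command itself, and discarding it takes a single `kill(2)` instead of a process-tree walk. It uses raw POSIX calls rather than Mesos's `subprocess` and `os::kill`; `launch_argv` and `discard` are hypothetical helpers.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Launches argv[0] directly via execvp (no "sh -c"), so the returned pid
// is the command itself. Returns -1 if fork fails.
pid_t launch_argv(const std::vector<std::string>& argv) {
  assert(!argv.empty());
  // Build the NULL-terminated argument array before forking so the
  // child does not need to allocate.
  std::vector<char*> args;
  for (const std::string& a : argv) {
    args.push_back(const_cast<char*>(a.c_str()));
  }
  args.push_back(nullptr);

  pid_t pid = fork();
  if (pid == 0) {
    execvp(args[0], args.data());
    _exit(127);  // Only reached if exec failed.
  }
  return pid;
}

// Discards a command started by launch_argv: with no shell and no helper
// children (by assumption), one kill() on the pid is enough. Returns the
// raw wait status, or -1 for an invalid pid (kill(-1, ...) would signal
// every process we are allowed to signal).
int discard(pid_t pid) {
  if (pid <= 0) {
    return -1;
  }
  kill(pid, SIGKILL);
  int status = 0;
  while (waitpid(pid, &status, 0) == -1 && errno == EINTR) {
  }
  return status;
}
```

For example, `discard(launch_argv({"sleep", "30"}))` should yield a status for which `WIFSIGNALED` is true and `WTERMSIG` is `SIGKILL`.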

This message was sent by Atlassian JIRA
