mesos-issues mailing list archives

From "Vinod Kone (JIRA)" <>
Subject [jira] [Commented] (MESOS-2891) Performance regression in hierarchical allocator.
Date Sat, 20 Jun 2015 01:10:01 GMT


Vinod Kone commented on MESOS-2891:


> Performance regression in hierarchical allocator.
> -------------------------------------------------
>                 Key: MESOS-2891
>                 URL:
>             Project: Mesos
>          Issue Type: Bug
>          Components: allocation, master
>            Reporter: Benjamin Mahler
>            Assignee: Jie Yu
>            Priority: Blocker
>              Labels: twitter
>             Fix For: 0.23.0
>         Attachments: Screen Shot 2015-06-18 at 5.02.26 PM.png, perf-kernel.svg
> For large clusters, the 0.23.0 allocator cannot keep up with the volume of slaves. After
the following slave was re-registered, it took the allocator a long time to work through the
backlog of slaves to add:
> {noformat:title=45 minute delay}
> I0618 18:55:40.738399 10172 master.cpp:3419] Re-registered slave 20150422-211121-2148346890-5050-3253-S4695
> I0618 19:40:14.960636 10164 hierarchical.hpp:496] Added slave 20150422-211121-2148346890-5050-3253-S4695
> {noformat}
> Empirically, [addSlave|] and [updateSlave|] have become expensive.
> Some timings from a production cluster reveal that the allocator spends in the low tens of milliseconds on each call to {{addSlave}} and {{updateSlave}}. When there are tens of thousands of slaves, this amounts to the large delay seen above.
> We also saw a slow steady increase in memory consumption, hinting further at a queue
backup in the allocator.
> A synthetic benchmark, like the one we did for the registrar, would be prudent here, along with visibility into the allocator's queue size.
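
As a rough check on the numbers quoted above, the delay between the two log lines can be computed directly, and the "tens of milliseconds times tens of thousands of slaves" reasoning can be written out as arithmetic. This is only a sketch: the year 2015 is taken from the date of this message (glog prefixes carry no year), and the queue depth and per-call latency in the estimate are hypothetical values chosen for illustration, not measurements from the cluster.

```python
from datetime import datetime

# The two log lines quoted in the issue description.
REREGISTERED = ("I0618 18:55:40.738399 10172 master.cpp:3419] "
                "Re-registered slave 20150422-211121-2148346890-5050-3253-S4695")
ADDED = ("I0618 19:40:14.960636 10164 hierarchical.hpp:496] "
         "Added slave 20150422-211121-2148346890-5050-3253-S4695")


def parse_glog_time(line: str, year: int = 2015) -> datetime:
    """Parse the timestamp from a glog-style prefix: <severity>MMDD HH:MM:SS.uuuuuu.

    The year is an assumption because the log prefix does not include one.
    """
    fields = line.split()
    month_day = fields[0][1:]  # drop the severity letter, e.g. "I0618" -> "0618"
    return datetime.strptime(f"{year}{month_day} {fields[1]}",
                             "%Y%m%d %H:%M:%S.%f")


def estimated_backlog_seconds(queued_calls: int, per_call_ms: float) -> float:
    """Time to drain a serial queue of calls that each cost per_call_ms."""
    return queued_calls * per_call_ms / 1000.0


# Observed gap between re-registration in the master and addSlave in the allocator.
observed_delay = (parse_glog_time(ADDED) - parse_glog_time(REREGISTERED)).total_seconds()
print(f"observed delay: {observed_delay:.1f} s ({observed_delay / 60:.1f} min)")

# Hypothetical inputs: 50,000 queued calls at 30 ms each.
estimate = estimated_backlog_seconds(queued_calls=50_000, per_call_ms=30)
print(f"estimated drain time: {estimate:.0f} s ({estimate / 60:.0f} min)")
```

The observed gap comes out a little under 45 minutes, which matches the title of the log excerpt. With the illustrative inputs the estimate lands in the tens of minutes, the same order of magnitude as the observed delay.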

This message was sent by Atlassian JIRA
