mesos-issues mailing list archives

From "Benjamin Mahler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-7376) Long registry updates when the number of agents is high
Date Mon, 17 Apr 2017 23:41:41 GMT

    [ https://issues.apache.org/jira/browse/MESOS-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15971824#comment-15971824 ]

Benjamin Mahler commented on MESOS-7376:
----------------------------------------

Yes, I will shepherd, thanks for taking this on!

> Long registry updates when the number of agents is high
> -------------------------------------------------------
>
>                 Key: MESOS-7376
>                 URL: https://issues.apache.org/jira/browse/MESOS-7376
>             Project: Mesos
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 1.3.0
>            Reporter: Ilya Pronin
>            Assignee: Ilya Pronin
>            Priority: Critical
>
> During scale testing we discovered that as the number of registered agents grows, the
> time it takes to update the registry very quickly grows to unacceptable values. At some
> point it starts exceeding {{registry_store_timeout}}, yet the timeout doesn't fire.
> With 55k agents we saw this ({{registry_store_timeout=20secs}}):
> {noformat}
> I0331 17:11:21.227442 36472 registrar.cpp:473] Applied 69 operations in 3.138843387secs; attempting to update the registry
> I0331 17:11:24.441409 36464 log.cpp:529] LogStorage.set: acquired the lock in 74461ns
> I0331 17:11:24.441541 36464 log.cpp:543] LogStorage.set: started in 51770ns
> I0331 17:11:26.869323 36462 log.cpp:628] LogStorage.set: wrote append at position=6420881 in 2.41043644secs
> I0331 17:11:26.869454 36462 state.hpp:179] State.store: storage.set has finished in 2.428189561secs (b=1)
> I0331 17:11:56.199453 36469 registrar.cpp:518] Successfully updated the registry in 34.971944192secs
> {noformat}
> This is caused by repeated {{Registry}} copying, which involves copying a big object graph
> that takes roughly 0.4 sec (with 55k agents).
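
To illustrate why repeated copying of a large object adds up, here is a minimal C++ sketch. It is not the Mesos registrar code and not the fix proposed in the ticket. The `Registry` struct, the `registryCopies` counter, and both functions are hypothetical stand-ins. The first function makes a deep copy per operation; the second copies once and mutates in place.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Counts deep copies made so far (illustration only).
std::size_t registryCopies = 0;

// Hypothetical stand-in for the real Registry: just a list of agent IDs.
struct Registry {
  std::vector<std::string> agents;

  Registry() = default;
  Registry(const Registry& other) : agents(other.agents) { ++registryCopies; }
  Registry(Registry&&) noexcept = default;
  Registry& operator=(const Registry& other) {
    agents = other.agents;
    ++registryCopies;
    return *this;
  }
  Registry& operator=(Registry&&) noexcept = default;
};

// Deep-copies the whole registry once up front and once per operation,
// so the cost is (1 + N) copies for N operations.
Registry applyWithCopies(const Registry& registry,
                         const std::vector<std::string>& newAgents) {
  Registry current = registry;
  for (const std::string& id : newAgents) {
    Registry next = current;       // full copy of the object graph
    next.agents.push_back(id);
    current = std::move(next);     // move, not another copy
  }
  return current;
}

// Copies the registry once, then applies every operation to that copy.
Registry applyInPlace(const Registry& registry,
                      const std::vector<std::string>& newAgents) {
  Registry updated = registry;
  for (const std::string& id : newAgents) {
    updated.agents.push_back(id);
  }
  return updated;
}
```

Both functions return the same result; only the number of deep copies differs. When a single copy takes a noticeable fraction of a second, as reported above for 55k agents, each additional copy adds directly to the update time.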



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
