mesos-issues mailing list archives

From "Benjamin Mahler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-2507) Performance issue in the master when a large number of slaves are registering.
Date Tue, 19 May 2015 01:57:59 GMT

    [ https://issues.apache.org/jira/browse/MESOS-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549654#comment-14549654 ]

Benjamin Mahler commented on MESOS-2507:
----------------------------------------

https://reviews.apache.org/r/34387/
https://reviews.apache.org/r/34388/
https://reviews.apache.org/r/34389/

> Performance issue in the master when a large number of slaves are registering.
> ------------------------------------------------------------------------------
>
>                 Key: MESOS-2507
>                 URL: https://issues.apache.org/jira/browse/MESOS-2507
>             Project: Mesos
>          Issue Type: Improvement
>          Components: master
>            Reporter: Benjamin Mahler
>            Assignee: Benjamin Mahler
>              Labels: scalability, twitter
>
> For large clusters, when a lot of slaves are registering, the master gets backlogged processing registration requests. {{perf}} revealed the following:
> {code}
> Events: 14K cycles
>  25.44%  libmesos-0.22.0-x.so  [.] mesos::internal::master::Master::registerSlave(process::UPID const&, mesos::SlaveInfo const&, std::vector<mesos::Resource, std::allocator<mesos::Resource> > cons
>  11.18%  libmesos-0.22.0-x.so  [.] pipecb
>   5.88%  libc-2.5.so             [.] malloc_consolidate
>   5.33%  libc-2.5.so             [.] _int_free
>   5.25%  libc-2.5.so             [.] malloc
>   5.23%  libc-2.5.so             [.] _int_malloc
>   4.11%  libstdc++.so.6.0.8      [.] std::string::assign(std::string const&)
>   3.22%  libmesos-0.22.0-x.so  [.] mesos::Resource::SharedDtor()
>   3.10%  [kernel]                [k] _raw_spin_lock
>   1.97%  libmesos-0.22.0-x.so  [.] mesos::Attribute::SharedDtor()
>   1.28%  libc-2.5.so             [.] memcmp
>   1.08%  libc-2.5.so             [.] free
> {code}
> This is likely because we loop over all the slaves for each registration:
> {code}
> void Master::registerSlave(
>     const UPID& from,
>     const SlaveInfo& slaveInfo,
>     const vector<Resource>& checkpointedResources,
>     const string& version)
> {
>   // ...
>   // Check if this slave is already registered (because it retries).
>   foreachvalue (Slave* slave, slaves.registered) {
>     if (slave->pid == from) {
>       // ...
>     }
>   }
>   // ...
> }
> {code}
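
For illustration only, the cost of the per-registration scan quoted above can be contrasted with a keyed lookup. The sketch below is not the change under review and not Mesos code: `Slave`, `Registry`, `findByScan` and `findByIndex` are hypothetical names, and plain strings stand in for `SlaveID` and `process::UPID`. It assumes each slave has a unique pid.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a registered slave; the real Mesos struct
// carries far more state.
struct Slave {
  std::string id;   // Stand-in for SlaveID.
  std::string pid;  // Stand-in for process::UPID.
};

// Hypothetical registry that keeps a secondary index from pid to slave.
class Registry {
public:
  void add(const Slave& slave) {
    // Assumption of this sketch: pids are unique among registered slaves.
    assert(byPid.count(slave.pid) == 0);
    byPid[slave.pid] = slaves.size();
    slaves.push_back(slave);
  }

  // O(n) per call: mirrors the loop over all registered slaves, so
  // handling n registrations costs O(n^2) comparisons in total.
  const Slave* findByScan(const std::string& pid) const {
    for (const Slave& slave : slaves) {
      if (slave.pid == pid) {
        return &slave;
      }
    }
    return nullptr;
  }

  // O(1) on average per call: consults the pid index instead of scanning.
  const Slave* findByIndex(const std::string& pid) const {
    auto it = byPid.find(pid);
    return it == byPid.end() ? nullptr : &slaves[it->second];
  }

private:
  std::vector<Slave> slaves;                           // Registered slaves.
  std::unordered_map<std::string, std::size_t> byPid;  // pid -> position.
};
```

Both lookups return the same slave for a given pid; only the cost per call differs. The returned pointer is valid only until the next `add`, and a real index would also have to be kept in sync when slaves are removed.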



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
