mesos-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ben Mahler" <benjamin.mah...@gmail.com>
Subject Re: Review Request 28838: Fixed a long-standing performance issue in libprocess' SocketManager.
Date Tue, 09 Dec 2014 20:49:28 GMT


> On Dec. 9, 2014, 7:25 p.m., Jie Yu wrote:
> > 3rdparty/libprocess/src/process.cpp, lines 299-301
> > <https://reviews.apache.org/r/28838/diff/1/?file=786461#file786461line299>
> >
> >     Consider using hashmap/hashset as it's performance critical.

Ok, I'll remove my TODO about this then.


> On Dec. 9, 2014, 7:25 p.m., Jie Yu wrote:
> > 3rdparty/libprocess/src/process.cpp, line 301
> > <https://reviews.apache.org/r/28838/diff/1/?file=786461#file786461line301>
> >
> >     This needs more comments. It's quite difficult to digest initially. Some graph
might be helpful.
> >     
> >     ```
> >         linker         ----->       linkee
> >     (ProcessBase*)     link()       (UPID)
> >     ```
> >     
> >     And ```remotes``` is for tracking remote linkees. Multiple remote linkees from
the same remote node is tracked in ```remotes``` so that we can get those remote linkees when
the socket for that node is closed.

Ok, I've updated the comment to at least show that linkers are Processes and linkees are UPIDs,
in non-graphical form.

> And remotes is for tracking remote linkees. Multiple remote linkees from the same remote
node is tracked in remotes so that we can get those remote linkees when the socket for that
node is closed.

This reads like the last sentence in the existing comment:

```
    // For remote nodes, we also need a mapping to the linkees on the
    // node, because socket closure only notifies at the node level.
```

Any other information that you're looking for?


> On Dec. 9, 2014, 7:25 p.m., Jie Yu wrote:
> > 3rdparty/libprocess/src/process.cpp, line 2102
> > <https://reviews.apache.org/r/28838/diff/1/?file=786461#file786461line2102>
> >
> >     This doesn't seem to be necessary to me? Especially if you use hashmap?

Dropped per discussion offline, we could omit this, but this was to avoid inserting and erasing
when the key is not present.

If we measure performance (jie is writing a benchmakr), perhaps we can remove this check (and
others) to make it a bit easier to read!

I'll follow up once we can measure. :)


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28838/#review64416
-----------------------------------------------------------


On Dec. 9, 2014, 8:49 p.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28838/
> -----------------------------------------------------------
> 
> (Updated Dec. 9, 2014, 8:49 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Jie Yu, and Joris Van Remoortere.
> 
> 
> Bugs: MESOS-2182
>     https://issues.apache.org/jira/browse/MESOS-2182
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See MESOS-2182.
> 
> The iteration over the links is expensive _and_ occurs within the SocketManager's critical
section, which we think is having some bad effects blocking other calls (see the comments
in the ticket).
> 
> This change updates the socket manager to keep a bi-directional mapping between the "linkers"
and the "linkees", which means that we now only look at the relevant information when a node/process
exits.
> 
> Note that I did double lookups on the maps, this was because we do this heavily in libprocess
already. I had originally written out the code using .find() to avoid the double lookups,
but it became next to impossible to read. Let's micro-optimize later, this is a major improvement
as it is. :)
> 
> I tried to keep the additional complexity in check, let me know if there are any suggestions
to make it easier!
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/process.cpp b87ac2206548815bc992c955252567c131fe6a47 
> 
> Diff: https://reviews.apache.org/r/28838/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> Manually started a master and slave across machines, to ensure exit notifications were
sent correctly.
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message