Return-Path: X-Original-To: apmail-mesos-dev-archive@www.apache.org Delivered-To: apmail-mesos-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 195AD931F for ; Tue, 9 Dec 2014 20:49:32 +0000 (UTC) Received: (qmail 94716 invoked by uid 500); 9 Dec 2014 20:49:31 -0000 Delivered-To: apmail-mesos-dev-archive@mesos.apache.org Received: (qmail 94638 invoked by uid 500); 9 Dec 2014 20:49:31 -0000 Mailing-List: contact dev-help@mesos.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@mesos.apache.org Delivered-To: mailing list dev@mesos.apache.org Received: (qmail 94619 invoked by uid 99); 9 Dec 2014 20:49:31 -0000 Received: from reviews-vm.apache.org (HELO reviews.apache.org) (140.211.11.40) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 09 Dec 2014 20:49:31 +0000 Received: from reviews.apache.org (localhost [127.0.0.1]) by reviews.apache.org (Postfix) with ESMTP id B38DD1D2300; Tue, 9 Dec 2014 20:49:28 +0000 (UTC) Content-Type: multipart/alternative; boundary="===============8042182115166142896==" MIME-Version: 1.0 Subject: Re: Review Request 28838: Fixed a long-standing performance issue in libprocess' SocketManager. From: "Ben Mahler" To: "Joris Van Remoortere" , "Benjamin Hindman" , "Jie Yu" Cc: "Ben Mahler" , "mesos" Date: Tue, 09 Dec 2014 20:49:28 -0000 Message-ID: <20141209204928.5765.47452@reviews.apache.org> X-ReviewBoard-URL: https://reviews.apache.org Auto-Submitted: auto-generated Sender: "Ben Mahler" X-ReviewGroup: mesos X-ReviewRequest-URL: https://reviews.apache.org/r/28838/ X-Sender: "Ben Mahler" References: <20141209192536.26419.64707@reviews.apache.org> In-Reply-To: <20141209192536.26419.64707@reviews.apache.org> Reply-To: "Ben Mahler" X-ReviewRequest-Repository: mesos-git --===============8042182115166142896== MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit > On Dec. 9, 2014, 7:25 p.m., Jie Yu wrote: > > 3rdparty/libprocess/src/process.cpp, lines 299-301 > > > > > > Consider using hashmap/hashset as it's performance critical. Ok, I'll remove my TODO about this then. > On Dec. 9, 2014, 7:25 p.m., Jie Yu wrote: > > 3rdparty/libprocess/src/process.cpp, line 301 > > > > > > This needs more comments. It's quite difficult to digest initially. Some graph might be helpful. > > > > ``` > > linker -----> linkee > > (ProcessBase*) link() (UPID) > > ``` > > > > And ```remotes``` is for tracking remote linkees. Multiple remote linkees from the same remote node is tracked in ```remotes``` so that we can get those remote linkees when the socket for that node is closed. Ok, I've updated the comment to at least show that linkers are Processes and linkees are UPIDs, in non-graphical form. > And remotes is for tracking remote linkees. Multiple remote linkees from the same remote node is tracked in remotes so that we can get those remote linkees when the socket for that node is closed. This reads like the last sentence in the existing comment: ``` // For remote nodes, we also need a mapping to the linkees on the // node, because socket closure only notifies at the node level. ``` Any other information that you're looking for? > On Dec. 9, 2014, 7:25 p.m., Jie Yu wrote: > > 3rdparty/libprocess/src/process.cpp, line 2102 > > > > > > This doesn't seem to be necessary to me? Especially if you use hashmap? Dropped per discussion offline, we could omit this, but this was to avoid inserting and erasing when the key is not present. If we measure performance (jie is writing a benchmakr), perhaps we can remove this check (and others) to make it a bit easier to read! I'll follow up once we can measure. :) - Ben ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28838/#review64416 ----------------------------------------------------------- On Dec. 9, 2014, 8:49 p.m., Ben Mahler wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/28838/ > ----------------------------------------------------------- > > (Updated Dec. 9, 2014, 8:49 p.m.) > > > Review request for mesos, Benjamin Hindman, Jie Yu, and Joris Van Remoortere. > > > Bugs: MESOS-2182 > https://issues.apache.org/jira/browse/MESOS-2182 > > > Repository: mesos-git > > > Description > ------- > > See MESOS-2182. > > The iteration over the links is expensive _and_ occurs within the SocketManager's critical section, which we think is having some bad effects blocking other calls (see the comments in the ticket). > > This change updates the socket manager to keep a bi-directional mapping between the "linkers" and the "linkees", which means that we now only look at the relevant information when a node/process exits. > > Note that I did double lookups on the maps, this was because we do this heavily in libprocess already. I had originally written out the code using .find() to avoid the double lookups, but it became next to impossible to read. Let's micro-optimize later, this is a major improvement as it is. :) > > I tried to keep the additional complexity in check, let me know if there are any suggestions to make it easier! > > > Diffs > ----- > > 3rdparty/libprocess/src/process.cpp b87ac2206548815bc992c955252567c131fe6a47 > > Diff: https://reviews.apache.org/r/28838/diff/ > > > Testing > ------- > > make check > > Manually started a master and slave across machines, to ensure exit notifications were sent correctly. > > > Thanks, > > Ben Mahler > > --===============8042182115166142896==--