From: Nick Mitchell
Date: Mon, 1 May 2017 18:24:01 -0400
Subject: Re: concurrent requests on actions
To: dev@openwhisk.apache.org

won't this only be of benefit for invocations that are mostly sleepy? e.g.
I/O-bound? because if an action uses CPU flat-out, then there is no
throughput win to be had (by increasing the parallelism of CPU-bound
processes), given the small CPU sliver that each container gets -- unless
there is a concomitant increase in concurrency, i.e. CPU slice? if so, then
my gut tells me that there are more general solutions to this (i.e. more
efficient packing of I/O-bound processes)

On Mon, May 1, 2017 at 5:36 PM, Tyson Norris wrote:

> Thanks Markus.
>
> Can you direct me to the travis job where I can see the 40+RPS? I agree
> that is a big gap and would like to take a look - I didn’t see anything in
> https://travis-ci.org/openwhisk/openwhisk/builds/226918375 ; maybe I’m
> looking in the wrong place.
>
> I will work on putting together a PR to discuss.
>
> Thanks
> Tyson
>
>
> On May 1, 2017, at 2:22 PM, Markus Thömmes <markusthoemmes@me.com> wrote:
>
> Hi Tyson,
>
> Sounds like you did a lot of investigation here, thanks a lot for that :)
>
> Seeing the numbers, 4 RPS in the "off" case seems very odd. The Travis
> build that runs the current system as is also reaches 40+ RPS. So we'd need
> to look at a mismatch here.
>
> Other than that I'd indeed suspect a great improvement in throughput from
> your work!
>
> Implementationwise I don't have a strong opinion but it might be worth
> discussing the details first and landing your impl. once all my staging is
> done (the open PRs). That'd ease git operation. If you want to discuss your impl.
> now I suggest you send a PR to my new-containerpool branch and share
> the diff here for discussion.
>
> Cheers,
> Markus
>
> Sent from my iPhone
>
> On 01.05.2017 at 23:16, Tyson Norris wrote:
>
> Hi Michael -
> Concurrent requests would only reuse a running/warm container for
> same-action requests. So if the action has bad/rogue behavior, it will
> limit its own throughput only, not the throughput of other actions.
>
> This is ignoring the current implementation of the activation feed, which
> I guess is susceptible to a flood of slow-running activations. If those
> activations are for the same action, running concurrently should be enough
> to not starve the system for other activations (with faster actions) to be
> processed. In case they are all different actions, OR not allowed to
> execute concurrently, then in the name of quality-of-service, it may also
> be desirable to reserve some resources (i.e. separate activation feeds) for
> known-to-be-faster actions, so that fast-running actions are not penalized
> for existing alongside the slow-running actions. This would require a more
> complicated throughput test to demonstrate.
>
> Thanks
> Tyson
>
>
> On May 1, 2017, at 1:13 PM, Michael Marth wrote:
>
> Hi Tyson,
>
> 10x more throughput, i.e. being able to run OW at 1/10 of the cost -
> definitely worth looking into :)
>
> Like Rodric mentioned before I figured some features might become more
> complex to implement, like billing, log collection, etc. But given such a
> huge advancement on throughput that would be worth it IMHO.
> One thing I wonder about, though, is resilience against rogue actions. If
> an action is blocking (in the Node-sense, not the OW-sense), would that not
> block Node’s event loop and thus block other actions in that container? One
> could argue, though, that this rogue action would only block other
> executions of itself, not affect other actions or customers.
> WDYT?
>
> Michael
>
>
> On 01/05/17 17:54, "Tyson Norris" wrote:
>
> Hi All -
> I created this issue some time ago to discuss concurrent requests on
> actions: [1] Some people mentioned discussing on the mailing list so I
> wanted to start that discussion.
>
> I’ve been doing some testing against this branch with Markus’s work on the
> new container pool: [2]
> I believe there are a few open PRs in upstream related to this work, but
> this seemed like a reasonable place to test against a variety of the
> reactive invoker and pool changes - I’d be interested to hear if anyone
> disagrees.
>
> Recently I ran some tests
> - with “throughput.sh” in [3] using concurrency of 10 (it will also be
> interesting to test with the --rps option in loadtest...)
> - using a change that checks actions for an annotation “max-concurrent”
> (in case there is some reason actions want to enforce current behavior of
> strict serial invocation per container?)
> - when scheduling an action against the pool, if there is a currently
> “busy” container with this action, AND the annotation is present for this
> action, AND concurrent requests < max-concurrent, then this container is
> used to invoke the action
>
> Below is a summary (approx 10x throughput with concurrent requests) and I
> would like to get some feedback on:
> - what are the cases for having actions that require container isolation
> per request? node is a good example that should NOT need this, but maybe
> there are cases where it is more important, e.g. if there are cases where
> stateful actions are used?
> - log collection approach: I have not attempted to resolve log collection
> issues; I would expect that revising the log sentinel marker to include the
> activation ID would help, and logs stored with the activation would include
> interleaved activations in some cases (which should be expected with
> concurrent request processing?), and require some different logic to
> process logs after an activation completes (e.g. logs emitted at the start
> of an activation may have already been collected as part of another
> activation’s log collection, etc).
> - advice on creating a PR to discuss this in more detail - should I wait
> for more of the container pooling changes to get to master? Or submit a PR
> to Markus’s new-containerpool branch?
>
> Thanks
> Tyson
>
> Summary of loadtest report with max-concurrent ENABLED (I used 10000, but
> this limit wasn’t reached):
> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Target URL: https://192.168.99.100/api/v1/namespaces/_/actions/noopThroughputConcurrent?blocking=true
> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Max requests: 10000
> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Concurrency level: 10
> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Agent: keepalive
> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO
> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Completed requests: 10000
> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Total errors: 0
> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Total time: 241.900480915 s
> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Requests per second: 41
> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Mean latency: 241.7 ms
>
> Summary of loadtest report with max-concurrent DISABLED:
> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Target URL: https://192.168.99.100/api/v1/namespaces/_/actions/noopThroughput?blocking=true
> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Max requests: 10000
> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Concurrency level: 10
> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Agent: keepalive
> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO
> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Completed requests: 10000
> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Total errors: 19
> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Total time: 2770.658048791 s
> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Requests per second: 4
> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Mean latency: 2767.3 ms
>
>
> [1] https://github.com/openwhisk/openwhisk/issues/2026
> [2] https://github.com/markusthoemmes/openwhisk/tree/new-containerpool
> [3] https://github.com/markusthoemmes/openwhisk-performance
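
For reference, the container-selection rule described in the quoted message (reuse a "busy" container when the action carries the "max-concurrent" annotation and the number of concurrent requests is below that limit) might be sketched as follows. This is only an illustration in Python, not code from the branch under discussion: the `Container` type, its fields, and the `pick_busy_container` function are hypothetical names, and "busy" is assumed to mean at least one activation in flight.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Container:
    """Hypothetical stand-in for a pooled container; not an OpenWhisk type."""
    action: str      # name of the action this container was initialized with
    in_flight: int   # activations currently running inside the container


def pick_busy_container(
    pool: Sequence[Container],
    action: str,
    max_concurrent: Optional[int],
) -> Optional[Container]:
    """Return a busy container that may take one more activation, else None.

    max_concurrent is the value of the action's "max-concurrent" annotation,
    or None when the annotation is absent (strict serial invocation).
    """
    if max_concurrent is None:
        return None  # no annotation: keep one activation per container
    for container in pool:
        # Reuse only same-action containers that are busy but below the limit.
        if container.action == action and 0 < container.in_flight < max_concurrent:
            return container
    return None


pool = [Container("noop", in_flight=3), Container("other", in_flight=1)]
print(pick_busy_container(pool, "noop", max_concurrent=10))    # the "noop" container
print(pick_busy_container(pool, "noop", max_concurrent=None))  # None
```

A `None` result here stands for "fall back to whatever the pool does today" (an idle warm container or a cold start), so actions without the annotation would keep the current one-activation-per-container behavior.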