Subject: Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack
To: dev@couchdb.apache.org
From: Joan Touzet
Organization: Apache Software Foundation
Date: Fri, 26 Apr 2019 16:19:44 -0400

Hi Adam,

I'll bring up a concern from a recent client with whom I engaged. They're on 1.x.
On 1.x they have been doing 50k bulk update operations in a single request. 1.x doesn't time out. The updates are constructed so that none will result in a conflict or be rejected, so all 50k are accepted. They do this so the update appears atomic to the next reader - a read from another client can't occur in the middle of the big update, because we have a single couch_file in 1.x.

Obviously, in 2.x this doesn't work, on two levels. First, there are multiple readers and writers across a cluster, so the big bulk operation doesn't block interposed reads until it's finished. Second, you can't reliably finish 50k updates in a single batch in a cluster anyway, because you'll probably hit the fabric timeout, if not other cluster timeouts.

As a general rule of thumb, I advise people to keep bulk document updates to batches of no more than 1k at a time, with the understanding that in 2.x these are not treated as an atomic transaction (and they weren't strictly that way in 1.x, either, but never mind that...)

If we decide as a project that all operations must take less than 5 seconds, we're probably going to have to reduce the bulk update batch size even further. I'm betting 100 would be the upper bound on bulk updates.

Is this going to impose a significant performance penalty on bulk ops?

-Joan

On 2019-04-26 3:30 p.m., Adam Kocoloski wrote:
> Hi all,
>
> The point I'm making is that we should take advantage of this extra bit of information that we acquire out-of-band (e.g. we just decide as a project that all operations take less than 5 seconds) and come up with smarter / cheaper / faster ways of doing load shedding based on that information.
>
> For example, yes, it could be interesting to use is_process_alive/1 to see if a client is still hanging around, and have the gen_server discard the work otherwise. It might also be too expensive to matter; I'm not sure anyone here has a good a priori sense of the cost of that call. But I'd certainly wager it's more expensive than calling timer:now_diff/2 in the server and discarding any requests that were submitted more than 5 seconds ago.
>
> Most of our timeout / cleanup solutions to date have been focused top-down, without making any assumptions about the behavior of the workers or servers underneath. I think we should try to approach this problem bottom-up, forcing every call to complete within 5 seconds and handling timeouts correctly as they bubble up.
>
> Adam
>
>> On Apr 23, 2019, at 2:48 PM, Nick Vatamaniuc wrote:
>>
>> We don't spawn (/link) or monitor remote processes, just monitor the local coordinator process. That should be cheaper performance-wise. It's also for relatively long-running streaming fabric requests (changes, all_docs). But you're right, perhaps doing this for shorter requests (doc updates, doc GETs) might become noticeable. Perhaps a pool of reusable monitoring processes would work there...
>>
>> For couch_server timeouts, I wonder if we can do a simpler thing and inspect the `From` part of each call, and if the Pid is not alive, drop the request to at least avoid doing any expensive processing. For casts it might involve sending a sender Pid in the message. That doesn't address timeouts, just the case where the coordinating process went away while the message was stuck in the long message queue.
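
(For illustration only - not actual CouchDB code - here is a minimal sketch of the kind of shedding Adam and Nick describe above: the caller stamps the request with os:timestamp(), and the gen_server drops the call if the deadline has passed or the local caller Pid is no longer alive. The message shape and the names max_request_age/0 and do_open/2 are hypothetical.)

    handle_call({open, DbName, T0}, {FromPid, _Tag}, State) ->
        %% Age of the request in microseconds, per Adam's timer:now_diff/2 idea.
        TooOld = timer:now_diff(os:timestamp(), T0) > max_request_age(),
        %% is_process_alive/1 only works for local pids, so guard on node/1.
        CallerDead = node(FromPid) =:= node()
            andalso not is_process_alive(FromPid),
        case TooOld orelse CallerDead of
            true ->
                %% The client has timed out (or died); shed the request rather
                %% than doing expensive work and replying into the void.
                {noreply, State};
            false ->
                {reply, do_open(DbName, State), State}
        end.

    max_request_age() ->
        5 * 1000 * 1000.  %% 5 seconds, in microseconds

(Returning {noreply, State} without ever replying is fine here, because the branch is only taken when the caller has already timed out or gone away.)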
>>
>> On Mon, Apr 22, 2019 at 4:32 PM Robert Newson wrote:
>>
>>> My memory is fuzzy, but those items sound a lot like what happens with rex, which motivated us (i.e., Adam) to build rexi, which deliberately does less than the stock approach.
>>>
>>> --
>>> Robert Samuel Newson
>>> rnewson@apache.org
>>>
>>> On Mon, 22 Apr 2019, at 18:33, Nick Vatamaniuc wrote:
>>>> Hi everyone,
>>>>
>>>> We partially implemented the first part (cleaning up rexi workers) for all the fabric streaming requests, which should be all_docs, changes, view map, and view reduce:
>>>> https://github.com/apache/couchdb/commit/632f303a47bd89a97c831fd0532cb7541b80355d
>>>>
>>>> The pattern there is the following [a rough sketch is appended at the end of this message]:
>>>>
>>>> - With every request, spawn a monitoring process that is in charge of keeping track of all the workers as they are spawned.
>>>> - If regular cleanup takes place, then this monitoring process is killed, to avoid sending double the number of kill messages to workers.
>>>> - If the coordinating process doesn't run cleanup and just dies, the monitoring process performs cleanup on its behalf.
>>>>
>>>> Cheers,
>>>> -Nick
>>>>
>>>> On Thu, Apr 18, 2019 at 5:16 PM Robert Samuel Newson wrote:
>>>>
>>>>> My view is a) the server was unavailable for this request due to all the other requests it's currently dealing with, and b) the connection was not idle, so the client is not at fault.
>>>>>
>>>>> B.
>>>>>
>>>>>> On 18 Apr 2019, at 22:03, Done Collectively wrote:
>>>>>>
>>>>>> Any reason 408 would be undesirable?
>>>>>>
>>>>>> https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/408
>>>>>>
>>>>>> On Thu, Apr 18, 2019 at 10:37 AM Robert Newson wrote:
>>>>>>
>>>>>>> 503 imo.
>>>>>>>
>>>>>>> --
>>>>>>> Robert Samuel Newson
>>>>>>> rnewson@apache.org
>>>>>>>
>>>>>>> On Thu, 18 Apr 2019, at 18:24, Adam Kocoloski wrote:
>>>>>>>> Yes, we should. Currently it's a 500; maybe there's something more appropriate:
>>>>>>>>
>>>>>>>> https://github.com/apache/couchdb/blob/8ef42f7241f8788afc1b6e7255ce78ce5d5ea5c3/src/chttpd/src/chttpd.erl#L947-L949
>>>>>>>>
>>>>>>>> Adam
>>>>>>>>
>>>>>>>>> On Apr 18, 2019, at 12:50 PM, Joan Touzet wrote:
>>>>>>>>>
>>>>>>>>> What happens when it turns out the client *hasn't* timed out and we just...hang up on them? Should we consider at least trying to send back some sort of HTTP status code?
>>>>>>>>>
>>>>>>>>> -Joan
>>>>>>>>>
>>>>>>>>> On 2019-04-18 10:58, Garren Smith wrote:
>>>>>>>>>> I'm +1 on this. With partition queries, we added a few more timeouts that can be enabled, which Cloudant enables. So having the ability to shed old requests when these timeouts get hit would be great.
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>> Garren
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 16, 2019 at 2:41 AM Adam Kocoloski <kocolosk@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> For once, I'm coming to you with a topic that is not strictly about FoundationDB :)
>>>>>>>>>>>
>>>>>>>>>>> CouchDB offers a few config settings (some of them undocumented) to put a limit on how long the server is allowed to take to generate a response.
>>>>>>>>>>> The trouble with many of these timeouts is that, when they fire, they do not actually clean up all of the work that they initiated. A couple of examples:
>>>>>>>>>>>
>>>>>>>>>>> - Each HTTP response coordinated by the "fabric" application spawns several ephemeral processes via "rexi" on different nodes in the cluster to retrieve data and send it back to the process coordinating the response. If the request timeout fires, the coordinating process will be killed off, but the ephemeral workers might not be. In a healthy cluster they'll exit on their own when they finish their jobs, but there are conditions under which they can sit around for extended periods of time waiting for an overloaded gen_server (e.g. couch_server) to respond.
>>>>>>>>>>>
>>>>>>>>>>> - Those named gen_servers (like couch_server) responsible for serializing access to important data structures will dutifully process messages received from old requests without any regard for (or even knowledge of) the fact that the client that sent the message timed out long ago. This can lead to a sort of death spiral in which the gen_server is ultimately spending ~all of its time serving dead clients and every client is timing out.
>>>>>>>>>>>
>>>>>>>>>>> I'd like to see us introduce a documented maximum request duration for all requests except the _changes feed, and then use that information to aid in load shedding throughout the stack. We can audit the codebase for gen_server calls with long timeouts (I know of a few on the critical path that set their timeouts to `infinity`) and we can design servers that efficiently drop old requests, knowing that the client who made the request must have timed out. A couple of topics for discussion:
>>>>>>>>>>>
>>>>>>>>>>> - the "gen_server that sheds old requests" is a very generic pattern, one that seems like it could be well-suited to its own behaviour. A cursory search of the internet didn't turn up any prior art here, which surprises me a bit. I'm wondering if this is worth bringing up with the broader Erlang community.
>>>>>>>>>>>
>>>>>>>>>>> - setting and enforcing timeouts is a healthy pattern for read-only requests, as it gives a lot more feedback to clients about the health of the server. When it comes to updates, things are a little bit more muddy, just because there remains a chance that an update can be committed but the caller times out before learning of the successful commit. We should try to minimize the likelihood of that occurring.
>>>>>>>>>>>
>>>>>>>>>>> Cheers, Adam
>>>>>>>>>>>
>>>>>>>>>>> P.S. I did say that this wasn't _strictly_ about FoundationDB, but of course FDB has a hard 5 second limit on all transactions, so it is a bit of a forcing function :). Even putting FoundationDB aside, I would still argue for pursuing this path based on our Ops experience with the current codebase.
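
(Appendix, for illustration: a rough sketch - not the code in the commit Nick links above - of the cleanup-monitor pattern he describes. A helper process monitors the coordinator and collects the rexi worker pids; on normal cleanup the coordinator stops the helper first, and if the coordinator dies without cleaning up, the helper kills the workers on its behalf. The function and message names here are hypothetical.)

    spawn_cleanup_monitor(Coordinator) ->
        spawn(fun() ->
            Ref = erlang:monitor(process, Coordinator),
            cleanup_loop(Ref, [])
        end).

    cleanup_loop(Ref, Workers) ->
        receive
            {add_worker, WorkerPid} ->
                %% The coordinator reports each rexi worker as it spawns it.
                cleanup_loop(Ref, [WorkerPid | Workers]);
            stop ->
                %% Regular cleanup already ran; exit quietly so the workers
                %% don't receive a double dose of kill messages.
                ok;
            {'DOWN', Ref, process, _Coordinator, _Reason} ->
                %% The coordinator died without cleaning up; kill the
                %% remaining workers on its behalf.
                [exit(W, kill) || W <- Workers],
                ok
        end.

(Usage would be along the lines of: the coordinator calls Monitor = spawn_cleanup_monitor(self()) up front, sends {add_worker, Pid} as it spawns workers, and sends stop - or simply kills the monitor, as the linked commit does - once its own cleanup has run.)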