couchdb-dev mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: [DISCUSS] Improve load shedding by enforcing timeouts throughout stack
Date Fri, 26 Apr 2019 19:30:00 GMT
Hi all,

The point I’m making is that we should take advantage of this extra bit of information that
we acquire out-of-band (e.g. we just decide as a project that all operations take less than
5 seconds) and come up with smarter / cheaper / faster ways of doing load shedding based on
that information.

For example, yes it could be interesting to use is_process_alive/1 to see if a client is still
hanging around, and have the gen_server discard the work otherwise. It might also be too expensive
to be worthwhile; I’m not sure anyone here has a good a priori sense of the cost of that call.
But I’d certainly wager it’s more expensive than calling timer:now_diff/2 in the server
and discarding any requests that were submitted more than 5 seconds ago.
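
A minimal sketch of that server-side check, assuming a caller that stamps
each request with os:timestamp(). The module and function names here are
made up for illustration; this is not the actual couch_server code:

    -module(shedding_server).
    -behaviour(gen_server).

    -export([start_link/0, request/2]).
    -export([init/1, handle_call/3, handle_cast/2]).

    %% 5 seconds, matching the proposed project-wide request limit.
    -define(MAX_AGE_USEC, 5000000).

    start_link() ->
        gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

    %% The caller includes its submission time in the request itself.
    request(Server, Work) ->
        gen_server:call(Server, {do_work, os:timestamp(), Work}, 5000).

    init([]) ->
        {ok, #{}}.

    handle_call({do_work, Submitted, Work}, _From, State) ->
        case timer:now_diff(os:timestamp(), Submitted) > ?MAX_AGE_USEC of
            true ->
                %% The client's own timeout has already fired; skip the
                %% expensive work and shed the request.
                {reply, {error, timeout}, State};
            false ->
                {reply, {ok, do_expensive_work(Work)}, State}
        end.

    handle_cast(_Msg, State) ->
        {noreply, State}.

    %% Placeholder for whatever the server actually does.
    do_expensive_work(Work) ->
        Work.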

Most of our timeout / cleanup solutions to date have been focused top-down, without making
any assumptions about the behavior of the workers or servers underneath. I think we should
try to approach this problem bottom-up, forcing every call to complete within 5 seconds and
handling timeouts correctly as they bubble up.
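
As a sketch of what that bottom-up discipline could look like at call sites
(illustrative only, not an existing CouchDB helper): every blocking call gets
an explicit 5 second timeout, and the resulting exit is translated into a
value the layer above can handle.

    -module(bounded_calls).
    -export([call/2]).

    -define(CALL_TIMEOUT, 5000).  %% the proposed 5 second ceiling

    %% Wrap gen_server:call/3 so a timeout becomes an ordinary error
    %% tuple that can bubble up instead of crashing the caller.
    call(Server, Request) ->
        try
            {ok, gen_server:call(Server, Request, ?CALL_TIMEOUT)}
        catch
            exit:{timeout, _} ->
                {error, timeout}
        end.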

Adam

> On Apr 23, 2019, at 2:48 PM, Nick Vatamaniuc <vatamane@gmail.com> wrote:
> 
> We don't spawn (/link) or monitor remote processes, just monitor the local
> coordinator process. That should be cheaper performance-wise. It's also for
> relatively long-running streaming fabric requests (changes, all_docs). But
> you're right, perhaps doing this for shorter requests (doc updates, doc
> GETs) might become noticeable. Perhaps a pool of reusable monitoring
> processes would work there...
> 
> For couch_server timeouts, I wonder if we can do a simpler thing and
> inspect the `From` part of each call and, if the Pid is not alive, drop the
> request or at least avoid doing any expensive processing. For casts it
> might involve sending the sender Pid in the message. That doesn't address
> timeouts, just the case where the coordinating process went away while the
> message was stuck in the long message queue.
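
A sketch of that `From` inspection, as a single handle_call clause in a
hypothetical gen_server (do_work/2 stands in for the real processing). Note
that erlang:is_process_alive/1 only accepts pids local to the node, which
should be the case for direct callers of couch_server:

    handle_call(Request, {CallerPid, _Tag}, State) ->
        case is_process_alive(CallerPid) of
            false ->
                %% The caller is already gone; a reply would go nowhere,
                %% so drop the request without doing any expensive work.
                {noreply, State};
            true ->
                {reply, do_work(Request, State), State}
        end.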
> 
> On Mon, Apr 22, 2019 at 4:32 PM Robert Newson <rnewson@apache.org> wrote:
> 
>> My memory is fuzzy, but those items sound a lot like what happens with
>> rex, which motivated us (i.e., Adam) to build rexi, which deliberately
>> does less than the stock approach.
>> 
>> --
>>  Robert Samuel Newson
>>  rnewson@apache.org
>> 
>> On Mon, 22 Apr 2019, at 18:33, Nick Vatamaniuc wrote:
>>> Hi everyone,
>>> 
>>> We partially implemented the first part (cleaning up rexi workers) for
>>> all the fabric streaming requests, which should be all_docs, changes,
>>> view map, and view reduce:
>>> 
>>> https://github.com/apache/couchdb/commit/632f303a47bd89a97c831fd0532cb7541b80355d
>>> 
>>> The pattern there is the following:
>>> 
>>> - With every request, spawn a monitoring process that is in charge of
>>> keeping track of all the workers as they are spawned.
>>> - If regular cleanup takes place, then this monitoring process is
>>> killed, to avoid sending double the number of kill messages to workers.
>>> - If the coordinating process doesn't run cleanup and just dies, the
>>> monitoring process will perform cleanup on its behalf.
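
A rough, standalone sketch of that pattern (the names below are illustrative
and not taken from the linked commit): the coordinator spawns a cleaner that
monitors it and records workers as they are spawned; if the coordinator dies
without running its own cleanup, the cleaner kills the workers on its behalf.

    -module(worker_cleaner).
    -export([start/1, add_worker/2, stop/1]).

    %% Spawn a cleaner that monitors the coordinator process.
    start(Coordinator) ->
        spawn(fun() ->
            Ref = erlang:monitor(process, Coordinator),
            loop(Ref, [])
        end).

    %% Record each worker as it is spawned.
    add_worker(Cleaner, WorkerPid) ->
        Cleaner ! {add, WorkerPid},
        ok.

    %% Called when the coordinator runs its regular cleanup, so the
    %% cleaner does not send a second round of kill messages.
    stop(Cleaner) ->
        Cleaner ! stop,
        ok.

    loop(Ref, Workers) ->
        receive
            {add, Pid} ->
                loop(Ref, [Pid | Workers]);
            stop ->
                ok;
            {'DOWN', Ref, process, _Coordinator, _Reason} ->
                %% Coordinator died without cleaning up; kill its workers.
                lists:foreach(fun(Pid) -> exit(Pid, kill) end, Workers)
        end.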
>>> 
>>> Cheers,
>>> -Nick
>>> 
>>> 
>>> 
>>> On Thu, Apr 18, 2019 at 5:16 PM Robert Samuel Newson <rnewson@apache.org>
>>> wrote:
>>> 
>>>> My view is a) the server was unavailable for this request due to all
>>>> the other requests it’s currently dealing with b) the connection was
>>>> not idle, the client is not at fault.
>>>> 
>>>> B.
>>>> 
>>>>> On 18 Apr 2019, at 22:03, Done Collectively <sansato@inator.biz>
>>>>> wrote:
>>>>> 
>>>>> Any reason 408 would be undesirable?
>>>>> 
>>>>> https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/408
>>>>> 
>>>>> 
>>>>> On Thu, Apr 18, 2019 at 10:37 AM Robert Newson <rnewson@apache.org>
>>>>> wrote:
>>>>> 
>>>>>> 503 imo.
>>>>>> 
>>>>>> --
>>>>>> Robert Samuel Newson
>>>>>> rnewson@apache.org
>>>>>> 
>>>>>> On Thu, 18 Apr 2019, at 18:24, Adam Kocoloski wrote:
>>>>>>> Yes, we should. Currently it’s a 500, maybe there’s something more
>>>>>>> appropriate:
>>>>>>> 
>>>>>>> https://github.com/apache/couchdb/blob/8ef42f7241f8788afc1b6e7255ce78ce5d5ea5c3/src/chttpd/src/chttpd.erl#L947-L949
>>>>>>> 
>>>>>>> Adam
>>>>>>> 
>>>>>>>> On Apr 18, 2019, at 12:50 PM, Joan Touzet <wohali@apache.org> wrote:
>>>>>>>> 
>>>>>>>> What happens when it turns out the client *hasn't* timed out and we
>>>>>>>> just...hang up on them? Should we consider at least trying to send
>>>>>>>> back some sort of HTTP status code?
>>>>>>>> 
>>>>>>>> -Joan
>>>>>>>> 
>>>>>>>> On 2019-04-18 10:58, Garren Smith wrote:
>>>>>>>>> I'm +1 on this. With partition queries, we added a few more
>>>>>>>>> timeouts that can be enabled, which Cloudant enables. So having
>>>>>>>>> the ability to shed old requests when these timeouts get hit
>>>>>>>>> would be great.
>>>>>>>>> 
>>>>>>>>> Cheers
>>>>>>>>> Garren
>>>>>>>>> 
>>>>>>>>> On Tue, Apr 16, 2019 at 2:41 AM Adam Kocoloski <kocolosk@apache.org>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi all,
>>>>>>>>>> 
>>>>>>>>>> For once, I’m coming to you with a topic that is not strictly
>>>>>>>>>> about FoundationDB :)
>>>>>>>>>> 
>>>>>>>>>> CouchDB offers a few config settings (some of them undocumented)
>>>>>>>>>> to put a limit on how long the server is allowed to take to
>>>>>>>>>> generate a response. The trouble with many of these timeouts is
>>>>>>>>>> that, when they fire, they do not actually clean up all of the
>>>>>>>>>> work that they initiated. A couple of examples:
>>>>>>>>>> 
>>>>>>>>>> - Each HTTP response coordinated by the “fabric” application
>>>>>>>>>> spawns several ephemeral processes via “rexi” on different nodes
>>>>>>>>>> in the cluster to retrieve data and send it back to the process
>>>>>>>>>> coordinating the response. If the request timeout fires, the
>>>>>>>>>> coordinating process will be killed off, but the ephemeral
>>>>>>>>>> workers might not be. In a healthy cluster they’ll exit on their
>>>>>>>>>> own when they finish their jobs, but there are conditions under
>>>>>>>>>> which they can sit around for extended periods of time waiting
>>>>>>>>>> for an overloaded gen_server (e.g. couch_server) to respond.
>>>>>>>>>> 
>>>>>>>>>> - Those named gen_servers (like couch_server) responsible for
>>>>>>>>>> serializing access to important data structures will dutifully
>>>>>>>>>> process messages received from old requests without any regard
>>>>>>>>>> for (or even knowledge of) the fact that the client that sent
>>>>>>>>>> the message timed out long ago. This can lead to a sort of death
>>>>>>>>>> spiral in which the gen_server is ultimately spending ~all of
>>>>>>>>>> its time serving dead clients and every client is timing out.
>>>>>>>>>> 
>>>>>>>>>> I’d like to see us introduce a documented maximum request
>>>>>>>>>> duration for all requests except the _changes feed, and then use
>>>>>>>>>> that information to aid in load shedding throughout the stack.
>>>>>>>>>> We can audit the codebase for gen_server calls with long
>>>>>>>>>> timeouts (I know of a few on the critical path that set their
>>>>>>>>>> timeouts to `infinity`) and we can design servers that
>>>>>>>>>> efficiently drop old requests, knowing that the client who made
>>>>>>>>>> the request must have timed out. A couple of topics for
>>>>>>>>>> discussion:
>>>>>>>>>> 
>>>>>>>>>> - the “gen_server that sheds old requests” is a very generic
>>>>>>>>>> pattern, one that seems like it could be well-suited to its own
>>>>>>>>>> behaviour. A cursory search of the internet didn’t turn up any
>>>>>>>>>> prior art here, which surprises me a bit. I’m wondering if this
>>>>>>>>>> is worth bringing up with the broader Erlang community.
>>>>>>>>>> 
>>>>>>>>>> - setting and enforcing timeouts is a healthy pattern for
>>>>>>>>>> read-only requests as it gives a lot more feedback to clients
>>>>>>>>>> about the health of the server. When it comes to updates things
>>>>>>>>>> are a little bit more muddy, just because there remains a chance
>>>>>>>>>> that an update can be committed, but the caller times out before
>>>>>>>>>> learning of the successful commit. We should try to minimize the
>>>>>>>>>> likelihood of that occurring.
>>>>>>>>>> 
>>>>>>>>>> Cheers, Adam
>>>>>>>>>> 
>>>>>>>>>> P.S. I did say that this wasn’t _strictly_ about FoundationDB,
>>>>>>>>>> but of course FDB has a hard 5 second limit on all transactions,
>>>>>>>>>> so it is a bit of a forcing function :). Even putting
>>>>>>>>>> FoundationDB aside, I would still argue to pursue this path
>>>>>>>>>> based on our Ops experience with the current codebase.
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>> 
>> 

