couchdb-dev mailing list archives

From "Nathan Vander Wilt (JIRA)" <>
Subject [jira] [Commented] (COUCHDB-1682) Allow filtered _changes to time out, returning last_seq processed
Date Fri, 30 Jan 2015 22:13:34 GMT


Nathan Vander Wilt commented on COUCHDB-1682:

Since I'm logged in and sullying the bug tracker anyway, I will note that this is a final
nail in the coffin for scaling filtered changes/replication:

1. A handful of users/topics/channels/whatever all trying to filter a database full of changes
2. The filter workers are all re-processing the same documents in the context of a different
request, slowing each other down
3. Load grows a bit more, and these queries start timing out.
4. Guess what, the client still wants the data, so it retries…

So now we have a situation! Instead of making forward progress, the system gets into
an escalating loop of "everyone's job times out 75% of the way through, so they all start back
at the beginning" until the clients' backoff interval [if there is any] grows long enough to
reduce the load to a point where a few of them can reach a checkpoint.
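A toy simulation makes the loop concrete (numbers are hypothetical; it assumes the proxy cuts each request off at 75% of the total work and that a timed-out client must restart from since=0, since no last_seq is returned):

```python
# Toy model of the retry storm: a filtered scan over TOTAL_SEQS sequences,
# a proxy that kills the request after TIMEOUT_BUDGET sequences' worth of
# work, and a client that retries on failure.
TOTAL_SEQS = 1000        # update_seq of the database (hypothetical)
TIMEOUT_BUDGET = 750     # work units before the proxy times out (75%)

def attempt(since):
    """Scan from `since`; return (completed, last_seq_reached)."""
    reachable = since + TIMEOUT_BUDGET
    if reachable >= TOTAL_SEQS:
        return True, TOTAL_SEQS
    return False, reachable  # timed out partway through the scan

def replicate(checkpointing):
    """Retry up to 10 times; with checkpointing, resume from last_seq."""
    since = 0
    for _ in range(10):
        done, last_seq = attempt(since)
        if done:
            return True
        since = last_seq if checkpointing else 0  # today: back to zero
    return False

# Without a last_seq returned at timeout, the client never finishes:
assert replicate(checkpointing=False) is False
# With one, two batches suffice:
assert replicate(checkpointing=True) is True
```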

Workaround: implement your own changes-feed logic in yet more middleware, atop a `local_seq:true`
view that pre-sorts the documents into suitable channels.
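A minimal sketch of that workaround (database, design doc, and view names are hypothetical): setting `"options": {"local_seq": true}` on a design doc exposes `doc._local_seq` inside the map function, so the middleware can page each channel's documents by sequence through an ordinary `_view` request instead of a filtered `_changes` feed:

```python
import json
from urllib.parse import urlencode

# Hypothetical design doc. With "options": {"local_seq": true}, the map
# function sees doc._local_seq, so each channel's docs can be keyed by
# the sequence at which they were written.
DESIGN_DOC = {
    "_id": "_design/channels",
    "options": {"local_seq": True},
    "views": {
        "by_channel": {
            # Map functions are JavaScript, stored as a string in the ddoc.
            "map": "function (doc) {"
                   "  if (doc.channel) emit([doc.channel, doc._local_seq], null);"
                   "}"
        }
    }
}

def next_page_url(db_url, channel, since_seq, limit=100):
    """Build the _view request for the next batch of one channel's
    changes, starting just past the client's checkpoint `since_seq`."""
    params = urlencode({
        "startkey": json.dumps([channel, since_seq + 1]),
        "endkey": json.dumps([channel, {}]),  # {} collates after numbers
        "limit": limit,
        "include_docs": "true",
    })
    return f"{db_url}/_design/channels/_view/by_channel?{params}"

url = next_page_url("http://localhost:5984/mydb", "news", since_seq=42)
```

The middleware then serves each page to its client along with the highest `local_seq` seen, which becomes the `since_seq` of the next request.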

> Allow filtered _changes to time out, returning last_seq processed
> -----------------------------------------------------------------
>                 Key: COUCHDB-1682
>                 URL:
>             Project: CouchDB
>          Issue Type: Improvement
>            Reporter: Nathan Vander Wilt
> Right now a filtered _changes query ?since=0 on a database with a high update_seq can
> take a very long time to return. If this request is performed through a proxy or through a
> browser with a timeout, it may never complete as far as the client is concerned.
> Right now CouchDB itself ignores any polling timeout for such a request, i.e. it
> does not time out while the _changes results are still processing. This is okay, as it at
> least lets patient clients get a result.
> I propose, though, that the timeout value be respected during the "initial" (e.g. in
> the context of a fresh replication) request. When the timeout is reached, the client should
> get back a valid response, with incomplete (even empty!) results and a last_seq corresponding
> to how far it had processed changes in the background. Then the client/replicator could record
> a checkpoint and request processing of the next batch.
> The net result would be that the initial replication request would not be unbounded in
> time. Even if a response is "timed out" by a proxy/browser within 30 seconds or 5 minutes,
> a client aware of this limit could set a slightly lower timeout and get back
> a last_seq that keeps it from having to (futilely) try again from since=0.
> Unfortunately, this does slightly change the semantics of the query: it behaves as if
> limit=0 even when the client provided no (or a different) limit and may be expecting last_seq
> to roughly match current_seq for such a request. So perhaps this behaviour would need to be
> enabled by its own query parameter, ?batch=please or something.
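The client side of the proposed batch-and-checkpoint loop could look like the following sketch, where `FakeDB` and `fetch_changes` are stand-ins for the real HTTP layer and the response shape mirrors a normal _changes reply:

```python
class FakeDB:
    """Stand-in for a CouchDB database: pretends each request's time
    budget lets the filter advance at most `budget` sequences."""
    def __init__(self, update_seq, budget):
        self.update_seq, self.budget = update_seq, budget

    def scan(self, since):
        last = min(since + self.budget, self.update_seq)
        rows = [{"seq": s} for s in range(since + 1, last + 1)]
        return rows, last

def fetch_changes(db, since):
    """Placeholder for GET /db/_changes?filter=...&since=N&timeout=T.
    Under the proposal, a timed-out request still returns a valid body,
    possibly with few (or zero) results, plus the last_seq processed."""
    rows, last_seq = db.scan(since)
    return {"results": rows, "last_seq": last_seq}

def replicate(db, checkpoint=0):
    """Request batches, recording a checkpoint after each response,
    until a response makes no further progress (i.e. we are caught up)."""
    while True:
        resp = fetch_changes(db, since=checkpoint)
        # ...apply resp["results"] to the replication target here...
        if resp["last_seq"] == checkpoint:
            return checkpoint
        checkpoint = resp["last_seq"]   # persist as the new checkpoint

assert replicate(FakeDB(update_seq=1000, budget=300)) == 1000
```

Because every iteration records `last_seq` before retrying, a timeout costs at most one batch of work rather than restarting the whole scan from since=0.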

This message was sent by Atlassian JIRA
