couchdb-dev mailing list archives

From Richard Ellis <RICEL...@uk.ibm.com>
Subject RE: [DISCUSS] Streaming API in CouchDB 4.0
Date Thu, 09 Apr 2020 16:10:24 GMT
Hi Nick,

I think that if client-side code is expected to make multiple requests,
then those requests should be made as easy as possible. So whilst a client
library can implement complicated pagination recipes (like the current
Couch view one), it is much simpler to collect and send a single
bookmark/token. Especially so if the naming and structural position of the
bookmark in requests and responses is consistent across all endpoints
supporting pagination, such that the client-side pagination code is
easily reusable. I'm in favour of anything supporting pagination providing
a bookmark/token based system.
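
To illustrate the consistency point, here is a purely hypothetical sketch
(the parameter name, field name and placement are made up rather than an
agreed design), where every paginated endpoint accepts the token in the
same query parameter and returns it in the same place in the response:

GET /{db}/_all_docs?limit=100
  -> { ..., "rows": [...], "bookmark": "<opaque token>" }

GET /{db}/_all_docs?limit=100&bookmark=<opaque token>
  -> next page, same shape

GET /{db}/_design/{ddoc}/_view/{view}?limit=100&bookmark=<opaque token>
  -> same parameter and same field, so one piece of client pagination
     code can drive every endpoint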

Also, if there are maximums applied to limits, then I'd expect that
anything out of the accepted range would be a 400 Bad Request - IIUC 412
Precondition Failed has a specific meaning relating to matching headers
(https://tools.ietf.org/html/rfc2616#section-10.4.13), which I don't think
applies in this case.

Rich



From:   Nick Vatamaniuc <vatamane@gmail.com>
To:     dev@couchdb.apache.org
Date:   09/04/2020 00:25
Subject:        [EXTERNAL] Re: [DISCUSS] Streaming API in CouchDB 4.0



Thanks for replying, Adam!

Thinking about it some more, it seems there are two benefits
to changing the streaming APIs:

 1) To provide users with a serializable snapshot. We don't currently
 have that, as Mike pointed out, unless we use n=1&q=1 or CouchDB
 version 1.x. It would be nice to get that with a new release.

 2) To avoid the general anti-pattern of streaming all the data in one
 single request or using very large skip or limit values.

However, I think the two improvements are not necessarily tied to each
other. For example, we could set configurable mandatory max limits
(option E) for all the streaming endpoints in 3.x as well. On the
other hand, even with a single transaction we could stream say 150k
rows in 5 seconds. If at some future point FDB allowed minute-long
transactions, we could stream millions of rows before timing out and
it would still not be a desirable pattern of usage. This is also
basically option F in a read-only case (we can emit a snapshot as long
as there are no writes to that particular db), and I think we agree
that it is not that appealing of an option.

What do we think, then, about having per-endpoint configurable max
limits (option E)? The configuration could look something like:

[request_limits]
all_docs = 5000
views    = 2500
list_dbs = 1000
dbs_info = 500

If those limits are set, and a request is made against an endpoint
without the limit parameter, or a limit or skip is provided that is
greater than the maximum, it would return immediately with an
error (412) and an indication of what the max limit value is.
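
As an illustration, the error body could carry the configured maximum
back to the client, something along these lines (the error name and
fields here are hypothetical, not a settled format):

{
    "error": "limit_exceeded",
    "reason": "limit must be provided and cannot exceed 5000 for _all_docs",
    "max_limit": 5000
}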

And I agree that client libraries are important in helping here.
So for example, Cloudant client libraries could detect that error and
either return it to the user, or, as a compatibility mode, use a
few consecutive requests behind the scenes to stream all the data back
to the user as requested, without the user's application code needing
any updates at all.
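
As a rough sketch of what that compatibility mode could look like
(purely illustrative: the "bookmark" request parameter, the response
field name and the paging behaviour are assumptions here, and this is
not the actual Cloudant client library code):

import requests


def all_docs_rows(base_url, db, page_limit=5000, **params):
    """Yield every row from _all_docs, transparently following a
    hypothetical "bookmark" field so the caller never sees the paging."""
    session = requests.Session()
    bookmark = None
    while True:
        query = dict(params, limit=page_limit)
        if bookmark:
            query["bookmark"] = bookmark
        resp = session.get(f"{base_url}/{db}/_all_docs", params=query)
        resp.raise_for_status()
        body = resp.json()
        rows = body.get("rows", [])
        for row in rows:
            yield row
        bookmark = body.get("bookmark")
        # Stop on a short page or when no bookmark is returned.
        if not bookmark or len(rows) < page_limit:
            break

A caller could then do list(all_docs_rows("http://localhost:5984",
"mydb")) and get the full result set whether or not the server enforces
a maximum.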

If those limits are not set, the API would behave as it does now. This
would provide a smoother upgrade path from 3.x to 4.x so users
wouldn't have to rewrite their applications.

I still think the bookmarking approach is interesting, but I think it
might work better for a new API endpoint, or enabled with an explicit
parameter. I can see a case where users might be using the Python
requests library to fetch _all_docs, assuming it gets all the data, then
after the upgrade the same API endpoint suddenly returns only a fraction
of the rows. There might be a "bookmark" field there, but it is buried
under a few application layers and gets ignored. The users just notice
the missing data at some point, and it could be perceived as data loss
in a sense.
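
Concretely, the kind of code I have in mind is something like this
(illustrative only; the URL is made up and error handling is omitted):

import requests

# Today this returns every row. After such a change, it would silently
# return only the first page, and any extra "bookmark" field in the
# response would simply be ignored by this code.
resp = requests.get("http://localhost:5984/mydb/_all_docs")
resp.raise_for_status()
rows = resp.json()["rows"]
print(len(rows))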

> how much value do we derive from that streaming behavior if we
> aggressively limit the `limit`?

Oh, good point! That makes sense. We might be able to simplify
quite a bit of logic internally if we didn't actually stream the data.
We buffer thousands of doc updates for _bulk_docs already, so perhaps
it is not that different doing it when reading data in these APIs as
well. It is something that we'd have to experiment with to see how it
behaves.

-Nick

On Wed, Apr 1, 2020 at 9:07 PM Adam Kocoloski <kocolosk@apache.org> wrote:
>
> This is a really important topic; thanks Nick for bringing it up. Sorry 
I didn’t comment earlier. I think Mike neatly captures my perspective with 
this bit:
>
> >> Our current behaviour seems extremely subtle and, I'd argue, 
unexpected. It is hard to reason about if you really need a particular 
guarantee.
> >>
> >> Is there an opportunity to clarify behaviour here, such that we 
really _do_ guarantee point-in-time within _any_ single request, but only 
do this by leveraging FoundationDB's transaction isolation semantics and 
as such are only able to offer this based on the 5s timeout in place? The 
request boundary offers a very clear cut, user-visible boundary. This 
would obviously need to cover reads/writes of single docs and so on as 
well as probably needing further work w.r.t. bulk docs etc.
> >>
> >> This restriction may naturally loosen as FoundationDB improves and 
the 5s timeout may be increased.
>
> It’d be great if we could agree on this use of serializable snapshot 
isolation under the hood for each response to a CouchDB API request 
(excepting _changes) as an optimal state.
>
> Of course, we have this complicating factor of an existing API and a 
community of users running applications in production against that API :) 
As you can imagine from the above, I’d be opposed to A); I think that 
squanders a real opportunity that we have here with a new major version. I 
also think that the return on investment for F) is too low; a large 
portion of our production databases see a 24/7 write load so a code path 
that only activates when a DB is quiesced doesn’t get my vote.
>
> When I look at the other options, I think it’s important to take a 
broader view and consider the user experience in the client libraries as 
well as the API. Our experience at IBM Cloud is that a large majority of 
API requests come from a well-defined set of client libraries, and as we 
consider non-trivial changes to the API we can look to those libraries as 
a way to smooth over the API breakage, and intelligently surface new 
capabilities even if the least-disruptive way to introduce them to the API 
is a bit janky.
>
> As a concrete example, I would support an aggressive ceiling on `limit` 
and `skip` in the 4.0 API, while enhancing popular client libraries as 
needed to allow users to opt-in to automatic pagination through larger 
result sets.
>
> Nick rightly points out that we don’t have a good way to declare a read 
version timeout when we’ve already streamed a portion of the result set to 
the client, which is something we ought to consider even if we do apply the 
restrictions in E). I acknowledge that I may be opening a can of worms, 
but ... how much value do we derive from that streaming behavior if we 
aggressively limit the `limit`? We wouldn’t be holding that much data in 
memory on the CouchDB side, and I don’t think many of our clients are 
parsing half-completed JSON objects for anything beyond the _changes feed. 
Something to think about.
>
> Cheers, Adam
>
> > On Feb 25, 2020, at 2:52 PM, Nick Vatamaniuc <vatamane@gmail.com> wrote:
> >
> > Hi Mike,
> >
> > Good point about CouchDB not actually providing point-in-time
> > snapshots. I missed those cases when thinking about it.
> >
> > I wonder if that points to defaulting to option A since it maintains
> > the API compatibility and doesn't loosen the current constraints
> > anyway. At least it will un-break the current version of the branch
> > until we figure out something better. Otherwise it's completely
> > unusable for dbs with more than 200-300k documents.
> >
> > I like the idea of returning a bookmark and a completed/not-completed
> > flag. That is, it would be option D for _all_docs and map-reduce
> > views, but instead of the complex continuation object it would be a
> > base64-encoded, opaque object. Passing a bookmark back in as a
> > parameter would be mutually exclusive with passing in start, end, skip,
> > limit, and direction (forward/reverse) parameters. For _all_dbs, and
> > _dbs_info where we don't have a place for metadata rows, we might need
> > a new API endpoint. And maybe that opens the door to expose more
> > transactional features in the API in general...
> >
> > Also, it seems B, C and F have too many corner cases and
> > inconsistencies so they can probably be discarded, unless someone
> > disagrees.
> >
> > Configurable skips and limit maximums (E) may still be interesting.
> > Though, they don't necessarily have to be related to transactions, but
> > can instead be used to ensure streaming APIs are consumed in smaller
> > chunks.
> >
> > Cheers,
> > -Nick
> >
> >
> >
> > On Mon, Feb 24, 2020 at 7:26 AM Mike Rhodes <couchdb@dx13.co.uk> wrote:
> >>
> >> Nick,
> >>
> >> Thanks for thinking this through, it's certainly subtle and very 
unclear what is a "good" solution :(
> >>
> >> I have a couple of thoughts, firstly about the guarantees we 
currently offer and then wondering whether there is an opportunity to 
improve our API by offering a single guarantee across all request types 
rather than bifurcating guarantees.
> >>
> >> ---
> >>
> >> The first point is that, by my reasoning, CouchDB 2.x doesn't
> >> actually offer a point-in-time guarantee of the following sort
> >> currently. I read this as your saying Couch does offer this guarantee,
> >> apologies if I'm misreading:
> >>
> >>> Document the API behavior change that it may
> >>> present a view of the data that is never a point-in-time[4] snapshot
> >>> of the DB.
> >> ...
> >>> [4] For example they have a constraint that documents "a" and "z"
> >>> cannot both be in the database at the same time. But when iterating
> >>> it's possible that "a" was there at the start. Then by the end, "a"
> >>> was removed and "z" added, so both "a" and "z" would appear in the
> >>> emitted stream. Note that FoundationDB has APIs which exhibit the same
> >>> "relaxed" constraints:
> >>> https://apple.github.io/foundationdb/api-python.html#module-fdb.locality


> >>
> >> I don't believe we offer this guarantee because different database 
shards will respond to the scatter-gather inherent to a single global 
query type request at different times. This means that, given the 
following sequence of events:
> >>
> >> (1) The shard containing "a" may start returning at time N.
> >> (2) "a" may be deleted at N+1, but (1) will still be streaming from 
time N.
> >> (3) "z" may be written to a second shard at time N+2.
> >> (4) that second shard may not start returning until time N+3.
> >>
> >> By my reasoning, "a" and "z" could thus appear in the same result set 
in current CouchDB, even if they never actually appear in the primary data 
at the same time (regardless of latency of shard replicas coming into 
agreement), voiding [4].
> >>
> >> By my reckoning, you have point-in-time across a query request when 
you are working with a single shard, meaning we do have point in time for 
two scenarios:
> >>
> >> - Partitioned queries.
> >> - Q=1 databases.
> >>
> >> Albeit this guarantee is still talking about the point in time of a 
single shard's replica rather than all replicas, meaning that further 
requests may produce different results if the shards are not in agreement. 
Which can perhaps be fixed by using stable=true.
> >>
> >> I _think_ the working here is correct, but I'd welcome corrections in 
my understanding!
> >>
> >> ---
> >>
> >> Our current behaviour seems extremely subtle and, I'd argue, 
unexpected. It is hard to reason about if you really need a particular 
guarantee.
> >>
> >> Is there an opportunity to clarify behaviour here, such that we 
really _do_ guarantee point-in-time within _any_ single request, but only 
do this by leveraging FoundationDB's transaction isolation semantics and 
as such are only able to offer this based on the 5s timeout in place? The 
request boundary offers a very clear cut, user-visible boundary. This 
would obviously need to cover reads/writes of single docs and so on as 
well as probably needing further work w.r.t. bulk docs etc.
> >>
> >> This restriction may naturally loosen as FoundationDB improves and 
the 5s timeout may be increased.
> >>
> >> In this approach, my preference would be to add a closing line to the
> >> result stream containing both a bookmark (based on the FoundationDB key
> >> perhaps, rather than the index key itself, to avoid problems with
> >> skip/limit?) and a complete/not-complete boolean to enable clients to
> >> avoid the extra HTTP round-trip for completed result sets that Nick
> >> mentions.
> >>
> >> ---
> >>
> >> For option (F), I feel that the "it sometimes works and sometimes 
doesn't" effect of checking the update-seq to see if we can continue 
streaming will be a confusing experience. I also find something similar 
with option (A) where a single request covers potentially many points in 
time and so feels hard to reason about, although it's a bit less subtle 
than today.
> >>
> >> Footnote [2] seems quite a major problem, however, with the single 
transaction approach and as Nick says, it is hard to pick "good" maximums 
for skip -- perhaps users need to just avoid use of these in the new 
system given its behaviour? It feels like there's a definite "against the 
grain" aspect to these.
> >>
> >> --
> >> Mike.
> >>
> >> On Wed, 19 Feb 2020, at 22:39, Nick Vatamaniuc wrote:
> >>> Hello everyone,
> >>>
> >>> I'd like to discuss the shape and behavior of streaming APIs for 
CouchDB 4.x
> >>>
> >>> By "streaming APIs" I mean APIs which stream data in row as it gets
> >>> read from the database. These are the endpoints I was thinking of:
> >>>
> >>> _all_docs, _all_dbs, _dbs_info  and query results
> >>>
> >>> I want to focus on what happens when FoundationDB transactions
> >>> time out after 5 seconds. Currently, all those APIs except _changes[1]
> >>> feeds will crash or freeze. The reason is that the
> >>> transaction_too_old error at the end of 5 seconds is retry-able by
> >>> default, so the request handlers run again and end up shoving the
> >>> whole response down the socket again, headers and all, which is
> >>> obviously broken and not what we want.
> >>>
> >>> There are a few alternatives discussed in the couchdb-dev channel. I'll
> >>> present some behaviors but feel free to add more. Some ideas might
> >>> have been discounted in the IRC discussion already but I'll present
> >>> them anyway in case it sparks further conversation:
> >>>
> >>> A) Do what _changes[1] feeds do. Start a new transaction and continue
> >>> streaming the data from the next key after last emitted in the
> >>> previous transaction. Document the API behavior change that it may
> >>> present a view of the data that is never a point-in-time[4] snapshot
> >>> of the DB.
> >>>
> >>> - Keeps the API shape the same as CouchDB <4.0. Client libraries
> >>> don't have to change to continue using these CouchDB 4.0 endpoints
> >>> - This is the easiest to implement since it would re-use the
> >>> implementation for _changes feed (an extra option passed to the fold
> >>> function).
> >>> - Breaks API behavior if users relied on having a point-in-time[4]
> >>> snapshot view of the data.
> >>>
> >>> B) Simply end the stream. Let the users pass a `?transaction=true`
> >>> param which indicates they are aware the stream may end early and so
> >>> would have to paginate from the last emitted key with a skip=1. This
> >>> will keep the request bodies the same as current CouchDB. However, if
> >>> the users got all the data in one request, they will end up wasting
> >>> another request to see if there is more data available. If they didn't
> >>> get any data they might have too large of a skip value (see [2]), so
> >>> they would have to guess different values for start/end keys. Or impose
> >>> a max limit for the `skip` parameter.
> >>>
> >>> C) End the stream and add a final metadata row like a "transaction":
> >>> "timeout" at the end. That will let the user know to keep paginating
> >>> from the last key onward. This won't work for `_all_dbs` and
> >>> `_dbs_info`[3]. Maybe let those two endpoints behave like _changes
> >>> feeds and only use this for views and _all_docs? If we like this
> >>> choice, let's think about what happens for those, as I couldn't come
> >>> up with anything decent there.
> >>>
> >>> D) Same as C, but to solve the issue with skips[2], emit a bookmark
> >>> "key" of where the iteration stopped and the current "skip" and
> >>> "limit" params, which would keep decreasing. Then the user would pass
> >>> those in "start_key=..." in the next request along with the limit and
> >>> skip params. So something like "continuation":{"skip":599, "limit":5,
> >>> "key":"..."}. This has the same issue with array results for
> >>> `_all_dbs` and `_dbs_info`[3].
> >>>
> >>> E) Enforce low `limit` and `skip` parameters. Enforce maximum values
> >>> there such that the response time is likely to fit in one transaction.
> >>> This could be tricky as different runtime environments will have
> >>> different characteristics. Also, if the timeout happens there isn't
> >>> a nice way to send an HTTP error since we already sent the 200
> >>> response. The downside is that this might break how some users use the
> >>> API, if, say, they are using large skips and limits already. Perhaps
> >>> here we do both B and D, such that if users want transactional
> >>> behavior, they specify that `transaction=true` param and only then we
> >>> enforce low limit and skip maximums.
> >>>
> >>> F) At least for `_all_docs` it seems providing a point-in-time
> >>> snapshot view doesn't necessarily need to be tied to transaction
> >>> boundaries. We could check the update sequence of the database at the
> >>> start of the next transaction and, if it hasn't changed, we can
> >>> continue emitting a consistent view. This can apply to C and D and
> >>> would just determine when the stream ends. If there are no writes
> >>> happening to the db, this could potentially stream all the data just
> >>> like option A would. Not entirely sure if this would work for views.
> >>>
> >>> So what do we think? I can see different combinations of options here,
> >>> maybe even different for each API endpoint. For example, `_all_dbs` and
> >>> `_dbs_info` are always A, and `_all_docs` and views default to A but
> >>> have parameters to do F, etc.
> >>>
> >>> Cheers,
> >>> -Nick
> >>>
> >>> Some footnotes:
> >>>
> >>> [1] _changes feeds is the only one that works currently. It behaves
> >>> as per the RFC:
> >>> https://github.com/apache/couchdb-documentation/blob/master/rfcs/003-fdb-seq-index.md#access-patterns
> >>> That is, we continue streaming the data by resetting the transaction
> >>> object and restarting from the last emitted key (db sequence in this
> >>> case). However, because the transaction restarts, if a document is
> >>> updated while the streaming takes place it may appear in the _changes
> >>> feed twice. That's a behavior difference from CouchDB < 4.0 and we'd
> >>> have to document it, since previously we presented a point-in-time
> >>> snapshot of the database from when we started streaming.
> >>>
> >>> [2] Our streaming APIs have both skips and limits. Since FDB doesn't
> >>> currently support efficient offsets for key selectors
> >>> (https://apple.github.io/foundationdb/known-limitations.html#dont-use-key-selectors-for-paging)
> >>> we implemented skip by iterating over the data. This means that a skip
> >>> of say 100000 could keep timing out the transaction without yielding
> >>> any data.
> >>>
> >>> [3] _all_dbs and _dbs_info return a JSON array so they don't have an
> >>> obvious place to insert a last metadata row.
> >>>
> >>> [4] For example they have a constraint that documents "a" and "z"
> >>> cannot both be in the database at the same time. But when iterating
> >>> it's possible that "a" was there at the start. Then by the end, "a"
> >>> was removed and "z" added, so both "a" and "z" would appear in the
> >>> emitted stream. Note that FoundationDB has APIs which exhibit the same
> >>> "relaxed" constraints:
> >>> https://apple.github.io/foundationdb/api-python.html#module-fdb.locality


> >>>
>




