couchdb-dev mailing list archives

From Nick Vatamaniuc <vatam...@gmail.com>
Subject Re: [DISCUSS] Streaming API in CouchDB 4.0
Date Fri, 10 Apr 2020 18:21:30 GMT
Hi Rich,

Thanks for contributing!

Regarding the 412, you're right; I mainly went by the 412 being
returned when a database already exists and we try to create it [1]. The
precondition there is in the start line of the HTTP request, just like
`...?limit=...` would be. However, a 400 would work better in general,
so I agree.

In principle I do like the bookmarks better, but it also seems like a
large change to the API, so I am on the fence there. Also, interestingly,
it would almost be easier to change _list_dbs, _dbs_info and other
endpoints which return plain arrays to return {"total": ..., "items":
[...], ...} objects, since the breakage would be really obvious; in
other words, an old client will fail right away as opposed to returning
misleading data.

Cheers,
-Nick

[1] https://docs.couchdb.org/en/stable/api/database/common.html#put--db



On Thu, Apr 9, 2020 at 12:10 PM Richard Ellis <RICELLIS@uk.ibm.com> wrote:
>
> Hi Nick,
>
> I think that if client-side code is expected to make multiple requests,
> then those requests should be made as easy as possible. So whilst a client
> library can implement complicated pagination recipes (like the current
> Couch view one), it is much simpler to collect and send a single
> bookmark/token. Especially so if the naming and structural position of the
> bookmark in requests and responses is consistent across all endpoints
> supporting pagination, such that the client-side code for pagination is
> easily reusable. I'm in favour of anything supporting pagination
> providing a bookmark/token based system.
>
> Also, if there are maximums applied to limits, then I'd expect anything
> outside the accepted range to be a 400 Bad Request - IIUC 412
> Precondition Failed has a specific meaning relating to conditional headers
> (https://tools.ietf.org/html/rfc2616#section-10.4.13) which I don't think
> applies in this case.
>
> Rich
>
>
>
> From:   Nick Vatamaniuc <vatamane@gmail.com>
> To:     dev@couchdb.apache.org
> Date:   09/04/2020 00:25
> Subject:        [EXTERNAL] Re: [DISCUSS] Streaming API in CouchDB 4.0
>
>
>
> Thanks for replying, Adam!
>
> Thinking about it some more, it seems there are two benefits
> to changing the streaming APIs:
>
>  1) To provide users with a serializable snapshot. We don't currently
>  have that, as Mike pointed out, unless we use n=1&q=1 or CouchDB
>  version 1.x. It would be nice to get that with a new release.
>
>  2) To avoid the general anti-pattern of streaming all the data in one
> single request or using very large skip or limit values.
>
> However, I think the two improvements are not necessarily tied to each
> other. For example, we could set configurable mandatory max limits
> (option E) for all the streaming endpoints in 3.x as well. On the
> other hand, even with a single transaction we could stream say 150k
> rows in 5 seconds. If at some future point FDB were to allow minute-long
> transactions, we could stream millions of rows before timing out and
> it would still not be a desirable pattern of usage. This is also
> basically option F in a read-only case (we can emit a snapshot as long
> as there are no writes to that particular db), and I think we agree
> that it is not that appealing of an option.
>
> What do we think, then, about having per-endpoint configurable max
> limits (option E)? The configuration could look something like:
>
> [request_limits]
> all_docs = 5000
> views    = 2500
> list_dbs = 1000
> dbs_info = 500
>
> If those limits are set, and a request is made against an endpoint
> without the limit parameter, or a limit or skip is provided but is
> greater than the maximum, it would return immediately with an
> error (412) and an indication of what the max limit value is.
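>
> As a rough sketch of how a client might see that (the error body
> fields below are made up for illustration, not a settled format):
>
>   import requests
>
>   r = requests.get("http://localhost:5984/db/_all_docs",
>                    params={"limit": 100000})
>   if r.status_code == 412:  # or 400, per the status code discussion
>       err = r.json()
>       # e.g. {"error": "limit_too_large", "max_limit": 5000}
>       print("server maximum:", err.get("max_limit"))
>   else:
>       rows = r.json()["rows"]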
>
> And I agree that client libraries are important in helping here.
> So for example, Cloudant client libraries could detect that error and
> either return it to the user, or, as a compatibility mode, use a
> few consecutive requests behind the scenes to stream all the data back
> to the user as requested without the user's application code needing
> any updates at all.
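>
> As a very rough sketch of such a compatibility shim (a hypothetical
> helper, not an actual Cloudant client library API):
>
>   import json
>   import requests
>
>   def all_docs(base_url, db, page_size=1000):
>       """Yield every row, paging so each request stays under the max."""
>       params = {"limit": page_size}
>       while True:
>           url = "%s/%s/_all_docs" % (base_url, db)
>           rows = requests.get(url, params=params).json()["rows"]
>           for row in rows:
>               yield row
>           if len(rows) < page_size:
>               return
>           # resume just past the last emitted key (skip=1 excludes it)
>           params = {"limit": page_size,
>                     "start_key": json.dumps(rows[-1]["key"]),
>                     "skip": 1}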
>
> If those limits are not set, the API would behave as it does now. This
> would provide a smoother upgrade path from 3.x to 4.x so users
> wouldn't have to rewrite their applications.
>
> I still think the bookmarking approach is interesting, but I think it
> might work better for a new API endpoint or enabled with an explicit
> parameter. I can see a case where users might be using the Python requests
> library to fetch _all_docs, assuming it gets all the data, and then after
> the upgrade the same API endpoint suddenly returns only a fraction of the
> rows. There might be another "bookmark" field there but it is buried
> under a few application layers and gets ignored. The users just notice
> the missing data at some point, and it could be perceived as data loss
> in a sense.
>
> > how much value do we derive from that streaming behavior if we
>   aggressively limit the `limit`?
>
> Oh good point! That makes sense. We might be able to simplify
> quite a bit of logic internally if we didn't actually stream the data.
> We buffer thousands of doc updates for _bulk_docs already, so perhaps
> it is not that different to do it when reading data in these APIs as
> well. It is something that we'd have to experiment with to see how it
> behaves.
>
> -Nick
>
> On Wed, Apr 1, 2020 at 9:07 PM Adam Kocoloski <kocolosk@apache.org> wrote:
> >
> > This is a really important topic; thanks Nick for bringing it up. Sorry
> I didn’t comment earlier. I think Mike neatly captures my perspective with
> this bit:
> >
> > >> Our current behaviour seems extremely subtle and, I'd argue,
> unexpected. It is hard to reason about if you really need a particular
> guarantee.
> > >>
> > >> Is there an opportunity to clarify behaviour here, such that we
> really _do_ guarantee point-in-time within _any_ single request, but only
> do this by leveraging FoundationDB's transaction isolation semantics and
> as such are only able to offer this based on the 5s timeout in place? The
> request boundary offers a very clear cut, user-visible boundary. This
> would obviously need to cover reads/writes of single docs and so on as
> well as probably needing further work w.r.t. bulk docs etc.
> > >>
> > >> This restriction may naturally loosen as FoundationDB improves and
> the 5s timeout may be increased.
> >
> > It’d be great if we could agree on this use of serializable snapshot
> isolation under the hood for each response to a CouchDB API request
> (excepting _changes) as an optimal state.
> >
> > Of course, we have this complicating factor of an existing API and a
> community of users running applications in production against that API :)
> As you can imagine from the above, I’d be opposed to A); I think that
> squanders a real opportunity that we have here with a new major version. I
> also think that the return on investment for F) is too low; a large
> portion of our production databases see a 24/7 write load so a code path
> that only activates when a DB is quiesced doesn’t get my vote.
> >
> > When I look at the other options, I think it’s important to take a
> broader view and consider the user experience in the client libraries as
> well as the API. Our experience at IBM Cloud is that a large majority of
> API requests come from a well-defined set of client libraries, and as we
> consider non-trivial changes to the API we can look to those libraries as
> a way to smooth over the API breakage, and intelligently surface new
> capabilities even if the least-disruptive way to introduce them to the API
> is a bit janky.
> >
> > As a concrete example, I would support an aggressive ceiling on `limit`
> and `skip` in the 4.0 API, while enhancing popular client libraries as
> needed to allow users to opt-in to automatic pagination through larger
> result sets.
> >
> > Nick rightly points out that we don’t have a good way to declare a read
> version timeout when we’ve already streamed a portion of the result set to
> the client, which is something we ought to consider even if we do apply the
> restrictions in E). I acknowledge that I may be opening a can of worms,
> but ... how much value do we derive from that streaming behavior if we
> aggressively limit the `limit`? We wouldn’t be holding that much data in
> memory on the CouchDB side, and I don’t think many of our clients are
> parsing half-completed JSON objects for anything beyond the _changes feed.
> Something to think about.
> >
> > Cheers, Adam
> >
> > > On Feb 25, 2020, at 2:52 PM, Nick Vatamaniuc <vatamane@gmail.com>
> wrote:
> > >
> > > Hi Mike,
> > >
> > > Good point about CouchDB not actually providing point-in-time
> > > snapshots. I missed those cases when thinking about it.
> > >
> > > I wonder if that points to defaulting to option A since it maintains
> > > the API compatibility and doesn't loosen the current constraints
> > > anyway. At least it will un-break the current version of the branch
> > > until we figure out something better. Otherwise it's completely
> > > unusable for dbs with more than 200-300k documents.
> > >
> > > I like the idea of returning a bookmark and a completed/not-completed
> > > flag. That is, it would be option D for _all_docs and map-reduce
> > > views, but instead of the complex continuation object it would be a
> > > base64-encoded, opaque object. Passing a bookmark back in as a
> > > parameter would be mutually exclusive with passing in start, end, skip,
> > > limit, and direction (forward/reverse) parameters. For _all_dbs and
> > > _dbs_info where we don't have a place for metadata rows, we might need
> > > a new API endpoint. And maybe that opens the door to expose more
> > > transactional features in the API in general...
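> > >
> > > As a loose sketch of what such an opaque bookmark could carry (the
> > > fields are hypothetical; the point is that clients only pass it
> > > back, never inspect it):
> > >
> > >   import base64, json
> > >
> > >   def encode_bookmark(last_key, direction="fwd"):
> > >       state = {"key": last_key, "dir": direction}
> > >       return base64.urlsafe_b64encode(
> > >           json.dumps(state).encode()).decode()
> > >
> > >   def decode_bookmark(bookmark):
> > >       return json.loads(base64.urlsafe_b64decode(bookmark))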
> > >
> > > Also, it seems B, C and F have too many corner cases and
> > > inconsistencies so they can probably be discarded, unless someone
> > > disagrees.
> > >
> > > Configurable skips and limit maximums (E) may still be interesting.
> > > Though, they don't necessarily have to be related to transactions, but
> > > can instead be used to ensure streaming APIs are consumed in smaller
> > > chunks.
> > >
> > > Cheers,
> > > -Nick
> > >
> > >
> > >
> > > On Mon, Feb 24, 2020 at 7:26 AM Mike Rhodes <couchdb@dx13.co.uk>
> wrote:
> > >>
> > >> Nick,
> > >>
> > >> Thanks for thinking this through, it's certainly subtle and very
> unclear what is a "good" solution :(
> > >>
> > >> I have a couple of thoughts, firstly about the guarantees we
> currently offer and then wondering whether there is an opportunity to
> improve our API by offering a single guarantee across all request types
> rather than bifurcating guarantees.
> > >>
> > >> ---
> > >>
> > >> The first point is that, by my reasoning, CouchDB 2.x doesn't
> > >> actually offer a point-in-time guarantee of the following sort
> > >> currently. I read this as you saying Couch does offer this guarantee;
> > >> apologies if I'm misreading:
> > >>
> > >>> Document the API behavior change: the view of the data it presents
> > >>> may not be a point-in-time[4] snapshot of the DB.
> > >> ...
> > >>> [4] For example they have a constraint that documents "a" and "z"
> > >>> cannot both be in the database at the same time. But when iterating
> > >>> it's possible that "a" was there at the start. Then by the end, "a"
> > >>> was removed and "z" added, so both "a" and "z" would appear in the
> > >>> emitted stream. Note that FoundationDB has APIs which exhibit the same
> > >>> "relaxed" constraints:
> > >>> https://apple.github.io/foundationdb/api-python.html#module-fdb.locality
> > >>
> > >> I don't believe we offer this guarantee because different database
> shards will respond to the scatter-gather inherent to a single global
> query type request at different times. This means that, given the
> following sequence of events:
> > >>
> > >> (1) The shard containing "a" may start returning at time N.
> > >> (2) "a" may be deleted at N+1, but (1) will still be streaming from
> time N.
> > >> (3) "z" may be written to a second shard at time N+2.
> > >> (4) that second shard may not start returning until time N+3.
> > >>
> > >> By my reasoning, "a" and "z" could thus appear in the same result set
> in current CouchDB, even if they never actually appear in the primary data
> at the same time (regardless of latency of shard replicas coming into
> agreement), voiding [4].
> > >>
> > >> By my reckoning, you have point-in-time across a query request when
> you are working with a single shard, meaning we do have point in time for
> two scenarios:
> > >>
> > >> - Partitioned queries.
> > >> - Q=1 databases.
> > >>
> > >> Albeit this guarantee is still talking about the point in time of a
> single shard's replica rather than all replicas, meaning that further
> requests may produce different results if the shards are not in agreement.
> Which can perhaps be fixed by using stable=true.
> > >>
> > >> I _think_ the working here is correct, but I'd welcome corrections in
> my understanding!
> > >>
> > >> ---
> > >>
> > >> Our current behaviour seems extremely subtle and, I'd argue,
> unexpected. It is hard to reason about if you really need a particular
> guarantee.
> > >>
> > >> Is there an opportunity to clarify behaviour here, such that we
> really _do_ guarantee point-in-time within _any_ single request, but only
> do this by leveraging FoundationDB's transaction isolation semantics and
> as such are only able to offer this based on the 5s timeout in place? The
> request boundary offers a very clear cut, user-visible boundary. This
> would obviously need to cover reads/writes of single docs and so on as
> well as probably needing further work w.r.t. bulk docs etc.
> > >>
> > >> This restriction may naturally loosen as FoundationDB improves and
> the 5s timeout may be increased.
> > >>
> > >> In this approach, my preference would be to add a closing line to the
> result stream to contain both a bookmark (based on the FoundationDB key
> perhaps rather than the index key itself to avoid problems with
> skip/limit?) and a complete/not-complete boolean to enable clients to
> avoid the extra HTTP round-trip for completed result sets that Nick
> mentions.
> > >>
> > >> ---
> > >>
> > >> For option (F), I feel that the "it sometimes works and sometimes
> doesn't" effect of checking the update-seq to see if we can continue
> streaming will be a confusing experience. I also find something similar
> with option (A) where a single request covers potentially many points in
> time and so feels hard to reason about, although it's a bit less subtle
> than today.
> > >>
> > >> Footnote [2] seems quite a major problem, however, with the single
> transaction approach and as Nick says, it is hard to pick "good" maximums
> for skip -- perhaps users need to just avoid use of these in the new
> system given its behaviour? It feels like there's a definite "against the
> grain" aspect to these.
> > >>
> > >> --
> > >> Mike.
> > >>
> > >> On Wed, 19 Feb 2020, at 22:39, Nick Vatamaniuc wrote:
> > >>> Hello everyone,
> > >>>
> > >>> I'd like to discuss the shape and behavior of streaming APIs for
> CouchDB 4.x
> > >>>
> > >>> By "streaming APIs" I mean APIs which stream data in row as it gets
> > >>> read from the database. These are the endpoints I was thinking of:
> > >>>
> > >>> _all_docs, _all_dbs, _dbs_info and query results
> > >>>
> > >>> I want to focus on what happens when FoundationDB transactions
> > >>> time out after 5 seconds. Currently, all those APIs except _changes[1]
> > >>> feeds will crash or freeze. The reason is that the
> > >>> transaction_too_old error at the end of 5 seconds is retry-able by
> > >>> default, so the request handlers run again and end up shoving the
> > >>> whole request down the socket again, headers and all, which is
> > >>> obviously broken and not what we want.
> > >>>
> > >>> There are a few alternatives discussed in the couchdb-dev channel. I'll
> > >>> present some behaviors but feel free to add more. Some ideas might
> > >>> have been discounted in the IRC discussion already but I'll present
> > >>> them anyway in case it sparks further conversation:
> > >>>
> > >>> A) Do what _changes[1] feeds do. Start a new transaction and continue
> > >>> streaming the data from the next key after the last one emitted in the
> > >>> previous transaction. Document the API behavior change: the view of
> > >>> the data it presents may not be a point-in-time[4] snapshot of
> > >>> the DB.
> > >>>
> > >>> - Keeps the API shape the same as CouchDB <4.0. Client libraries
> > >>> don't have to change to continue using these CouchDB 4.0 endpoints
> > >>> - This is the easiest to implement since it would re-use the
> > >>> implementation for _changes feed (an extra option passed to the fold
> > >>> function).
> > >>> - Breaks API behavior if users relied on having a point-in-time[4]
> > >>> snapshot view of the data.
> > >>>
> > >>> B) Simply end the stream. Let the users pass a `?transaction=true`
> > >>> param which indicates they are aware the stream may end early and so
> > >>> would have to paginate from the last emitted key with a skip=1. This
> > >>> will keep the request bodies the same as current CouchDB. However, if
> > >>> the users got all the data in one request, they will end up wasting
> > >>> another request to see if there is more data available. If they didn't
> > >>> get any data they might have too large of a skip value (see [2]) so
> > >>> would have to guess different values for start/end keys. Or impose a
> > >>> max limit for the `skip` parameter.
> > >>>
> > >>> C) End the stream and add a final metadata row like a "transaction":
> > >>> "timeout" at the end. That will let the user know to keep paginating
> > >>> from the last key onward. This won't work for `_all_dbs` and
> > >>> `_dbs_info`[3]. Maybe let those two endpoints behave like _changes
> > >>> feeds and only use this for views and _all_docs? If we like this
> > >>> choice, let's think about what happens for those, as I couldn't come
> > >>> up with anything decent there.
> > >>>
> > >>> D) Same as C but to solve the issue with skips[2], emit a bookmark
> > >>> "key" of where the iteration stopped and the current "skip" and
> > >>> "limit" params, which would keep decreasing. Then the user would pass
> > >>> those in "start_key=..." in the next request along with the limit and
> > >>> skip params. So something like "continuation":{"skip":599, "limit":5,
> > >>> "key":"..."}. This has the same issue with array results for
> > >>> `_all_dbs` and `_dbs_info`[3].
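> > >>>
> > >>> Roughly, a client loop for D might look like this (a sketch only;
> > >>> the "continuation" field names follow the example above):
> > >>>
> > >>>   import json
> > >>>   import requests
> > >>>
> > >>>   url = "http://localhost:5984/db/_all_docs"
> > >>>   params = {"limit": 5, "skip": 600}
> > >>>   while True:
> > >>>       body = requests.get(url, params=params).json()
> > >>>       for row in body["rows"]:
> > >>>           print(row["id"])
> > >>>       cont = body.get("continuation")
> > >>>       if cont is None:
> > >>>           break  # everything fit within one transaction
> > >>>       params = {"start_key": json.dumps(cont["key"]),
> > >>>                 "skip": cont["skip"], "limit": cont["limit"]}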
> > >>>
> > >>> E) Enforce low `limit` and `skip` parameters. Enforce maximum values
> > >>> there such that response time is likely to fit in one transaction.
> > >>> This could be tricky as different runtime environments will have
> > >>> different characteristics. Also, if the timeout happens there isn't a
> > >>> nice way to send an HTTP error since we already sent the 200
> > >>> response. The downside is that this might break how some users use the
> > >>> API, if, say, they are using large skips and limits already. Perhaps
> > >>> here we do both B and D, such that if users want transactional
> > >>> behavior, they specify that `transaction=true` param and only then we
> > >>> enforce low limit and skip maximums.
> > >>>
> > >>> F) At least for `_all_docs` it seems providing a point-in-time
> > >>> snapshot view doesn't necessarily need to be tied to transaction
> > >>> boundaries. We could check the update sequence of the database at the
> > >>> start of the next transaction and if it hasn't changed we can continue
> > >>> emitting a consistent view. This can apply to C and D and would just
> > >>> determine when the stream ends. If there are no writes happening to
> > >>> the db, this could potentially stream all the data just like option A
> > >>> would do. Not entirely sure if this would work for views.
> > >>>
> > >>> So what do we think? I can see different combinations of options here,
> > >>> maybe even different ones for each API endpoint. For example `_all_dbs`,
> > >>> `_dbs_info` are always A, and `_all_docs` and views default to A but
> > >>> have parameters to do F, etc.
> > >>>
> > >>> Cheers,
> > >>> -Nick
> > >>>
> > >>> Some footnotes:
> > >>>
> > >>> [1] _changes feeds is the only one that works currently. It behaves
> > >>> as per the RFC:
> > >>> https://github.com/apache/couchdb-documentation/blob/master/rfcs/003-fdb-seq-index.md#access-patterns
> > >>> That is, we continue streaming the data by resetting the transaction
> > >>> object and restarting from the last emitted key (db sequence in this
> > >>> case). However, because the transaction restarts, if a document is
> > >>> updated while the streaming takes place, it may appear in the _changes
> > >>> feed twice. That's a behavior difference from CouchDB < 4.0 and we'd
> > >>> have to document it, since previously we presented a point-in-time
> > >>> snapshot of the database from when we started streaming.
> > >>>
> > >>> [2] Our streaming APIs have both skips and limits. Since FDB doesn't
> > >>> currently support efficient offsets for key selectors
> > >>> (
> https://urldefense.proofpoint.com/v2/url?u=https-3A__apple.github.io_foundationdb_known-2Dlimitations.html-23dont-2Duse-2Dkey-2Dselectors-2Dfor-2Dpaging&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=CDCq0vbFWjQXx1sCFm2-iYoMZQ4i0QQj2XmPZmLvZp0&m=2FAZh1XhUm-BtJzY8Lca-ZC-76K6zLu2Q7zpFojCGk8&s=iJJQDYHRGZ6PQ_FF3sy9nBSygRYkEh3cRPMFYg2Tkq8&e=
> )
> > >>> we implemented skip by iterating over the data. This means that a
> skip
> > >>> of say 100000 could keep timing out the transaction without yielding
> > >>> any data.
> > >>>
> > >>> [3] _all_dbs and _dbs_info return a JSON array so they don't have an
> > >>> obvious place to insert a last metadata row.
> > >>>
> > >>> [4] For example they have a constraint that documents "a" and "z"
> > >>> cannot both be in the database at the same time. But when iterating
> > >>> it's possible that "a" was there at the start. Then by the end, "a"
> > >>> was removed and "z" added, so both "a" and "z" would appear in the
> > >>> emitted stream. Note that FoundationDB has APIs which exhibit the same
> > >>> "relaxed" constraints:
> > >>> https://apple.github.io/foundationdb/api-python.html#module-fdb.locality
> > >>>
> >
>
>
>
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>
