From: Ilya Khlopotov
To: dev@couchdb.apache.org
Subject: Re: [DISCUSS] Streaming API in CouchDB 4.0
Date: Fri, 24 Apr 2020 09:45:32 -0000

> On versioning, I've not seen a better article than this one: https://www.troyhunt.com/your-api-versioning-is-wrong-which-is/

I wouldn't propose new endpoints if we had a strong story for API versioning. Currently we don't. BTW, we could put these new endpoints into a new namespace, for example `_v2/_all_docs`. In that case we wouldn't need to invent new names.

Best regards,
iilyak
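For illustration, here is a minimal client-side sketch of paginating such a namespaced endpoint. The `_v2` prefix and the `bookmark`/`completed`/`items` response fields are assumptions borrowed from the proposal further down this thread, not an implemented API:

```python
import requests

def fetch_all_docs(base_url, db):
    """Drain a hypothetical paginated _v2/_all_docs endpoint page by page."""
    url = f"{base_url}/{db}/_v2/_all_docs"
    params = {"limit": 1000}
    while True:
        body = requests.get(url, params=params).json()
        yield from body["items"]
        if body.get("completed", True):
            break
        # The bookmark is opaque; the client just echoes it back.
        params = {"bookmark": body["bookmark"]}
```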
On 2020/04/23 21:31:41, Robert Samuel Newson wrote:
> On versioning, I've not seen a better article than this one: https://www.troyhunt.com/your-api-versioning-is-wrong-which-is/
>
> For _changes, I definitely agree we should include it in this discussion. It is the only endpoint with, in theory, an eternal response, and I think that's a bug, not a feature, these days. CouchDB exists in a wider ecosystem (and often behind a load balancer), so it would be good to define an upper bound on how long you can listen before being forced to query again.
>
> B.
>
> > On 23 Apr 2020, at 22:15, Paul Davis wrote:
> >
> > I'd agree that my initial reaction to "cursor" was that it's not a great fit, but it does seem to be the name used in the wider REST world for this sort of pagination, so I'm not concerned about using that terminology.
> >
> > I'm generally on board with allowing and setting some sane default limits on pages. We probably should have done that quite a while ago, after moving to native clustering, and now that we have FDB's limits I think it makes even more sense to have an API that does not lend itself to crazy errors when people are just trying to poke at it.
> >
> > I think we're all on board that one of the goals is to make sure that clients don't accidentally misinterpret a response. That is, we're trying to be quite diligent that a user doesn't get 1000 rows and not realize there are another 10 beyond the limit. The bookmark approach with hard caps seems like a generally fine approach to me. The current approach uses extra URL path segments to try and avoid this confusion. I wonder if we should consider starting to properly version our API using one of the many schemes in use out there. Having read through a few articles, I don't have a clear favorite, though.
> >
> > As to this particular proposal, I do see a couple of issues:
> >
> > `total` - We can do this in most cases fairly easily, though it's a bit odd for continuous changes.
> >
> > `complete` - I'm not sure whether this is entirely possible given the API that FDB presents us. Specifically, when we set a range and get back exactly $num_rows in the response, if the data set ended at exactly that page I don't think the `more` flag from FDB would tell us that. So we'd have a clunky UX where we say "not complete" but the next page is empty. And that's not to mention that, depending on whether we're looking at snapshots and so on, there's no way for us to know between stateless requests whether more rows were added at the end.
> >
> > `page` - This one is just hard/impossible to calculate. FDB doesn't provide us with offsets or even an efficient "about how many rows in this range?" type of query, so providing this would be both inaccurate and fairly difficult/expensive to calculate. In some cases I think we could have something close that didn't suck too badly, but it'd also fall down for changes due to the way that updates reorder them.
> >
> > `update_seq` - I'm just not sure when this would be useful or what it would refer to. Maybe a versionstamp of the last change for that request? If we had a future API that asked for snapshot access, then maybe? But if we did do something there with versionstamps or read versions, I'd expect that to come with the rest of the API.
> >
> > For the bookmark fields:
> >
> > `direction` vs `descending` seems like a field duplication to me.
> >
> > `page` - This would seem to suggest we could skip to a certain location in the results numerically, which we are not able to do with the FDB API.
> >
> > `last_key` vs `start_key` seems like a field duplication. I don't think we need to know where things started, just where to start from and where to end.
> >
> > `update_seq` - same as earlier; not entirely sure of the intent there.
> >
> > `timestamp` - Expiring bookmarks based on time does not seem like a good idea, both because of clock skew and because this would functionally just be a convenience API that users could already implement for themselves.
> >
> > Another thing might be to provide our bookmark as a full link, which seems to be fairly standard REST practice these days: something clients don't have to apply any logic to, so that we're free to change the implementation.
> >
> > And lastly, I don't think we should be neglecting the _changes API as part of this discussion. I realize that we'll need to support the older streaming semantics if we want to maintain replication compatibility (which I think we'll all agree is a Good Thing), but it also feels a bit wrong to ignore it as part of this work if we're going to be modernizing our APIs. Though if we do pick up a good versioning scheme, we could theoretically make those changes easily enough. Plus, who doesn't want to rewrite chttpd to be a whole lot less... chttpd-y?
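A sketch of the bookmark-as-full-link idea Paul mentions. The `next` field name and URL shape are hypothetical; the point is that the client applies no logic at all to the link:

```python
import requests

def follow_pages(first_url):
    """Walk a paginated resource where each page carries a ready-made link
    to the next page, so the client never constructs URLs itself."""
    url = first_url
    while url:
        body = requests.get(url).json()
        yield from body["items"]
        # Hypothetical field: absent (or null) on the last page.
        url = body.get("next")
```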
> > On Thu, Apr 23, 2020 at 1:43 PM Robert Samuel Newson wrote:
> >>
> >> I think it's a key difference from "cursor" as I've seen it elsewhere that ours will point at an ever-changing database; you couldn't seamlessly cursor through a large data set one "page" at a time.
> >>
> >> Bookmarks began in search (raises guilty hand) in order to address a Lucene-specific issue (that high values of "skip" are incredibly inefficient, using lots of RAM). That is not true for CouchDB's own indexes, which can be navigated perfectly well with startkey/endkey/startkey_docid/endkey_docid, etc.
> >>
> >> I guess I'm not helping much with these observations, but I wouldn't like to see CouchDB gain an additional and ugly method of doing something already possible.
> >>
> >> B.
> >>
> >>> On 23 Apr 2020, at 19:02, Joan Touzet wrote:
> >>>
> >>> I realise this is bikeshedding, but I guess that's kind of the point... Everything below is my opinion, not "fact."
> >>>
> >>> It's unfortunate we need a new endpoint for all of this. In a vacuum I might have just suggested we use the semantics we already have, perhaps with ?from= instead of ?since= .
> >>>
> >>> "page" only works if the size of a page is well known, either by server preference or directly in the URL. If I ask for:
> >>>
> >>> GET /{db}/_all_docs?limit=20&page=3
> >>>
> >>> I know that I'm always going to get documents 41 through 60 in the default collation order.
> >>>
> >>> There's a *fantastic* summary of examples from popular REST APIs here:
> >>>
> >>> https://medium.com/@ignaciochiazzo/paginating-requests-in-apis-d4883d4c1c4c
> >>>
> >>> We are *pretty close* to what a cursor means in those other examples, except for the fact that our cursor can go stale/invalid after a short time.
> >>>
> >>> Bob, could you be a bit more detailed in your explanation of how our definition isn't close to these? Or did you mean SQL CURSOR, which is something entirely different? If so, I'm fine with this being a REST API cursor - something clearly distinct.
> >>>
> >>> I come back to wanting to preserve the existing endpoint syntax and naming, without new endpoints, but specifying this new FDB token via ?cursor= and making that the trigger for the new behaviour. At some point, we simply stop accepting ?since= tokens. This seems in line with other popular REST APIs.
> >>>
> >>> -Joan "still sick and not sleeping right" Touzet
> >>>
> >>> On 2020-04-23 12:30, Robert Newson wrote:
> >>>> "cursor" has an established meaning in other databases and ours would not be very close to it. I don’t think it’s a good idea.
> >>>> B.
> >>>>> On 23 Apr 2020, at 11:50, Ilya Khlopotov wrote:
> >>>>>
> >>>>>> The best I could come up with is replacing page with cursor - {db}/_all_docs/cursor or possibly {db}/_cursor/_all_docs
> >>>>>
> >>>>> Good idea, I like {db}/_all_docs/cursor (or {db}/_all_docs/_cursor).
> >>>>>
> >>>>>> On 2020/04/23 08:54:36, Garren Smith wrote:
> >>>>>> I agree with Bob that page doesn't make sense as an endpoint. I'm also rubbish with naming. The best I could come up with is replacing page with cursor - {db}/_all_docs/cursor or possibly {db}/_cursor/_all_docs. All the fields in the bookmark make sense except timestamp. Why would it matter if the timestamp is old? What happens if a node's time is an hour behind another node?
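To make Garren's clock-skew question concrete, here is a toy sketch of the kind of server-side expiry check under discussion (nothing like this exists; the one-hour window is the value Ilya floats below):

```python
import time

MAX_BOOKMARK_AGE = 3600  # hypothetical one-hour expiry

def bookmark_expired(minted_at, now=None):
    """Reject bookmarks older than MAX_BOOKMARK_AGE seconds."""
    now = time.time() if now is None else now
    return now - minted_at > MAX_BOOKMARK_AGE

minted = 1_000_000.0                                     # clock on the minting node
print(bookmark_expired(minted, now=minted + 10))         # False: 10 s old, fresh
print(bookmark_expired(minted, now=minted + 10 + 3600))  # True: the same bookmark,
# checked on a node whose clock runs an hour ahead, is rejected immediately
```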
> >>>>>>> On Thu, Apr 23, 2020 at 4:55 AM Ilya Khlopotov wrote:
> >>>>>>>
> >>>>>>> - page is there to provide some notion of progress for the user
> >>>>>>> - timestamp - I was thinking that we should drop requests if the user tries to pass a bookmark created an hour ago.
> >>>>>>>
> >>>>>>> On 2020/04/22 21:58:40, Robert Samuel Newson wrote:
> >>>>>>>> "page" and "page number" are odd to me, as these don't exist as concepts, and I'd rather not invent them. I note there's no mention of page size, which makes "page number" very vague.
> >>>>>>>>
> >>>>>>>> What is "timestamp" in the bookmark, and what effect does it have when the bookmark is passed back in?
> >>>>>>>>
> >>>>>>>> I guess, why does the bookmark include so much extraneous data? Items that are not needed to find the fdb key to begin the next response from.
> >>>>>>>>
> >>>>>>>>> On 22 Apr 2020, at 21:18, Ilya Khlopotov wrote:
> >>>>>>>>>
> >>>>>>>>> Hello everyone,
> >>>>>>>>>
> >>>>>>>>> Based on the discussions in this thread I would like to propose a number of first steps:
> >>>>>>>>>
> >>>>>>>>> 1) introduce new endpoints
> >>>>>>>>> - {db}/_all_docs/page
> >>>>>>>>> - {db}/_all_docs/queries/page
> >>>>>>>>> - _all_dbs/page
> >>>>>>>>> - _dbs_info/page
> >>>>>>>>> - {db}/_design/{ddoc}/_view/{view}/page
> >>>>>>>>> - {db}/_design/{ddoc}/_view/{view}/queries/page
> >>>>>>>>> - {db}/_find/page
> >>>>>>>>>
> >>>>>>>>> These new endpoints would act as follows:
> >>>>>>>>> - don't use delayed responses
> >>>>>>>>> - return an object with the following structure
> >>>>>>>>> ```
> >>>>>>>>> {
> >>>>>>>>>   "total": Total,
> >>>>>>>>>   "bookmark": base64 encoded opaque value,
> >>>>>>>>>   "completed": true | false,
> >>>>>>>>>   "update_seq": when available,
> >>>>>>>>>   "page": current page number,
> >>>>>>>>>   "items": []
> >>>>>>>>> }
> >>>>>>>>> ```
> >>>>>>>>> - the bookmark would include the following data (base64 or protobuf???):
> >>>>>>>>>   - direction
> >>>>>>>>>   - page
> >>>>>>>>>   - descending
> >>>>>>>>>   - endkey
> >>>>>>>>>   - endkey_docid
> >>>>>>>>>   - inclusive_end
> >>>>>>>>>   - startkey
> >>>>>>>>>   - startkey_docid
> >>>>>>>>>   - last_key
> >>>>>>>>>   - update_seq
> >>>>>>>>>   - timestamp
> >>>>>>>>>
> >>>>>>>>> 2) Implement per-endpoint configurable max limits
> >>>>>>>>> ```
> >>>>>>>>> _all_docs = 5000
> >>>>>>>>> _all_docs/queries = 5000
> >>>>>>>>> _all_dbs = 5000
> >>>>>>>>> _dbs_info = 5000
> >>>>>>>>> _view = 2500
> >>>>>>>>> _view/queries = 2500
> >>>>>>>>> _find = 2500
> >>>>>>>>> ```
> >>>>>>>>>
> >>>>>>>>> Later (after a few years) CouchDB would deprecate and remove the old endpoints.
> >>>>>>>>>
> >>>>>>>>> Best regards,
> >>>>>>>>> iilyak
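The proposal above leaves the bookmark encoding open ("base64 or protobuf???"). A minimal sketch of the base64-of-JSON variant, purely to make the opaque-token idea concrete (the field values are illustrative):

```python
import base64
import json

def encode_bookmark(state):
    """Pack pagination state into an opaque, URL-safe token."""
    raw = json.dumps(state, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")

def decode_bookmark(token):
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))

token = encode_bookmark({
    "direction": "fwd",
    "descending": False,
    "last_key": "doc-1000",
    "page": 2,
    "timestamp": 1587718512,
})
assert decode_bookmark(token)["last_key"] == "doc-1000"
```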
> >>>>>>>>> On 2020/02/19 22:39:45, Nick Vatamaniuc wrote:
> >>>>>>>>>> Hello everyone,
> >>>>>>>>>>
> >>>>>>>>>> I'd like to discuss the shape and behavior of streaming APIs for CouchDB 4.x.
> >>>>>>>>>>
> >>>>>>>>>> By "streaming APIs" I mean APIs which stream data row by row as it gets read from the database. These are the endpoints I was thinking of:
> >>>>>>>>>>
> >>>>>>>>>> _all_docs, _all_dbs, _dbs_info and query results
> >>>>>>>>>>
> >>>>>>>>>> I want to focus on what happens when FoundationDB transactions time out after 5 seconds. Currently, all those APIs except _changes[1] feeds will crash or freeze. The reason is that the transaction_too_old error at the end of 5 seconds is retryable by default, so the request handlers run again and end up shoving the whole request down the socket again, headers and all, which is obviously broken and not what we want.
> >>>>>>>>>>
> >>>>>>>>>> There are a few alternatives discussed in the couchdb-dev channel. I'll present some behaviors, but feel free to add more. Some ideas might have been discounted in the IRC discussion already, but I'll present them anyway in case it sparks further conversation:
> >>>>>>>>>>
> >>>>>>>>>> A) Do what _changes[1] feeds do. Start a new transaction and continue streaming the data from the next key after the last one emitted in the previous transaction. Document the API behavior change: the view of the data it presents may never be a point-in-time[4] snapshot of the DB.
> >>>>>>>>>>
> >>>>>>>>>> - Keeps the API shape the same as CouchDB <4.0. Client libraries don't have to change to continue using these CouchDB 4.0 endpoints.
> >>>>>>>>>> - This is the easiest to implement, since it would re-use the implementation for the _changes feed (an extra option passed to the fold function).
> >>>>>>>>>> - Breaks API behavior if users relied on having a point-in-time[4] snapshot view of the data.
> >>>>>>>>>>
> >>>>>>>>>> B) Simply end the stream. Let users pass a `?transaction=true` param which indicates they are aware the stream may end early and so would have to paginate from the last emitted key with a skip=1. This will keep the request bodies the same as current CouchDB. However, if users got all the data in one request, they will end up wasting another request to see if there is more data available. If they didn't get any data, they might have too large of a skip value (see [2]) and so would have to guess different values for start/end keys. Or we impose a max limit for the `skip` parameter.
> >>>>>>>>>>
> >>>>>>>>>> C) End the stream and add a final metadata row like "transaction": "timeout" at the end. That will let the user know to keep paginating from the last key onward. This won't work for `_all_dbs` and `_dbs_info`[3]. Maybe let those two endpoints behave like _changes feeds and only use this for views and _all_docs? If we like this choice, let's think about what happens for those two, as I couldn't come up with anything decent there.
> >>>>>>>>>>
> >>>>>>>>>> D) Same as C, but to solve the issue with skips[2], emit a bookmark "key" of where the iteration stopped along with the current "skip" and "limit" params, which would keep decreasing. The user would then pass those in "start_key=..." in the next request along with the limit and skip params. So something like "continuation":{"skip":599, "limit":5, "key":"..."}. This has the same issue with array results for `_all_dbs` and `_dbs_info`[3].
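A sketch of how a client could resume under option D, echoing the hypothetical "continuation" object back into the query string (field names are taken from Nick's example; the endpoint behavior is proposed, not implemented):

```python
import requests

def read_rows(base_url, db, limit=600):
    """Page through _all_docs, resuming whenever the server cuts the stream
    at a transaction boundary and returns a continuation object."""
    params = {"limit": limit}
    while True:
        body = requests.get(f"{base_url}/{db}/_all_docs", params=params).json()
        yield from body["rows"]
        cont = body.get("continuation")  # e.g. {"skip": 599, "limit": 5, "key": "..."}
        if cont is None:
            break
        params = {"start_key": cont["key"], "skip": cont["skip"],
                  "limit": cont["limit"]}
```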
> >>>>>>>>>> E) Enforce low `limit` and `skip` parameters. Enforce maximum values there such that the response time is likely to fit in one transaction. This could be tricky, as different runtime environments will have different characteristics. Also, if the timeout happens, there isn't a nice way to send an HTTP error since we already sent the 200 response. The downside is that this might break how some users use the API, if, say, they are using large skips and limits already. Perhaps here we do both B and D, such that if users want transactional behavior they specify the `transaction=true` param, and only then do we enforce low limit and skip maximums.
> >>>>>>>>>>
> >>>>>>>>>> F) At least for `_all_docs`, it seems providing a point-in-time snapshot view doesn't necessarily need to be tied to transaction boundaries. We could check the update sequence of the database at the start of the next transaction, and if it hasn't changed we can continue emitting a consistent view. This can apply to C and D and would just determine when the stream ends. If there are no writes happening to the db, this could potentially stream all the data, just like option A would. Not entirely sure if this would work for views.
> >>>>>>>>>>
> >>>>>>>>>> So what do we think? I can see different combinations of options here, maybe even a different one for each API endpoint. For example, `_all_dbs` and `_dbs_info` are always A, while `_all_docs` and views default to A but have parameters to do F, etc.
> >>>>>>>>>>
> >>>>>>>>>> Cheers,
> >>>>>>>>>> -Nick
> >>>>>>>>>>
> >>>>>>>>>> Some footnotes:
> >>>>>>>>>>
> >>>>>>>>>> [1] _changes feeds are the only ones that work currently. They behave as per the RFC (https://github.com/apache/couchdb-documentation/blob/master/rfcs/003-fdb-seq-index.md#access-patterns). That is, we continue streaming the data by resetting the transaction object and restarting from the last emitted key (the db sequence in this case). However, because the transaction restarts, a document that is updated while the streaming takes place may appear in the _changes feed twice. That's a behavior difference from CouchDB < 4.0 and we'd have to document it, since previously we presented a point-in-time snapshot of the database from when we started streaming.
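For flavor, a rough sketch of the restart-from-last-key pattern footnote [1] describes, written against the FoundationDB Python bindings (CouchDB's real implementation is in Erlang; this only shows the shape of the technique):

```python
import fdb

fdb.api_version(620)
db = fdb.open()

def stream_range(db, begin, end, emit):
    """Emit keys in [begin, end), restarting the transaction and resuming
    just after the last emitted key whenever it expires (~5 seconds)."""
    cursor = begin
    while True:
        tr = db.create_transaction()
        try:
            for kv in tr.get_range(cursor, end):
                emit(kv.key, kv.value)
                cursor = fdb.KeySelector.first_greater_than(kv.key)
            return  # range exhausted
        except fdb.FDBError as e:
            if e.code != 1007:  # 1007 = transaction_too_old; resume from cursor
                raise
```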
> >>>>>>>>>> [2] Our streaming APIs have both skips and limits. Since FDB doesn't currently support efficient offsets for key selectors (https://apple.github.io/foundationdb/known-limitations.html#dont-use-key-selectors-for-paging) we implemented skip by iterating over the data. This means that a skip of, say, 100000 could keep timing out the transaction without yielding any data.
> >>>>>>>>>>
> >>>>>>>>>> [3] _all_dbs and _dbs_info return a JSON array, so they don't have an obvious place to insert a last metadata row.
> >>>>>>>>>>
> >>>>>>>>>> [4] Suppose, for example, that the application maintains a constraint that documents "a" and "z" cannot both be in the database at the same time. When iterating, it's possible that "a" was there at the start; by the end, "a" was removed and "z" added, so both "a" and "z" would appear in the emitted stream. Note that FoundationDB has APIs which exhibit the same "relaxed" constraints: https://apple.github.io/foundationdb/api-python.html#module-fdb.locality
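Finally, footnote [2]'s failure mode made concrete: when skip is implemented by iteration, the whole transaction budget can be spent discarding rows. A toy sketch, assuming the same FoundationDB Python bindings as the sketch above:

```python
def skip_then_read(tr, begin, end, skip, limit):
    """Naive skip-by-iteration: discarded rows are still fetched one by one,
    so a skip of 100000 can burn the entire 5-second transaction budget
    before a single row is returned to the client."""
    it = iter(tr.get_range(begin, end))
    for _ in range(skip):
        next(it, None)  # each discarded row still costs a read
    return [kv for _, kv in zip(range(limit), it)]
```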