In-Reply-To: <4BA02B48-F3ED-412B-966C-34D594055FE2@apache.org>
From: Paul Davis
Date: Thu, 23 Apr 2020 16:15:05 -0500
Subject: Re: [DISCUSS] Streaming API in CouchDB 4.0
To: dev@couchdb.apache.org

I'd agree that my initial reaction to cursor was that it's not a great
fit, but that does seem to be the existing name used in the greater
REST world for this sort of pagination, so I'm not concerned about
using that terminology.

I'm generally on board with allowing and setting some default sane
limits on pages. We probably should have done that quite a while ago
after moving to native clustering, and now that we have FDB limits I
think it makes even more sense to have an API that does not lend itself
to crazy errors when people are just trying to poke at an API.

I think we're all on board that one of the goals is to make sure that
clients don't accidentally misinterpret a response. That is, we're
trying to be quite diligent that a user doesn't get 1000 rows and not
realize there's another 10 that were beyond the limit. The bookmark
approach with hard caps seems like a generally fine approach to me.

The current approach uses extra URL path segments to try and avoid
this confusion. I wonder if we should consider starting to properly
version our API using one of the many schemes that are used. Having
read through a few articles I don't have a very clear favorite though.

As to this particular proposal I do see a couple of issues:

`total` - We can do this in most cases fairly easily. Though it's a bit
odd for continuous changes.

`complete` - I'm not sure whether this is entirely possible given the
API that FDB presents us. Specifically, when we set a range and we get
back exactly $num_rows in the response, if the data set ended at
exactly that page I don't think the `more` flag from fdb would tell us
that.
So we'd have a clunky UX there where we say not complete but the
next page is empty. That's also not to mention that, depending on
whether we're looking at snapshots and so on, there's no way for us to
know between stateless requests whether there were more rows added to
the end.

`page` - This one is just hard/impossible to calculate. FDB doesn't
provide us with offsets or even an efficient "about how many rows in
this range?" type of query, so providing that would be both inaccurate
and fairly difficult/expensive to calculate. In some cases I think we
could have something maybe close that didn't suck too badly, but it'd
also fall down for changes due to the way that updates reorder them.

`update_seq` - I'm just not sure when this would be useful or what it
would refer to. Maybe a versionstamp of the last change for that
request? If we had a future API that asked for snapshot access then
maybe? But if we did do something there with versionstamps or read
versions I'd expect that to come with the rest of the API.

For the bookmark fields:

`direction` vs `descending` seems like a field duplication to me.

`page` - This would seem to suggest we could skip to a certain location
in the results numerically, which we are not able to do with the FDB
API.

`last_key` vs `start_key` seems like a field duplication. We don't need
to know where things started, I don't think. Just where to start from
and where to end.

`update_seq` - Same as earlier. Not entirely sure of the intent there.

`timestamp` - Expiring bookmarks based on time does not seem like a
good idea. Both because of clock skew and because, why bother, when
this would functionally just be a convenience API that users could
already implement for themselves.

Another thing might also be to provide our bookmark as a full link,
which seems to be fairly standard REST practice these days. Something
that clients don't have to do any logic with so that we're free to
change the implementation.
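
To make the full-link idea concrete, here is a minimal Python sketch of
a client that only ever follows a server-supplied link. The response
shape (`items`, `next`), the `?after=` parameter, and the in-memory
`make_fake_server` stand-in are assumptions made up for illustration,
not an agreed CouchDB API. The fake server also shows the `complete`
problem: when a page comes back exactly full it cannot tell whether
more rows follow, so the client pays for one trailing empty page.

```python
from typing import Callable, Dict, Iterator, List, Optional


def iterate_rows(fetch: Callable[[str], Dict], first_url: str) -> Iterator[Dict]:
    """Yield every row by following the server's `next` link verbatim."""
    url: Optional[str] = first_url
    while url is not None:
        page = fetch(url)
        yield from page["items"]
        # The client never parses or builds the link. A missing or null
        # `next` is the only end-of-results signal it relies on.
        url = page.get("next")


def make_fake_server(ids: List[str], page_size: int) -> Callable[[str], Dict]:
    """In-memory stand-in for an HTTP endpoint (hypothetical URLs)."""
    ordered = sorted(ids)

    def fetch(url: str) -> Dict:
        # Key-based paging: resume strictly after the last key we returned.
        _, _, after = url.partition("?after=")
        remaining = [i for i in ordered if i > after]
        items = [{"id": i} for i in remaining[:page_size]]
        # A full page says nothing about whether more rows exist, so a
        # link is always offered when the page is full.
        full = len(items) == page_size
        nxt = f"/db/_all_docs/page?after={items[-1]['id']}" if full else None
        return {"items": items, "next": nxt}

    return fetch
```

With four rows and a page size of two, this client makes three
requests, and the last one returns no rows.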

And lastly, I don't think we should be neglecting the _changes API as
part of this discussion. I realize that we'll need to support the older
streaming semantics if we want to maintain replication compatibility
(which I think we'll all agree is a Good Thing) but it also feels a bit
wrong to ignore it as part of this work if we're going to be
modernizing our APIs. Though if we do pick up a good versioning scheme
then we could theoretically make those changes easily enough. Plus, who
doesn't want to rewrite chttpd to be a whole lot less... chttpd-y?

On Thu, Apr 23, 2020 at 1:43 PM Robert Samuel Newson wrote:
>
>
> I think it's a key difference from "cursor" as I've seen them elsewhere, that ours will point at an ever changing database, you couldn't seamlessly cursor through a large data set, one "page" at a time.
>
> Bookmarks began in search (raises guilty hand) in order to address a Lucene-specific issue (that high values of "skip" are incredibly inefficient, using lots of RAM). That is not true for CouchDB's own indexes, which can be navigated perfectly with startkey/endkey/startkey_docid/endkey_docid, etc.
>
> I guess I'm not helping much with these observations but I wouldn't like to see CouchDB gain an additional and ugly method of doing something already possible.
>
> B.
>
> > On 23 Apr 2020, at 19:02, Joan Touzet wrote:
> >
> > I realise this is bikeshedding, but I guess that's kind of the point... Everything below is my opinion, not "fact."
> >
> > It's unfortunate we need a new endpoint for all of this. In a vacuum I might have just suggested we use the semantics we already have, perhaps with ?from= instead of ?since= .
> >
> > "page" only works if the size of a page is well known, either by server preference or directly in the URL. If I ask for:
> >
> > GET /{db}/_all_docs?limit=20&page=3
> >
> > I know that I'm always going to get documents 41 through 60 in the default collation order.
> >
> > There's a *fantastic* summary of examples from popular REST APIs here:
> >
> > https://medium.com/@ignaciochiazzo/paginating-requests-in-apis-d4883d4c1c4c
> >
> > We are *pretty close* to what a cursor means in those other examples, except for the fact that our cursor can go stale/invalid after a short time.
> >
> > Bob, could you be a bit more detailed in your explanation of how our definition isn't close to these? Or did you mean SQL CURSOR (which is something entirely different)? If so, I'm fine with this being a REST API cursor - something clearly distinct.
> >
> > I come back to wanting to preserve the existing endpoint syntax and naming, without new endpoints, but specifying this new FDB token via ?cursor= and this being the trigger for the new behaviour. At some point, we simply stop accepting ?since= tokens. This seems in line with other popular REST APIs.
> >
> > -Joan "still sick and not sleeping right" Touzet
> >
> >
> > On 2020-04-23 12:30, Robert Newson wrote:
> >> cursor has established meaning in other databases and ours would not be very close to them. I don’t think it’s a good idea.
> >> B.
> >>> On 23 Apr 2020, at 11:50, Ilya Khlopotov wrote:
> >>>
> >>>>
> >>>> The best I could come up with is replacing page with
> >>>> cursor - {db}/_all_docs/cursor or possibly {db}/_cursor/_all_docs
> >>> Good idea, I like {db}/_all_docs/cursor (or {db}/_all_docs/_cursor).
> >>>
> >>>> On 2020/04/23 08:54:36, Garren Smith wrote:
> >>>> I agree with Bob that page doesn't make sense as an endpoint. I'm also
> >>>> rubbish with naming. The best I could come up with is replacing page with
> >>>> cursor - {db}/_all_docs/cursor or possibly {db}/_cursor/_all_docs
> >>>> All the fields in the bookmark make sense except timestamp. Why would it
> >>>> matter if the timestamp is old? What happens if a node's time is an hour
> >>>> behind another node?
> >>>>
> >>>>
> >>>>> On Thu, Apr 23, 2020 at 4:55 AM Ilya Khlopotov wrote:
> >>>>>
> >>>>> - page is to provide some notion of progress for user
> >>>>> - timestamp - I was thinking that we should drop requests if user would
> >>>>> try to pass bookmark created an hour ago.
> >>>>>
> >>>>> On 2020/04/22 21:58:40, Robert Samuel Newson wrote:
> >>>>>> "page" and "page number" are odd to me as these don't exist as concepts,
> >>>>>> I'd rather not invent them. I note there's no mention of page size, which
> >>>>>> makes "page number" very vague.
> >>>>>>
> >>>>>> What is "timestamp" in the bookmark and what effect does it have when
> >>>>>> the bookmark is passed back in?
> >>>>>>
> >>>>>> I guess, why does the bookmark include so much extraneous data? Items
> >>>>>> that are not needed to find the fdb key to begin the next response from.
> >>>>>>
> >>>>>>
> >>>>>>> On 22 Apr 2020, at 21:18, Ilya Khlopotov wrote:
> >>>>>>>
> >>>>>>> Hello everyone,
> >>>>>>>
> >>>>>>> Based on the discussions on the thread I would like to propose a
> >>>>>>> number of first steps:
> >>>>>>> 1) introduce new endpoints
> >>>>>>>   - {db}/_all_docs/page
> >>>>>>>   - {db}/_all_docs/queries/page
> >>>>>>>   - _all_dbs/page
> >>>>>>>   - _dbs_info/page
> >>>>>>>   - {db}/_design/{ddoc}/_view/{view}/page
> >>>>>>>   - {db}/_design/{ddoc}/_view/{view}/queries/page
> >>>>>>>   - {db}/_find/page
> >>>>>>>
> >>>>>>> These new endpoints would act as follows:
> >>>>>>> - don't use delayed responses
> >>>>>>> - return object with following structure
> >>>>>>> ```
> >>>>>>> {
> >>>>>>>   "total": Total,
> >>>>>>>   "bookmark": base64 encoded opaque value,
> >>>>>>>   "completed": true | false,
> >>>>>>>   "update_seq": when available,
> >>>>>>>   "page": current page number,
> >>>>>>>   "items": [
> >>>>>>>   ]
> >>>>>>> }
> >>>>>>> ```
> >>>>>>> - the bookmark would include following data (base64 or protobuf???):
> >>>>>>>   - direction
> >>>>>>>   - page
> >>>>>>>   - descending
> >>>>>>>   - endkey
> >>>>>>>   - endkey_docid
> >>>>>>>   - inclusive_end
> >>>>>>>   - startkey
> >>>>>>>   - startkey_docid
> >>>>>>>   - last_key
> >>>>>>>   - update_seq
> >>>>>>>   - timestamp
> >>>>>>>
> >>>>>>> 2) Implement per-endpoint configurable max limits
> >>>>>>> ```
> >>>>>>> _all_docs = 5000
> >>>>>>> _all_docs/queries = 5000
> >>>>>>> _all_dbs = 5000
> >>>>>>> _dbs_info = 5000
> >>>>>>> _view = 2500
> >>>>>>> _view/queries = 2500
> >>>>>>> _find = 2500
> >>>>>>> ```
> >>>>>>>
> >>>>>>> Later (after a few years) CouchDB would deprecate and remove old
> >>>>>>> endpoints.
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>> iilyak
> >>>>>>>
> >>>>>>> On 2020/02/19 22:39:45, Nick Vatamaniuc wrote:
> >>>>>>>> Hello everyone,
> >>>>>>>>
> >>>>>>>> I'd like to discuss the shape and behavior of streaming APIs for
> >>>>>>>> CouchDB 4.x
> >>>>>>>>
> >>>>>>>> By "streaming APIs" I mean APIs which stream data row by row as it
> >>>>>>>> gets read from the database. These are the endpoints I was thinking of:
> >>>>>>>>
> >>>>>>>> _all_docs, _all_dbs, _dbs_info and query results
> >>>>>>>>
> >>>>>>>> I want to focus on what happens when FoundationDB transactions
> >>>>>>>> time-out after 5 seconds. Currently, all those APIs except _changes[1]
> >>>>>>>> feeds, will crash or freeze. The reason is because the
> >>>>>>>> transaction_too_old error at the end of 5 seconds is retry-able by
> >>>>>>>> default, so the request handlers run again and end up shoving the
> >>>>>>>> whole request down the socket again, headers and all, which is
> >>>>>>>> obviously broken and not what we want.
> >>>>>>>>
> >>>>>>>> There are a few alternatives discussed in the couchdb-dev channel. I'll
> >>>>>>>> present some behaviors but feel free to add more. Some ideas might
> >>>>>>>> have been discounted on the IRC discussion already but I'll present
> >>>>>>>> them anyway in case it sparks further conversation:
> >>>>>>>>
> >>>>>>>> A) Do what _changes[1] feeds do.
> >>>>>>>> Start a new transaction and continue
> >>>>>>>> streaming the data from the next key after the last one emitted in the
> >>>>>>>> previous transaction. Document the API behavior change that it may
> >>>>>>>> present a view of the data that is never a point-in-time[4] snapshot
> >>>>>>>> of the DB.
> >>>>>>>>
> >>>>>>>> - Keeps the API shape the same as CouchDB <4.0. Client libraries
> >>>>>>>> don't have to change to continue using these CouchDB 4.0 endpoints
> >>>>>>>> - This is the easiest to implement since it would re-use the
> >>>>>>>> implementation for _changes feed (an extra option passed to the fold
> >>>>>>>> function).
> >>>>>>>> - Breaks API behavior if users relied on having a point-in-time[4]
> >>>>>>>> snapshot view of the data.
> >>>>>>>>
> >>>>>>>> B) Simply end the stream. Let the users pass a `?transaction=true`
> >>>>>>>> param which indicates they are aware the stream may end early and so
> >>>>>>>> would have to paginate from the last emitted key with a skip=1. This
> >>>>>>>> will keep the request bodies the same as current CouchDB. However, if
> >>>>>>>> the users got all the data in one request, they will end up wasting
> >>>>>>>> another request to see if there is more data available. If they didn't
> >>>>>>>> get any data they might have too large of a skip value (see [2]) so
> >>>>>>>> would have to guess different values for start/end keys. Or impose a
> >>>>>>>> max limit for the `skip` parameter.
> >>>>>>>>
> >>>>>>>> C) End the stream and add a final metadata row like a "transaction":
> >>>>>>>> "timeout" at the end. That will let the user know to keep paginating
> >>>>>>>> from the last key onward. This won't work for `_all_dbs` and
> >>>>>>>> `_dbs_info`[3]. Maybe let those two endpoints behave like _changes
> >>>>>>>> feeds and only use this for views and _all_docs? If we like this
> >>>>>>>> choice, let's think what happens for those as I couldn't come up with
> >>>>>>>> anything decent there.
> >>>>>>>>
> >>>>>>>> D) Same as C but to solve the issue with skips[2], emit a bookmark
> >>>>>>>> "key" of where the iteration stopped and the current "skip" and
> >>>>>>>> "limit" params, which would keep decreasing. Then the user would pass
> >>>>>>>> those in "start_key=..." in the next request along with the limit and
> >>>>>>>> skip params. So something like "continuation":{"skip":599, "limit":5,
> >>>>>>>> "key":"..."}. This has the same issue with array results for
> >>>>>>>> `_all_dbs` and `_dbs_info`[3].
> >>>>>>>>
> >>>>>>>> E) Enforce low `limit` and `skip` parameters. Enforce maximum values
> >>>>>>>> there such that response time is likely to fit in one transaction.
> >>>>>>>> This could be tricky as different runtime environments will have
> >>>>>>>> different characteristics. Also, if the timeout happens there isn't a
> >>>>>>>> nice way to send an HTTP error since we already sent the 200
> >>>>>>>> response. The downside is that this might break how some users use the
> >>>>>>>> API, if say they are using large skips and limits already. Perhaps here
> >>>>>>>> we do both B and D, such that if users want transactional behavior,
> >>>>>>>> they specify that `transaction=true` param and only then we enforce
> >>>>>>>> low limit and skip maximums.
> >>>>>>>>
> >>>>>>>> F) At least for `_all_docs` it seems providing a point-in-time
> >>>>>>>> snapshot view doesn't necessarily need to be tied to transaction
> >>>>>>>> boundaries. We could check the update sequence of the database at the
> >>>>>>>> start of the next transaction and if it hasn't changed we can continue
> >>>>>>>> emitting a consistent view. This can apply to C and D and would just
> >>>>>>>> determine when the stream ends. If there are no writes happening to
> >>>>>>>> the db, this could potentially stream all the data just like option A
> >>>>>>>> would do. Not entirely sure if this would work for views.
> >>>>>>>>
> >>>>>>>> So what do we think?
> >>>>>>>> I can see different combinations of options here,
> >>>>>>>> maybe even different for each API point. For example `_all_dbs`,
> >>>>>>>> `_dbs_info` are always A, and `_all_docs` and views default to A but
> >>>>>>>> have parameters to do F, etc.
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> -Nick
> >>>>>>>>
> >>>>>>>> Some footnotes:
> >>>>>>>>
> >>>>>>>> [1] _changes feeds is the only one that works currently. It behaves as
> >>>>>>>> per RFC
> >>>>>>>> https://github.com/apache/couchdb-documentation/blob/master/rfcs/003-fdb-seq-index.md#access-patterns
> >>>>>>>> .
> >>>>>>>> That is, we continue streaming the data by resetting the transaction
> >>>>>>>> object and restarting from the last emitted key (db sequence in this
> >>>>>>>> case). However, because the transaction restarts, if a document is
> >>>>>>>> updated while the streaming takes place, it may appear in the _changes
> >>>>>>>> feed twice. That's a behavior difference from CouchDB < 4.0 and we'd
> >>>>>>>> have to document it, since previously we presented this point-in-time
> >>>>>>>> snapshot of the database from when we started streaming.
> >>>>>>>>
> >>>>>>>> [2] Our streaming APIs have both skips and limits. Since FDB doesn't
> >>>>>>>> currently support efficient offsets for key selectors
> >>>>>>>> (
> >>>>>>>> https://apple.github.io/foundationdb/known-limitations.html#dont-use-key-selectors-for-paging
> >>>>>>>> )
> >>>>>>>> we implemented skip by iterating over the data. This means that a skip
> >>>>>>>> of say 100000 could keep timing out the transaction without yielding
> >>>>>>>> any data.
> >>>>>>>>
> >>>>>>>> [3] _all_dbs and _dbs_info return a JSON array so they don't have an
> >>>>>>>> obvious place to insert a last metadata row.
> >>>>>>>>
> >>>>>>>> [4] For example they have a constraint that documents "a" and "z"
> >>>>>>>> cannot both be in the database at the same time. But when iterating
> >>>>>>>> it's possible that "a" was there at the start.
> >>>>>>>> Then by the end, "a"
> >>>>>>>> was removed and "z" added, so both "a" and "z" would appear in the
> >>>>>>>> emitted stream. Note that FoundationDB has APIs which exhibit the same
> >>>>>>>> "relaxed" constraints:
> >>>>>>>> https://apple.github.io/foundationdb/api-python.html#module-fdb.locality
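
As an illustration of option A and footnote [4] from Nick's original
message, the following Python sketch resumes a scan in a fresh
transaction after the last emitted key. `TransactionTooOld`,
`read_range`, and `make_fake_store` are hypothetical names invented
here; this is not the FoundationDB API or the CouchDB implementation,
only a model of the behavior being discussed.

```python
class TransactionTooOld(Exception):
    """Stand-in for FDB's transaction_too_old error (hypothetical name)."""


def stream_rows(read_range, start_after=""):
    """Option A: on timeout, open a new read and resume after the last key.

    `read_range(after_key)` is assumed to yield (key, value) pairs in key
    order from one transaction and to raise TransactionTooOld on timeout.
    If a transaction could time out before emitting any row (the skip
    problem in footnote [2]), this loop would never make progress.
    """
    last = start_after
    while True:
        try:
            for key, value in read_range(last):
                yield key, value
                last = key
            return  # the range was exhausted without a timeout
        except TransactionTooOld:
            continue  # retry from `last`; emitted rows are not re-sent


def make_fake_store(data, rows_per_txn, concurrent_write=None):
    """In-memory store whose transactions 'time out' after a few rows."""
    txns = 0

    def read_range(after):
        nonlocal txns
        txns += 1
        if txns == 2 and concurrent_write is not None:
            concurrent_write(data)  # a write landing between transactions
        snapshot = [(k, data[k]) for k in sorted(data) if k > after]
        for n, row in enumerate(snapshot):
            if n == rows_per_txn:
                raise TransactionTooOld()
            yield row

    return read_range
```

If "a" is deleted and "z" is added while the scan is between
transactions, the stream still emits both, which is exactly the
non-snapshot behavior footnote [4] warns about.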