couchdb-dev mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: [DISCUSS] Streaming API in CouchDB 4.0
Date Thu, 23 Apr 2020 21:15:05 GMT
I'd agree that my initial reaction to cursor was that it's not a great
fit, but it does seem to be the established name in the wider REST
world for this sort of pagination, so I'm not concerned about using
that terminology.

I'm generally on board with allowing and setting sane default limits
on pages. We probably should have done that quite a while ago, after
moving to native clustering, and now that we have FDB's limits I think
it makes even more sense to have an API that doesn't lend itself to
crazy errors when people are just poking at the API.

I think we're all on board that one of the goals is to make sure that
clients don't accidentally misinterpret a response. That is, we're
trying to be quite diligent that a user doesn't get 1000 rows and not
realize there are another 10 beyond the limit. The bookmark approach
with hard caps seems generally fine to me. The current proposal uses
extra URL path segments to try and avoid this confusion. I wonder if
we should consider starting to properly version our API using one of
the many schemes in use. Having read through a few articles, though, I
don't have a clear favorite.

As to this particular proposal, I do see a couple of issues:

`total` - We can do this fairly easily in most cases, though it's a
bit odd for continuous changes.

`complete` - I'm not sure this is entirely possible given the API that
FDB presents us. Specifically, if we set a range read with a limit and
get back exactly $num_rows rows, and the data set ends at exactly that
page, I don't think fdb's `more` flag would tell us so. We'd have a
clunky UX where we report not complete but the next page comes back
empty. That's not to mention that, depending on whether we're using
snapshot reads and so on, there's no way for us to know between
stateless requests whether more rows were added to the end.
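
To make `complete` reliable we'd probably have to over-fetch by one
row ourselves. A rough sketch against the FDB Python bindings (the key
range and limit here are made up for illustration):

```
import fdb

fdb.api_version(620)
db = fdb.open()  # assumes a default cluster file

@fdb.transactional
def read_page(tr, begin, end, limit):
    # Ask for one extra row: if it comes back, another page definitely
    # exists, regardless of what the `more` flag says at an exact page
    # boundary.
    rows = list(tr.get_range(begin, end, limit=limit + 1))
    complete = len(rows) <= limit
    return rows[:limit], complete

rows, complete = read_page(db, b'\x00', b'\xff', 1000)
```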

`page` - This one is just hard/impossible to calculate. FDB doesn't
provide us with offsets, or even an efficient "about how many rows are
in this range?" type of query, so providing this would be both
inaccurate and fairly difficult/expensive. In some cases I think we
could get something close that didn't suck too badly, but it'd also
fall down for changes due to the way that updates reorder them.

`update_seq` - I'm just not sure when this would be useful or what it
would refer to. Maybe a versionstamp of the last change for that
request? If we had a future API that asked for snapshot access then
maybe? But if we did do something there with versionstamps or read
versions I'd expect it to come with the rest of that API.

For the bookmark fields:

`direction` vs `descending` seems like a field duplication to me.

`page` - This would seem to suggest we could skip to a certain
location in the results numerically, which we are not able to do with
the FDB API.

`last_key` vs `start_key` seems like another field duplication. I
don't think we need to know where things started, just where to resume
from and where to end.

`update_seq` - same as above; I'm not entirely sure of the intent there.

`timestamp` - Expiring bookmarks based on time does not seem like a
good idea, both because of clock skew and because expiry would
functionally just be a convenience that users could already implement
for themselves.
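
For contrast, a minimal bookmark needs little more than the resume
position and the fixed end of the range. A sketch (field names
invented here, not a format proposal):

```
import base64, json

def encode_bookmark(last_key, last_docid, end_key, descending):
    # Only what the next request needs: where to resume, where to stop.
    state = {"k": last_key, "id": last_docid,
             "end": end_key, "desc": descending}
    return base64.urlsafe_b64encode(json.dumps(state).encode()).decode()

def decode_bookmark(bookmark):
    return json.loads(base64.urlsafe_b64decode(bookmark.encode()))
```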

Another thought is to provide our bookmark as a full link, which seems
to be fairly standard REST practice these days. Clients wouldn't have
to apply any logic to it, so we'd be free to change the
implementation.
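
Something like this, say (shape illustrative only; the token stays
opaque to the client):

```
{
    "items": [...],
    "links": {
        "next": "/db/_all_docs?bookmark=<opaque token>"
    }
}
```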

And lastly, I don't think we should neglect the _changes API as part
of this discussion. I realize that we'll need to support the older
streaming semantics if we want to maintain replication compatibility
(which I think we'd all agree is a Good Thing), but it also feels a
bit wrong to ignore it if we're going to be modernizing our APIs.
Though if we do pick up a good versioning scheme then we could
theoretically make those changes easily enough. Plus, who doesn't want
to rewrite chttpd to be a whole lot less... chttpd-y?


On Thu, Apr 23, 2020 at 1:43 PM Robert Samuel Newson <rnewson@apache.org> wrote:
>
>
> I think it's a key difference from "cursor" as I've seen it elsewhere that ours will point at an ever-changing database; you couldn't seamlessly cursor through a large data set, one "page" at a time.
>
> Bookmarks began in search (raises guilty hand) in order to address a Lucene-specific issue (that high values of "skip" are incredibly inefficient, using lots of RAM). That is not true for CouchDB's own indexes, which can be navigated perfectly with startkey/endkey/startkey_docid/endkey_docid, etc.
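
For reference, the resume-by-key navigation described here looks like
the following, with illustrative keys; skip=1 drops the row the
previous page already returned:

```
GET /db/_design/ddoc/_view/v?limit=100
GET /db/_design/ddoc/_view/v?limit=100&startkey="k100"&startkey_docid=doc100&skip=1
```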
>
> I guess I'm not helping much with these observations but I wouldn't like to see CouchDB gain an additional and ugly method of doing something already possible.
>
> B.
>
> > On 23 Apr 2020, at 19:02, Joan Touzet <wohali@apache.org> wrote:
> >
> > I realise this is bikeshedding, but I guess that's kind of the point... Everything below is my opinion, not "fact."
> >
> > It's unfortunate we need a new endpoint for all of this. In a vacuum I might have just suggested we use the semantics we already have, perhaps with ?from= instead of ?since=.
> >
> > "page" only works if the size of a page is well known, either by server preference
or directly in the URL. If I ask for:
> >
> >  GET /{db}/_all_docs?limit=20&page=3
> >
> > I know that I'm always going to get documents 41 through 60 in the default collation order.
> >
> > There's a *fantastic* summary of examples from popular REST APIs here:
> >
> > https://medium.com/@ignaciochiazzo/paginating-requests-in-apis-d4883d4c1c4c
> >
> > We are *pretty close* to what a cursor means in those other examples, except for the fact that our cursor can go stale/invalid after a short time.
> >
> > Bob, could you be a bit more detailed in your explanation of how our definition isn't close to these? Or did you mean SQL CURSOR (which is something entirely different)? If so, I'm fine with this being a REST API cursor - something clearly distinct.
> >
> > I come back to wanting to preserve the existing endpoint syntax and naming, without new endpoints, but specifying this new FDB token via ?cursor= and having that be the trigger for the new behaviour. At some point, we simply stop accepting ?since= tokens. This seems in line with other popular REST APIs.
> >
> > -Joan "still sick and not sleeping right" Touzet
> >
> >
> > On 2020-04-23 12:30, Robert Newson wrote:
> >> cursor has established meaning in other databases and ours would not be very close to them. I don’t think it’s a good idea.
> >> B.
> >>> On 23 Apr 2020, at 11:50, Ilya Khlopotov <iilyak@apache.org> wrote:
> >>>
> >>> 
> >>>>
> >>>> The best I could come up with is replacing page with
> >>>> cursor - {db}/_all_docs/cursor or possibly {db}/_cursor/_all_docs
> >>> Good idea, I like {db}/_all_docs/cursor (or {db}/_all_docs/_cursor).
> >>>
> >>>> On 2020/04/23 08:54:36, Garren Smith <garren@apache.org> wrote:
> >>>> I agree with Bob that page doesn't make sense as an endpoint. I'm also
> >>>> rubbish with naming. The best I could come up with is replacing page with
> >>>> cursor - {db}/_all_docs/cursor or possibly {db}/_cursor/_all_docs
> >>>> All the fields in the bookmark make sense except timestamp. Why would it
> >>>> matter if the timestamp is old? What happens if a node's time is an hour
> >>>> behind another node?
> >>>>
> >>>>
> >>>>> On Thu, Apr 23, 2020 at 4:55 AM Ilya Khlopotov <iilyak@apache.org> wrote:
> >>>>>
> >>>>> - page is to provide some notion of progress for the user
> >>>>> - timestamp - I was thinking that we should drop requests if a user
> >>>>> tried to pass a bookmark created an hour ago.
> >>>>>
> >>>>> On 2020/04/22 21:58:40, Robert Samuel Newson <rnewson@apache.org> wrote:
> >>>>>> "page" and "page number" are odd to me as these don't exist
as concepts,
> >>>>> I'd rather not invent them. I note there's no mention of page size,
which
> >>>>> makes "page number" very vague.
> >>>>>>
> >>>>>> What is "timestamp" in the bookmark and what effect does it
have when
> >>>>> the bookmark is passed back in?
> >>>>>>
> >>>>>> I guess, why does the bookmark include so much extraneous data? Items
> >>>>>> that are not needed to find the fdb key to begin the next response from.
> >>>>>>
> >>>>>>
> >>>>>>> On 22 Apr 2020, at 21:18, Ilya Khlopotov <iilyak@apache.org> wrote:
> >>>>>>>
> >>>>>>> Hello everyone,
> >>>>>>>
> >>>>>>> Based on the discussions on the thread I would like to propose a
> >>>>>>> number of first steps:
> >>>>>>> 1) introduce new endpoints
> >>>>>>> - {db}/_all_docs/page
> >>>>>>> - {db}/_all_docs/queries/page
> >>>>>>> - _all_dbs/page
> >>>>>>> - _dbs_info/page
> >>>>>>> - {db}/_design/{ddoc}/_view/{view}/page
> >>>>>>> - {db}/_design/{ddoc}/_view/{view}/queries/page
> >>>>>>> - {db}/_find/page
> >>>>>>>
> >>>>>>> These new endpoints would act as follows:
> >>>>>>> - don't use delayed responses
> >>>>>>> - return an object with the following structure
> >>>>>>> ```
> >>>>>>> {
> >>>>>>>    "total": Total,
> >>>>>>>    "bookmark": base64 encoded opaque value,
> >>>>>>>    "completed": true | false,
> >>>>>>>    "update_seq": when available,
> >>>>>>>    "page": current page number,
> >>>>>>>    "items": [
> >>>>>>>    ]
> >>>>>>> }
> >>>>>>> ```
> >>>>>>> - the bookmark would include the following data (base64 or protobuf???):
> >>>>>>> - direction
> >>>>>>> - page
> >>>>>>> - descending
> >>>>>>> - endkey
> >>>>>>> - endkey_docid
> >>>>>>> - inclusive_end
> >>>>>>> - startkey
> >>>>>>> - startkey_docid
> >>>>>>> - last_key
> >>>>>>> - update_seq
> >>>>>>> - timestamp
> >>>>>>>
> >>>>>>> 2) Implement per-endpoint configurable max limits
> >>>>>>> ```
> >>>>>>> _all_docs = 5000
> >>>>>>> _all_docs/queries = 5000
> >>>>>>> _all_dbs = 5000
> >>>>>>> _dbs_info = 5000
> >>>>>>> _view = 2500
> >>>>>>> _view/queries = 2500
> >>>>>>> _find = 2500
> >>>>>>> ```
> >>>>>>>
> >>>>>>> Later (after a few years) CouchDB would deprecate and remove the old
> >>>>>>> endpoints.
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>> iilyak
> >>>>>>>
> >>>>>>> On 2020/02/19 22:39:45, Nick Vatamaniuc <vatamane@apache.org> wrote:
> >>>>>>>> Hello everyone,
> >>>>>>>>
> >>>>>>>> I'd like to discuss the shape and behavior of streaming APIs for
> >>>>>>>> CouchDB 4.x
> >>>>>>>>
> >>>>>>>> By "streaming APIs" I mean APIs which stream data in
row as it gets
> >>>>>>>> read from the database. These are the endpoints I was
thinking of:
> >>>>>>>>
> >>>>>>>> _all_docs, _all_dbs, _dbs_info and query results
> >>>>>>>>
> >>>>>>>> I want to focus on what happens when FoundationDB transactions
> >>>>>>>> time out after 5 seconds. Currently, all those APIs except _changes[1]
> >>>>>>>> feeds will crash or freeze. The reason is that the
> >>>>>>>> transaction_too_old error at the end of 5 seconds is retry-able by
> >>>>>>>> default, so the request handlers run again and end up shoving the
> >>>>>>>> whole response down the socket again, headers and all, which is
> >>>>>>>> obviously broken and not what we want.
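
A rough sketch of that failure mode against the FDB Python bindings
(`sock` and `render_row` are stand-ins, not actual CouchDB code):

```
import fdb

fdb.api_version(620)
db = fdb.open()

@fdb.transactional
def stream_all_docs(tr, sock):
    # The decorator retries the whole function on transaction_too_old,
    # so everything already written to the socket gets written again.
    sock.send(b"HTTP/1.1 200 OK\r\n\r\n")
    for kv in tr.get_range(b'\x00', b'\xff'):
        sock.send(render_row(kv))  # hypothetical row serializer
```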
> >>>>>>>>
> >>>>>>>> There are a few alternatives discussed in the couchdb-dev channel. I'll
> >>>>>>>> present some behaviors but feel free to add more. Some ideas might
> >>>>>>>> have been discounted in the IRC discussion already but I'll present
> >>>>>>>> them anyway in case it sparks further conversation:
> >>>>>>>>
> >>>>>>>> A) Do what _changes[1] feeds do. Start a new transaction and continue
> >>>>>>>> streaming the data from the next key after the last one emitted in the
> >>>>>>>> previous transaction. Document the API behavior change: the view of the
> >>>>>>>> data presented may no longer be a point-in-time[4] snapshot of the DB.
> >>>>>>>>
> >>>>>>>> - Keeps the API shape the same as CouchDB <4.0. Client libraries
> >>>>>>>> don't have to change to continue using these CouchDB 4.0 endpoints.
> >>>>>>>> - This is the easiest to implement since it would re-use the
> >>>>>>>> implementation for the _changes feed (an extra option passed to the fold
> >>>>>>>> function).
> >>>>>>>> - Breaks API behavior if users relied on having a point-in-time[4]
> >>>>>>>> snapshot view of the data.
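
A sketch of what (A) could look like with the FDB Python bindings
(keys illustrative; error code 1007 is transaction_too_old):

```
import fdb

fdb.api_version(620)
db = fdb.open()

def stream_range(begin, end, emit):
    begin_sel = fdb.KeySelector.first_greater_or_equal(begin)
    while True:
        tr = db.create_transaction()
        got_any = False
        try:
            for kv in tr.get_range(begin_sel, end):
                emit(kv)
                # Remember our position so a fresh transaction can resume.
                begin_sel = fdb.KeySelector.first_greater_than(kv.key)
                got_any = True
            if not got_any:
                return  # range exhausted
        except fdb.FDBError as e:
            if e.code != 1007:  # transaction_too_old
                raise
            # Otherwise loop: a new transaction resumes from begin_sel.
```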
> >>>>>>>>
> >>>>>>>> B) Simply end the stream. Let the users pass a `?transaction=true`
> >>>>>>>> param which indicates they are aware the stream may end early and so
> >>>>>>>> would have to paginate from the last emitted key with a skip=1. This
> >>>>>>>> will keep the response bodies the same as current CouchDB. However, if
> >>>>>>>> the users got all the data in one request, they will end up wasting
> >>>>>>>> another request to see if there is more data available. If they didn't
> >>>>>>>> get any data they might have too large of a skip value (see [2]) and so
> >>>>>>>> would have to guess different values for start/end keys. Or we impose a
> >>>>>>>> max limit for the `skip` parameter.
> >>>>>>>>
> >>>>>>>> C) End the stream and add a final metadata row like a "transaction":
> >>>>>>>> "timeout" at the end. That will let the user know to keep paginating
> >>>>>>>> from the last key onward. This won't work for `_all_dbs` and
> >>>>>>>> `_dbs_info`[3]. Maybe let those two endpoints behave like _changes
> >>>>>>>> feeds and only use this for views and _all_docs? If we like this
> >>>>>>>> choice, let's think about what happens for those, as I couldn't come
> >>>>>>>> up with anything decent there.
> >>>>>>>>
> >>>>>>>> D) Same as C but, to solve the issue with skips[2], emit a bookmark
> >>>>>>>> "key" of where the iteration stopped and the current "skip" and
> >>>>>>>> "limit" params, which would keep decreasing. Then the user would pass
> >>>>>>>> those in "start_key=..." in the next request along with the limit and
> >>>>>>>> skip params. So something like "continuation":{"skip":599, "limit":5,
> >>>>>>>> "key":"..."}. This has the same issue with array results for
> >>>>>>>> `_all_dbs` and `_dbs_info`[3].
> >>>>>>>>
> >>>>>>>> E) Enforce low `limit` and `skip` parameters. Enforce maximum values
> >>>>>>>> there such that the response time is likely to fit in one transaction.
> >>>>>>>> This could be tricky as different runtime environments will have
> >>>>>>>> different characteristics. Also, if the timeout happens there isn't a
> >>>>>>>> nice way to send an HTTP error since we already sent the 200
> >>>>>>>> response. The downside is that this might break how some users use the
> >>>>>>>> API, if, say, they are using large skips and limits already. Perhaps
> >>>>>>>> here we do both B and D, such that if users want transactional
> >>>>>>>> behavior, they specify the `transaction=true` param and only then do
> >>>>>>>> we enforce low limit and skip maximums.
> >>>>>>>>
> >>>>>>>> F) At least for `_all_docs` it seems providing a point-in-time
> >>>>>>>> snapshot view doesn't necessarily need to be tied to transaction
> >>>>>>>> boundaries. We could check the update sequence of the database at the
> >>>>>>>> start of the next transaction and if it hasn't changed we can continue
> >>>>>>>> emitting a consistent view. This can apply to C and D and would just
> >>>>>>>> determine when the stream ends. If there are no writes happening to
> >>>>>>>> the db, this could potentially stream all the data just like option A
> >>>>>>>> would. Not entirely sure if this would work for views.
> >>>>>>>>
> >>>>>>>> So what do we think? I can see different combinations of options here,
> >>>>>>>> maybe even different ones for each API endpoint. For example `_all_dbs`
> >>>>>>>> and `_dbs_info` are always A, and `_all_docs` and views default to A
> >>>>>>>> but have parameters to do F, etc.
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> -Nick
> >>>>>>>>
> >>>>>>>> Some footnotes:
> >>>>>>>>
> >>>>>>>> [1] The _changes feed is the only one that works currently. It behaves
> >>>>>>>> as per the RFC:
> >>>>>>>> https://github.com/apache/couchdb-documentation/blob/master/rfcs/003-fdb-seq-index.md#access-patterns
> >>>>>>>> That is, we continue streaming the data by resetting the transaction
> >>>>>>>> object and restarting from the last emitted key (the db sequence in
> >>>>>>>> this case). However, because the transaction restarts, if a document
> >>>>>>>> is updated while the streaming takes place it may appear in the
> >>>>>>>> _changes feed twice. That's a behavior difference from CouchDB < 4.0
> >>>>>>>> and we'd have to document it, since previously we presented a
> >>>>>>>> point-in-time snapshot of the database from when we started streaming.
> >>>>>>>>
> >>>>>>>> [2] Our streaming APIs have both skips and limits. Since FDB doesn't
> >>>>>>>> currently support efficient offsets for key selectors
> >>>>>>>> (https://apple.github.io/foundationdb/known-limitations.html#dont-use-key-selectors-for-paging)
> >>>>>>>> we implemented skip by iterating over the data. This means that a skip
> >>>>>>>> of, say, 100000 could keep timing out the transaction without yielding
> >>>>>>>> any data.
> >>>>>>>>
> >>>>>>>> [3] _all_dbs and _dbs_info return a JSON array so they don't have an
> >>>>>>>> obvious place to insert a last metadata row.
> >>>>>>>>
> >>>>>>>> [4] For example, suppose there's a constraint that documents "a" and
> >>>>>>>> "z" cannot both be in the database at the same time. But when
> >>>>>>>> iterating it's possible that "a" was there at the start. Then by the
> >>>>>>>> end, "a" was removed and "z" added, so both "a" and "z" would appear
> >>>>>>>> in the emitted stream. Note that FoundationDB has APIs which exhibit
> >>>>>>>> the same "relaxed" constraints:
> >>>>>>>>
> >>>>>>>> https://apple.github.io/foundationdb/api-python.html#module-fdb.locality
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
>
