From: Nick Vatamaniuc
Date: Mon, 27 Apr 2020 11:54:11 -0400
Subject: Re: [DISCUSS] Streaming API in CouchDB 4.0
To: dev@couchdb.apache.org

It's good to see more activity in the thread. I thought everyone had lost interest :-)

Nice work, Ilya, on the prototype. I think you picked what I had initially called options D and E, with the exception that we don't force clients to specify a limit when a max limit is configured in the settings.

The API versioning idea sounds good in principle, but I can't think of a clean way to do it. A /_v2/_all_dbs pattern might work, but what would it look like for views? Perhaps /{db}/_design/{ddoc}/_v2/_view/{view} is not that bad. The idea, I guess, would be to give users the option to keep using the old APIs while they migrate their applications to 4.x, so they don't have to rewrite everything on top of configuring and maintaining the new FDB backend. Also, if we go the API versioning route, I like the uniformity of using {"items": [...]} in the response instead of {"rows": []} or {"docs": []}. And +1 on adding _changes in there as well.

On the other hand, if we wanted to keep things as compatible as possible, we could have a new parameter like `&use_cursor=true` for view-like endpoints (excluding _all_dbs, etc.). If it is specified, the user should expect that the response may sometimes not return all the rows, and should look for a "cursor"/"bookmark" field in the response. We could have an option to enforce `&use_cursor=true` usage on some API endpoints, in the same vein as a configurable max limit (option E). The point is that the user would explicitly acknowledge that they expect this behavior. Otherwise I don't think it is a good idea to keep the shape of the response the same and only emit an extra bookmark/cursor while silently skipping some of the rows from the results.

Regarding not using delayed responses and having configurable limits: I think we'd have to see how those interact with each other. Can users still set the _all_docs limit to infinity and get the current behavior? They can currently (even on the prototype/fdb-layer branch). Sometimes it is useful to stream all the data at once, say when doing a backup or other maintenance. Using a proper client with a cursor helper, which hides the iteration loop, or a custom shell script would work, but it might be nice to retain the option to do it straight from the API.

There was some discussion about whether we should include more or fewer fields in the bookmark. The timestamp might have issues with time skew, but we could include the instance id (uuid) of the database; we do that with the shard uuids currently in the update sequence. We don't have to use the whole uuid, maybe just 5-6 bytes, to ensure that if a database is recreated while users are "streaming" data from it, they don't suddenly end up reading from a completely different db instance. Including the update_seq is also interesting, but I think the discussion so far points to not doing it in the first version, even though it would let a user know whether, say, they backed up a consistent point-in-time snapshot of the database using a sequence of cursor requests.
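To make the cursor-helper idea a bit more concrete, here is a rough sketch of what such a client-side iteration loop could look like. It assumes something along the lines of the proposed {db}/_all_docs/page endpoint and response shape quoted below; passing the bookmark back as a query parameter, and the exact names used here, are only illustrative assumptions:

```
import requests  # third-party HTTP client, used here only for brevity

def all_docs_pages(base_url, db, limit=1000):
    """Yield rows from a hypothetical {db}/_all_docs/page endpoint,
    following bookmarks until the server says we are done."""
    params = {"limit": limit}
    while True:
        resp = requests.get(f"{base_url}/{db}/_all_docs/page", params=params)
        resp.raise_for_status()
        body = resp.json()
        for row in body.get("items", []):
            yield row
        bookmark = body.get("bookmark")
        # Stop when the response is marked completed or no cursor comes back.
        if body.get("completed") or not bookmark:
            return
        params = {"limit": limit, "bookmark": bookmark}

# Example usage:
# for row in all_docs_pages("http://localhost:5984", "mydb"):
#     print(row)
```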
I am also +0 on returning "completed": true; I kind of like the idea. Maybe we could just return a null cursor instead, without using yet another field. "If you have a cursor you can keep iterating; if you don't, you're done" kind of idea?

Cheers,
-Nick

On Wed, Apr 22, 2020 at 4:19 PM Ilya Khlopotov wrote:
>
> Hello everyone,
>
> Based on the discussions on the thread I would like to propose a number
> of first steps:
> 1) introduce new endpoints
>    - {db}/_all_docs/page
>    - {db}/_all_docs/queries/page
>    - _all_dbs/page
>    - _dbs_info/page
>    - {db}/_design/{ddoc}/_view/{view}/page
>    - {db}/_design/{ddoc}/_view/{view}/queries/page
>    - {db}/_find/page
>
> These new endpoints would act as follows:
> - don't use delayed responses
> - return an object with the following structure
> ```
> {
>     "total": Total,
>     "bookmark": base64 encoded opaque value,
>     "completed": true | false,
>     "update_seq": when available,
>     "page": current page number,
>     "items": [
>     ]
> }
> ```
> - the bookmark would include the following data (base64 or protobuf?):
>   - direction
>   - page
>   - descending
>   - endkey
>   - endkey_docid
>   - inclusive_end
>   - startkey
>   - startkey_docid
>   - last_key
>   - update_seq
>   - timestamp
>
> 2) Implement per-endpoint configurable max limits
> ```
> _all_docs = 5000
> _all_docs/queries = 5000
> _all_dbs = 5000
> _dbs_info = 5000
> _view = 2500
> _view/queries = 2500
> _find = 2500
> ```
>
> Later (after a few years) CouchDB would deprecate and remove the old
> endpoints.
>
> Best regards,
> iilyak
>
> On 2020/02/19 22:39:45, Nick Vatamaniuc wrote:
> > Hello everyone,
> >
> > I'd like to discuss the shape and behavior of streaming APIs for
> > CouchDB 4.x.
> >
> > By "streaming APIs" I mean APIs which stream data row by row as it
> > gets read from the database. These are the endpoints I was thinking
> > of:
> >
> > _all_docs, _all_dbs, _dbs_info and query results
> >
> > I want to focus on what happens when FoundationDB transactions time
> > out after 5 seconds. Currently, all of those APIs except _changes[1]
> > feeds will crash or freeze. The reason is that the
> > transaction_too_old error at the end of 5 seconds is retryable by
> > default, so the request handlers run again and end up shoving the
> > whole request down the socket again, headers and all, which is
> > obviously broken and not what we want.
> >
> > There are a few alternatives discussed in the couchdb-dev channel.
> > I'll present some behaviors, but feel free to add more. Some ideas
> > might have been discounted in the IRC discussion already, but I'll
> > present them anyway in case it sparks further conversation:
> >
> > A) Do what _changes[1] feeds do. Start a new transaction and continue
> > streaming the data from the next key after the last one emitted in
> > the previous transaction. Document the API behavior change: the view
> > of the data it presents may not be a point-in-time[4] snapshot of
> > the DB.
> >
> >  - Keeps the API shape the same as CouchDB < 4.0. Client libraries
> >    don't have to change to continue using these CouchDB 4.0 endpoints.
> >  - This is the easiest to implement since it would re-use the
> >    implementation for the _changes feed (an extra option passed to
> >    the fold function).
> >  - Breaks API behavior if users relied on having a point-in-time[4]
> >    snapshot view of the data.
> >
> > B) Simply end the stream. Let the users pass a `?transaction=true`
> > param which indicates they are aware the stream may end early and so
> > would have to paginate from the last emitted key with skip=1.
> > This will keep the request bodies the same as in current CouchDB.
> > However, if the users got all the data in one request, they will end
> > up wasting another request to see if there is more data available.
> > If they didn't get any data they might have too large of a skip
> > value (see [2]), so they would have to guess different values for
> > start/end keys. Or we impose a max limit for the `skip` parameter.
> >
> > C) End the stream and add a final metadata row like "transaction":
> > "timeout" at the end. That will let the user know to keep paginating
> > from the last key onward. This won't work for `_all_dbs` and
> > `_dbs_info`[3]. Maybe let those two endpoints behave like _changes
> > feeds and only use this for views and _all_docs? If we like this
> > choice, let's think about what happens for those, as I couldn't come
> > up with anything decent there.
> >
> > D) Same as C, but to solve the issue with skips[2], emit a bookmark
> > "key" of where the iteration stopped and the current "skip" and
> > "limit" params, which would keep decreasing. Then the user would pass
> > those as "start_key=..." in the next request along with the limit
> > and skip params. So something like "continuation": {"skip": 599,
> > "limit": 5, "key": "..."}. This has the same issue with array
> > results for `_all_dbs` and `_dbs_info`[3].
> >
> > E) Enforce low `limit` and `skip` parameters. Enforce maximum values
> > there such that the response time is likely to fit in one
> > transaction. This could be tricky as different runtime environments
> > will have different characteristics. Also, if the timeout happens
> > there isn't a nice way to send an HTTP error since we already sent
> > the 200 response. The downside is that this might break how some
> > users use the API, if, say, they are using large skips and limits
> > already. Perhaps here we do both B and D, such that if users want
> > transactional behavior, they specify the `transaction=true` param
> > and only then we enforce low limit and skip maximums.
> >
> > F) At least for `_all_docs` it seems providing a point-in-time
> > snapshot view doesn't necessarily need to be tied to transaction
> > boundaries. We could check the update sequence of the database at
> > the start of the next transaction, and if it hasn't changed we can
> > continue emitting a consistent view. This can apply to C and D and
> > would just determine when the stream ends. If there are no writes
> > happening to the db, this could potentially stream all the data just
> > like option A would do. Not entirely sure if this would work for
> > views.
> >
> > So what do we think? I can see different combinations of options
> > here, maybe even different ones for each API endpoint. For example,
> > `_all_dbs` and `_dbs_info` are always A, and `_all_docs` and views
> > default to A but have parameters to do F, etc.
> >
> > Cheers,
> > -Nick
> >
> > Some footnotes:
> >
> > [1] The _changes feed is the only one that works currently. It
> > behaves as per the RFC
> > https://github.com/apache/couchdb-documentation/blob/master/rfcs/003-fdb-seq-index.md#access-patterns.
> > That is, we continue streaming the data by resetting the transaction
> > object and restarting from the last emitted key (the db sequence in
> > this case). However, because the transaction restarts, if a document
> > is updated while the streaming takes place, it may appear in the
> > _changes feed twice.
> > That's a behavior difference from CouchDB < 4.0 and we'd have to
> > document it, since previously we presented a point-in-time snapshot
> > of the database from when we started streaming.
> >
> > [2] Our streaming APIs have both skips and limits. Since FDB doesn't
> > currently support efficient offsets for key selectors
> > (https://apple.github.io/foundationdb/known-limitations.html#dont-use-key-selectors-for-paging)
> > we implemented skip by iterating over the data. This means that a
> > skip of, say, 100000 could keep timing out the transaction without
> > yielding any data.
> >
> > [3] _all_dbs and _dbs_info return a JSON array, so they don't have
> > an obvious place to insert a last metadata row.
> >
> > [4] For example, users may have a constraint that documents "a" and
> > "z" cannot both be in the database at the same time. But when
> > iterating, it's possible that "a" was there at the start. Then by
> > the end, "a" was removed and "z" added, so both "a" and "z" would
> > appear in the emitted stream. Note that FoundationDB has APIs which
> > exhibit the same "relaxed" constraints:
> > https://apple.github.io/foundationdb/api-python.html#module-fdb.locality
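To make the opaque bookmark idea in the proposal above a bit more concrete, here is a rough sketch of the base64-of-JSON variant (as opposed to protobuf). The field names and values are purely illustrative, and the short db-uuid prefix is the 5-6 byte truncation suggested earlier in this reply:

```
import base64
import json

def encode_bookmark(fields):
    """Pack bookmark fields into an opaque, URL-safe base64 string."""
    return base64.urlsafe_b64encode(json.dumps(fields).encode()).decode()

def decode_bookmark(bookmark):
    """Recover the bookmark fields from the opaque string."""
    return json.loads(base64.urlsafe_b64decode(bookmark.encode()))

bookmark = encode_bookmark({
    "direction": "fwd",
    "page": 3,
    "last_key": "doc-12345",
    # Only a short prefix of the db instance uuid, so a recreated
    # database invalidates bookmarks issued against the old instance.
    "db_uuid": "3f1a9c2b7d4e",
    "update_seq": None,
    "timestamp": 1588002862,
})
assert decode_bookmark(bookmark)["page"] == 3
```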