From: Robert Samuel Newson
Subject: Re: [DISCUSS] _changes feed on database partitions
Date: Wed, 13 May 2020 15:07:46 +0100
To: CouchDB Developers <dev@couchdb.apache.org>
Message-Id: <3F63DB09-6CB6-41E1-8C2A-1EAE3406CE36@apache.org>

Hi,

Yes, I think this would be a good addition for 3.0. I think we didn't add it before because of concerns about accidental misuse (attempting to replicate with it but forgetting a range, etc.)? Whatever the reasons, I think exposing the per-partition _changes feed exactly as you've described will be useful. We should state explicitly in the accompanying docs that the replicator does not use this endpoint (though, of course, it might be enhanced to do so in a future release).

From 4.0 onward, there's a discussion elsewhere on whether any of the _partition endpoints continue to exist (leaning towards keeping them just to avoid unnecessary upgrade pain?), so a note in that thread would be good too. It does seem odd to enhance an endpoint in 3.0 only to remove it entirely in 4.0. The reasons for removing _partition are compelling, however, as the motivating (internal) reason for introducing _partition is gone.

B.

> On 12 May 2020, at 22:59, Adam Kocoloski wrote:
>
> Hi all,
>
> When we introduced partitioned databases in 3.0 we declined to add a partition-specific _changes endpoint, because we didn't have a prebuilt index that could support it.
> It sounds like the lack of that endpoint is a bit of a drag. I wanted to start this thread to consider adding it.
>
> Note: this isn't a fully-formed proposal coming from my team with a plan to staff the development of it. Just a discussion :)
>
> In the simplest case, a _changes feed could be implemented by scanning the by_seq index of the shard that hosts the named partition. We already get some efficiencies here: we don't need to touch any of the other shards of the database, and we have enough information in the by_seq btree to filter out documents from other partitions without actually retrieving them from disk, so we can push the filter down quite nicely without a lot of extra processing. It's just a very cheap binary prefix pattern match on the docid.
>
> Most consumers of the _changes feed work incrementally, and we can support that here as well. It's not like we need to do a full table scan on every incremental request.
>
> If the shard is hosting so many partitions that this filter is becoming a bottleneck, resharding (also new in 3.0) is probably a good option. Partitioned databases are particularly amenable to increasing the shard count. Global indexes on the database become more expensive to query, but those ought to be a smaller percentage of queries in this data model.
>
> Finally, if the overhead of filtering out non-matching partitions is just too high, we could support the use of user-created indexes, e.g. by having a user create a Mango index on _local_seq. If such an index exists, our "query planner" uses it for the partitioned _changes feed. If not, we resort to the scan on the shard's by_seq index as above.
>
> I'd like to do some basic benchmarking, but I have a feeling the by_seq scan will work quite well in the majority of cases, and the user-defined index is a good "escape valve" if we need it. WDYT?
>
> Adam
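[Editor's note: if the user-created escape valve were adopted, the index Adam mentions might be declared through the usual _index endpoint. This is a hypothetical sketch: the `_local_seq` field name comes from his message, not from any shipped API, and the index name is invented.]

```json
{
  "index": { "fields": ["_local_seq"] },
  "name": "local-seq-idx",
  "partitioned": true,
  "type": "json"
}
```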
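
[Editor's note: the pushed-down filter Adam describes can be sketched in a few lines. Python is used purely for illustration — CouchDB itself is Erlang — and the (seq, docid) pairs and partition names below are made up. The "<partition>:<docid>" docid form is the documented partitioned-database convention, which is what makes membership a cheap prefix match with no document body read.]

```python
# Sketch of the by_seq partition filter: walk the shard's by_seq entries in
# update-sequence order and keep only docids carrying the partition prefix.
# Partitioned docids take the "<partition>:<docid>" form, so this is a plain
# string prefix match -- no document is fetched from disk.

def partition_changes(by_seq, partition):
    """Yield (seq, docid) pairs belonging to `partition`.

    `by_seq` stands in for the shard's by_seq btree: an iterable of
    (seq, docid) pairs in update-sequence order.
    """
    prefix = partition + ":"
    for seq, docid in by_seq:
        if docid.startswith(prefix):
            yield seq, docid

# One shard hosting documents from two partitions:
shard = [(1, "sensors:a"), (2, "users:bob"), (3, "sensors:b"), (4, "users:eve")]
print(list(partition_changes(shard, "sensors")))
# -> [(1, 'sensors:a'), (3, 'sensors:b')]
```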
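[Editor's note: the incremental point above — no full table scan per request — is the same checkpointing pattern the regular _changes feed uses: the client remembers the last sequence it processed and resumes from it. A sketch under the same made-up (seq, docid) representation as before:]

```python
# Sketch of an incremental consumer: keep the highest seq already processed
# as a checkpoint and ask only for updates strictly after it, so each
# request touches new entries rather than rescanning the whole history.

def resume_feed(entries, since):
    """Return (new_updates, new_checkpoint) for updates after `since`."""
    new = [(seq, docid) for seq, docid in entries if seq > since]
    checkpoint = new[-1][0] if new else since
    return new, checkpoint

feed = [(1, "sensors:a"), (3, "sensors:b"), (7, "sensors:c")]
first, ckpt = resume_feed(feed, since=0)    # first request sees everything
later, _ = resume_feed(feed, since=ckpt)    # nothing new -> empty result
print(ckpt, later)
# -> 7 []
```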