Subject: Re: [DISCUSS] _changes feed on database partitions
From: Joan Touzet
To: dev@couchdb.apache.org
Date: Thu, 14 May 2020 17:11:20 -0400

On 2020-05-13 10:07 a.m., Robert Samuel Newson wrote:
> Hi,
>
> Yes, I think this would be a good addition for 3.0. I think we didn't add it before because of concerns of accidental misuse (attempting to replicate with it but forgetting a range, etc)?

This was definitely a concern, but it makes me wonder: are we also potentially discussing adding per-shard _changes feeds? I wonder if the work is roughly analogous.

> Whatever the reasons, I think exposing the per-partition _changes feed exactly as you've described will be useful. We should state explicitly in the accompanying docs that the replicator does not use this endpoint (though, of course, it might be enhanced to do so in a future release).
>
> From 4.0 onward, there's a discussion elsewhere on whether any of the _partition endpoints continue to exist (leaning towards keeping them just to avoid unnecessary upgrade pain?), so a note in that thread would be good too.

It does seem odd to enhance an endpoint in 3.0 only to remove it entirely in 4.0. The reasons for removing _partition are compelling, however, as the motivating (internal) reason for introducing _partition is gone.

>
> B.
>
>> On 12 May 2020, at 22:59, Adam Kocoloski wrote:
>>
>> Hi all,
>>
>> When we introduced partitioned databases in 3.0 we declined to add a partition-specific _changes endpoint, because we didn’t have a prebuilt index that could support it. It sounds like the lack of that endpoint is a bit of a drag. I wanted to start this thread to consider adding it.
>>
>> Note: this isn’t a fully-formed proposal coming from my team with a plan to staff the development of it. Just a discussion :)
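
For concreteness, here is a minimal sketch of how a client might consume such a feed, assuming the endpoint ends up mirroring the existing partition query paths. The URL shape, database name, partition key, and credentials below are illustrative assumptions, not a settled design:

    import requests

    # Hypothetical endpoint shape, modelled on the existing
    # /{db}/_partition/{partition}/_all_docs and _find paths.
    BASE = "http://127.0.0.1:5984"
    DB = "sensors"          # illustrative database name
    PARTITION = "sensor-1"  # illustrative partition key
    url = f"{BASE}/{DB}/_partition/{PARTITION}/_changes"

    # Initial call: the full feed for this partition.
    resp = requests.get(url, auth=("admin", "password"))
    resp.raise_for_status()
    body = resp.json()

    # Incremental follow-up: hand last_seq back via ?since=, exactly as
    # with the database-level _changes feed.
    resp = requests.get(url, auth=("admin", "password"),
                        params={"since": body["last_seq"]})
    resp.raise_for_status()
    for row in resp.json()["results"]:
        print(row["id"], row["seq"])
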
>> In the simplest case, a _changes feed could be implemented by scanning the by_seq index of the shard that hosts the named partition. We already get some efficiencies here: we don’t need to touch any of the other shards of the database, and we have enough information in the by_seq btree to filter out documents from other partitions without actually retrieving them from disk, so we can push the filter down quite nicely without a lot of extra processing. It’s just a very cheap binary prefix pattern match on the docid.
>>
>> Most consumers of the _changes feed work incrementally, and we can support that here as well. It’s not like we need to do a full table scan on every incremental request.
>>
>> If the shard is hosting so many partitions that this filter is becoming a bottleneck, resharding (also new in 3.0) is probably a good option. Partitioned databases are particularly amenable to increasing the shard count. Global indexes on the database become more expensive to query, but those ought to be a smaller percentage of queries in this data model.
>>
>> Finally, if the overhead of filtering out non-matching partitions is just too high, we could support the use of user-created indexes, e.g. by having a user create a Mango index on _local_seq. If such an index exists, our “query planner” uses it for the partitioned _changes feed. If not, we fall back to the scan on the shard’s by_seq index as above.
>>
>> I’d like to do some basic benchmarking, but I have a feeling the by_seq scan will work quite well in the majority of cases, and the user-defined index is a good “escape valve” if we need it. WDYT?
>>
>> Adam
>
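
Regarding the _local_seq “escape valve”: a rough sketch of what requesting such an index might look like from the client side, assuming Mango were extended to accept _local_seq as an indexable field (it is not today, so the request below is purely illustrative, and the database and index names are made up):

    import requests

    # Illustrative only: Mango does not currently allow an index on
    # _local_seq. If it did, the request might look like a normal
    # /{db}/_index call naming that field.
    resp = requests.post(
        "http://127.0.0.1:5984/sensors/_index",
        auth=("admin", "password"),
        json={
            "index": {"fields": ["_local_seq"]},
            "name": "local-seq-idx",
            "type": "json",
        },
    )
    resp.raise_for_status()
    print(resp.json())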