couchdb-dev mailing list archives

From Joan Touzet <woh...@apache.org>
Subject Re: [DISCUSS] _changes feed on database partitions
Date Thu, 14 May 2020 21:11:20 GMT


On 2020-05-13 10:07 a.m., Robert Samuel Newson wrote:
> Hi,
> 
> Yes, I think this would be a good addition for 3.0. I think we didn't add it
> before because of concerns of accidental misuse (attempting to replicate with
> it but forgetting a range, etc)?

This was definitely a concern, but it makes me wonder: are we also 
potentially discussing adding per-shard _changes feeds? I wonder if the 
work is roughly analogous.

> Whatever the reasons, I think exposing the per-partition _changes feed exactly
> as you've described will be useful. We should state explicitly in the
> accompanying docs that the replicator does not use this endpoint (though, of
> course, it might be enhanced to do so in a future release).
> 
> From 4.0 onward, there's a discussion elsewhere on whether any of the
> _partition endpoints continue to exist (leaning towards keeping them just to
> avoid unnecessary upgrade pain?), so a note in that thread would be good too.
> It does seem odd to enhance an endpoint in 3.0 to then remove it entirely in
> 4.0. The reasons for removing _partition are compelling however, as the
> motivating (internal) reason for introducing _partition is gone.
> 
> B.
> 
>> On 12 May 2020, at 22:59, Adam Kocoloski <kocolosk@apache.org> wrote:
>>
>> Hi all,
>>
>> When we introduced partitioned databases in 3.0 we declined to add a
>> partition-specific _changes endpoint, because we didn’t have a prebuilt
>> index that could support it. It sounds like the lack of that endpoint is a
>> bit of a drag. I wanted to start this thread to consider adding it.
>>
>> Note: this isn’t a fully-formed proposal coming from my team with a plan to
>> staff the development of it. Just a discussion :)
>>
>> In the simplest case, a _changes feed could be implemented by scanning the
>> by_seq index of the shard that hosts the named partition. We already get
>> some efficiencies here: we don’t need to touch any of the other shards of
>> the database, and we have enough information in the by_seq btree to filter
>> out documents from other partitions without actually retrieving them from
>> disk, so we can push the filter down quite nicely without a lot of extra
>> processing. It’s just a very cheap binary prefix pattern match on the docid.
>>
>> Most consumers of the _changes feed work incrementally, and we can support
>> that here as well. It’s not like we need to do a full table scan on every
>> incremental request.
>>
>> If the shard is hosting so many partitions that this filter is becoming a
>> bottleneck, resharding (also new in 3.0) is probably a good option.
>> Partitioned databases are particularly amenable to increasing the shard
>> count. Global indexes on the database become more expensive to query, but
>> those ought to be a smaller percentage of queries in this data model.
>>
>> Finally, if the overhead of filtering out non-matching partitions is just
>> too high, we could support the use of user-created indexes, e.g. by having a
>> user create a Mango index on _local_seq. If such an index exists, our “query
>> planner” uses it for the partitioned _changes feed. If not, resort to the
>> scan on the shard’s by_seq index as above.
>>
>> I’d like to do some basic benchmarking, but I have a feeling the by_seq scan
>> will work quite well in the majority of cases, and the user-defined index is
>> a good “escape valve” if we need it. WDYT?
>>
>> Adam
> 
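
To make the by_seq scan quoted above concrete, here is a rough Python sketch. It is a toy model under stated assumptions, not CouchDB code: the shard's by_seq index is modelled as a list sorted by sequence number, doc ids are assumed to have the form "partition:key", and the names SeqEntry and partition_changes are hypothetical. It shows the prefix match on the docid and how a `since` value lets an incremental consumer skip entries it has already seen.

```python
from bisect import bisect_right
from typing import Iterator, List, NamedTuple


class SeqEntry(NamedTuple):
    """Hypothetical stand-in for one by_seq entry (no document body needed)."""
    seq: int
    doc_id: str
    deleted: bool = False


def partition_changes(by_seq: List[SeqEntry], partition: str,
                      since: int = 0) -> Iterator[SeqEntry]:
    """Yield changes for one partition from a seq-ordered index.

    Assumes by_seq is sorted by seq and doc ids look like "partition:key".
    """
    prefix = partition + ":"
    # A real btree would seek straight to `since`; building a key list and
    # bisecting it only emulates that seek in this toy model.
    seqs = [entry.seq for entry in by_seq]
    start = bisect_right(seqs, since)
    for entry in by_seq[start:]:
        # Cheap prefix match on the doc id; other partitions are skipped
        # without ever loading their documents.
        if entry.doc_id.startswith(prefix):
            yield entry
```

A consumer would remember the seq of the last entry it received and pass it back as `since` on the next request, so each call only walks the tail of the index.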
