couchdb-dev mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject [DISCUSS] _changes feed on database partitions
Date Tue, 12 May 2020 21:59:16 GMT
Hi all,

When we introduced partitioned databases in 3.0 we declined to add a partition-specific _changes
endpoint, because we didn’t have a prebuilt index that could support it. It sounds like
the lack of that endpoint is a bit of a drag. I wanted to start this thread to consider adding
it.

Note: this isn’t a fully-formed proposal coming from my team with a plan to staff the development
of it. Just a discussion :)

In the simplest case, a _changes feed could be implemented by scanning the by_seq index of
the shard that hosts the named partition. We already get some efficiencies here: we don’t
need to touch any of the other shards of the database, and we have enough information in the
by_seq btree to filter out documents from other partitions without actually retrieving them
from disk, so we can push the filter down quite nicely without a lot of extra processing.
It’s just a very cheap binary prefix pattern match on the docid.
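
To make the pushed-down filter concrete, here is a minimal Python sketch (not CouchDB's actual Erlang code). It assumes a shard's by_seq index can be modeled as an ordered sequence of (seq, docid) pairs, and it relies on the partitioned-database convention that docids look like `partition:key`. The names `BySeqEntry` and `partition_changes` are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical stand-in for one entry of a shard's by_seq index.
# Real entries carry more metadata; only seq and docid matter here.
BySeqEntry = Tuple[int, bytes]


def partition_changes(by_seq: Iterable[BySeqEntry],
                      partition: bytes) -> Iterator[BySeqEntry]:
    """Yield by_seq entries whose docid belongs to `partition`.

    The filter only inspects the docid stored in the index entry,
    so non-matching documents never have to be read.
    """
    # Include the ":" separator so partition "a" does not match "ab:...".
    prefix = partition + b":"
    for seq, docid in by_seq:
        if docid.startswith(prefix):
            yield seq, docid


example_index = [(1, b"a:1"), (2, b"b:1"), (3, b"a:2"), (4, b"ab:1")]
example_rows = list(partition_changes(example_index, b"a"))
```

Matching on the prefix plus separator, rather than on the bare partition name, is what keeps partitions with a shared leading substring apart.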

Most consumers of the _changes feed work incrementally, and we can support that here as well.
It’s not like we need to do a full table scan on every incremental request.
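
A rough sketch of the incremental case, under the same simplifying assumptions: the index is sorted by sequence, so a consumer's `since` value lets us seek directly to the first newer entry and scan only the tail. Integer sequences and the `ShardBySeq` class are illustrative assumptions; real clustered sequences are opaque strings.

```python
from bisect import bisect_right
from typing import List, Tuple


class ShardBySeq:
    """Hypothetical in-memory model of one shard's by_seq index."""

    def __init__(self, entries: List[Tuple[int, str]]):
        # Entries are assumed to arrive already ordered by seq.
        self._seqs = [seq for seq, _ in entries]
        self._docids = [docid for _, docid in entries]

    def changes(self, partition: str, since: int = 0):
        """Return (rows, last_seq) for entries with seq > since."""
        prefix = partition + ":"
        # Seek past everything the consumer has already seen.
        start = bisect_right(self._seqs, since)
        rows = []
        last_seq = since
        for i in range(start, len(self._seqs)):
            # Advance last_seq even for non-matching entries so the
            # next request does not rescan them.
            last_seq = self._seqs[i]
            if self._docids[i].startswith(prefix):
                rows.append((self._seqs[i], self._docids[i]))
        return rows, last_seq


shard = ShardBySeq([(1, "a:1"), (2, "b:1"), (3, "a:2"), (4, "b:2")])
rows, last_seq = shard.changes("a", since=1)
```

Returning the last scanned sequence, not the last matching one, means a follow-up request with `since=last_seq` skips entries from other partitions that were already filtered out.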

If the shard is hosting so many partitions that this filter is becoming a bottleneck, resharding
(also new in 3.0) is probably a good option. Partitioned databases are particularly amenable
to increasing the shard count. Global indexes on the database become more expensive to query,
but those ought to be a smaller percentage of queries in this data model.

Finally, if the overhead of filtering out non-matching partitions is just too high, we could
support the use of user-created indexes, e.g. by having a user create a Mango index on _local_seq.
If such an index exists, our “query planner” would use it for the partitioned _changes feed.
If not, it would fall back to the scan on the shard’s by_seq index as above.
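
The fallback logic might look roughly like the following Python sketch. Everything here is hypothetical: the `seq_index` argument stands in for a user-created index on _local_seq, modeled as a dict from partition name to that partition's (seq, docid) rows in sequence order, and `plan_partition_changes` is an invented name, not an existing CouchDB function.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, str]


def plan_partition_changes(partition: str,
                           since: int,
                           by_seq: List[Row],
                           seq_index: Optional[Dict[str, List[Row]]] = None):
    """Pick a strategy and return (strategy_name, rows)."""
    if seq_index is not None:
        # A user-created index exists: read only this partition's rows.
        rows = [(s, d) for s, d in seq_index.get(partition, []) if s > since]
        return "index", rows
    # No index: fall back to filtering the shard's by_seq entries.
    prefix = partition + ":"
    rows = [(s, d) for s, d in by_seq if s > since and d.startswith(prefix)]
    return "scan", rows


shard_by_seq = [(1, "a:1"), (2, "b:1"), (3, "a:2")]
user_index = {"a": [(1, "a:1"), (3, "a:2")], "b": [(2, "b:1")]}

scan_plan = plan_partition_changes("a", 1, shard_by_seq)
index_plan = plan_partition_changes("a", 1, shard_by_seq, user_index)
```

Both paths are expected to return the same rows; only the amount of data touched differs.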

I’d like to do some basic benchmarking, but I have a feeling the by_seq scan will work quite well
in the majority of cases, and the user-defined index is a good “escape valve” if we need
it. WDYT?

Adam