couchdb-dev mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject [DISCUSS] Implementing _all_docs on FoundationDB
Date Thu, 21 Mar 2019 19:50:37 GMT
Hi all, me again. This one will be shorter :) As I see it, we have three different options for
serving the _all_docs endpoint from FDB:

## Option 1: Read the document data, discard the bodies

We likely will have the documents stored in docid order already; we could do range reads and
discard everything but the ID and _rev by default. This can be a very efficient implementation
of include_docs=true (though one needs to be careful about skipping the conflict bodies),
but it is pretty wasteful otherwise, since every body is read only to be thrown away.
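
To make the shape of Option 1 concrete, here is a minimal sketch in Python. Everything about the layout is an assumption for illustration only: the `("docs", doc_id, rev)` key shape, the `is_winner` flag, and a sorted in-memory list standing in for an FDB range read. Only the row format (`id`, `key`, `value.rev`) follows what _all_docs returns today.

```python
# Hypothetical layout: ("docs", doc_id, rev) -> (is_winner, body).
# The list is kept in key order to stand in for an FDB range read.
DOCS = [
    (("docs", "a", "1-abc"), (True, {"x": 1})),
    (("docs", "b", "2-def"), (True, {"y": 2})),
    (("docs", "b", "2-zzz"), (False, {"y": 3})),  # conflict body, must be skipped
    (("docs", "c", "1-123"), (True, {"z": 4})),
]


def all_docs_from_docs(entries, start_key="", include_docs=False):
    """Scan the document data in ID order and build _all_docs rows."""
    rows = []
    for (_subspace, doc_id, rev), (is_winner, body) in entries:
        if doc_id < start_key:
            continue
        if not is_winner:
            continue  # conflict bodies are read but never returned
        row = {"id": doc_id, "key": doc_id, "value": {"rev": rev}}
        if include_docs:
            row["doc"] = body  # the only case where the body read pays off
        rows.append(row)
    return rows
```

With `include_docs=False` every body in `DOCS` is still read and then dropped, which is the waste described above.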

## Option 2: Read the “revisions” subspace

We also have an entry for every document in ID order in the “revisions” subspace. The
disadvantage of this approach is that every deleted edit branch shows up there, too, and some
databases will have lots of deleted documents. We may need to build skiplists to know how
to scan efficiently. This subspace is also doing a lot of heavy lifting for us already, and
if we wanted to toy with alternative revision history representations in the future it could
get complicated.
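
A rough Python sketch of why the deleted branches hurt here. The `("revisions", doc_id, rev)` key shape and the `deleted` flag are assumptions for illustration, and the winner rule is simplified to "highest position, then highest hash, among live branches". The point is only the ratio of entries scanned to rows returned.

```python
from itertools import groupby

# Hypothetical layout: ("revisions", doc_id, rev) -> {"deleted": bool},
# listed in key order as a range read would return them.
REVS = [
    (("revisions", "a", "1-abc"), {"deleted": False}),
    (("revisions", "b", "2-def"), {"deleted": True}),
    (("revisions", "c", "1-123"), {"deleted": False}),
    (("revisions", "c", "2-456"), {"deleted": True}),
    (("revisions", "d", "3-789"), {"deleted": True}),
]


def rev_sort_key(rev):
    """Order revisions by (position, hash); a simplified winner rule."""
    pos, _, digest = rev.partition("-")
    return (int(pos), digest)


def all_docs_from_revisions(entries):
    """Return (rows, entries_scanned) for a full scan of the subspace."""
    rows, scanned = [], 0
    for doc_id, group in groupby(entries, key=lambda e: e[0][1]):
        live = []
        for (_subspace, _doc_id, rev), meta in group:
            scanned += 1  # deleted branches still cost a read
            if not meta["deleted"]:
                live.append(rev)
        if live:  # a document with only deleted branches is omitted
            winner = max(live, key=rev_sort_key)
            rows.append({"id": doc_id, "key": doc_id, "value": {"rev": winner}})
    return rows, scanned
```

In this toy data the scan reads five entries to produce two rows, and the gap grows with the number of deleted documents.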

## Option 3: Add specific entries to support _all_docs

We can also write an extra KV containing the ID and winning _rev in a special subspace just
to support this endpoint. It would be a blind write because we’re already coordinating concurrent
transactions through reads on the “revisions” subspace. This would be conceptually quite
clean and simple, and the fastest implementation for constructing the default response.
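
As a sketch of what Option 3 could look like, here is a small Python model. The `("all_docs", doc_id)` key shape and the `on_update` hook are hypothetical names, a dict stands in for the FDB subspace, and clearing the entry when the winner is deleted is my assumption rather than a settled design.

```python
class AllDocsIndex:
    """Toy model of a dedicated subspace: ("all_docs", doc_id) -> winning rev."""

    def __init__(self):
        self._kv = {}  # stand-in for the FDB subspace

    def on_update(self, doc_id, winning_rev, deleted=False):
        """Called in the same transaction as the document update.

        The write is blind: nothing in this subspace is read first, because
        conflict detection already happens on the "revisions" subspace.
        """
        key = ("all_docs", doc_id)
        if deleted:
            self._kv.pop(key, None)  # assumption: deleted winners are cleared
        else:
            self._kv[key] = winning_rev

    def all_docs(self, start_key="", limit=None):
        """Range scan in ID order; every entry read becomes exactly one row."""
        rows = []
        for key in sorted(self._kv):
            doc_id = key[1]
            if doc_id < start_key:
                continue
            rows.append({"id": doc_id, "key": doc_id,
                         "value": {"rev": self._kv[key]}})
            if limit is not None and len(rows) >= limit:
                break
        return rows
```

For example, after updating `a` once, `b` twice, and creating then deleting `c`, `all_docs()` returns two rows, one for `a` and one for `b` at its latest winning revision, without touching any document bodies or deleted branches.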

===

My sense is Option 2 is a non-starter but I include it for completeness in case anyone else
thought of the same. I think Option 3 is a reasonable space / efficiency / simplicity tradeoff,
and it might also be worth testing out Option 1 as an optimized implementation for include_docs=true.

Thoughts? I imagine we can move quickly to an RFC for at least having the extra KVs for Option
3, and in that design also acknowledge the option for scanning the docs space directly to
support include_docs.

Adam