couchdb-dev mailing list archives

From Paul Davis <>
Subject Pluggable Storage Engine API
Date Tue, 07 Jun 2016 21:43:01 GMT
Hi everyone!

So I've been working on designing and implementing a pluggable storage
engine API for CouchDB and I've finally gotten things far enough along
that I can start asking for feedback on the design.

The motivation for this work is twofold. First, the obvious benefit
is that developers can start using alternative storage technologies
to try to improve overall performance (e.g., Riak's pluggable storage
backends [1]). Second, an internal storage engine API should give us
a better ability to separate testing at functional levels (more on
this below).

The majority of this work is on a branch [2] of the couchdb-couch
repository. There are four major commits related to this work:

   1. Define the API [3]
   2. Implement the legacy engine using this API [4]
   3. Add the implementation to use the API [5]
   4. Adding a reusable test suite [6]

I took this approach so that I could split the work into its major
tasks and make the changes easier for others to review. However, it
does result in a lot of code being copied between the second and
third commits, since a lot of the legacy storage engine code has to
move around so that nothing is broken after each commit. Hopefully
that makes sense.

There are also branches on the main couchdb repo [7] (mostly for
pointing at the storage engine branches) as well as chttpd [8],
couch_index [9], couch_mrview [10], couch_replicator [11], fabric
[12], and mem3 [13]. For the most part these branches are relatively
minor changes. Generally speaking, things like the id_tree are no
longer part of the #db{} record and a couple function names were
changed for consistency in the API.

From the end user's point of view, the feature works like this:

There is a config section, [couch_db_engines], that lists the storage
engines currently available for use [14]. Entries in this section
have a key that is a short name for the engine and a value that is
the main storage engine callback module. A default storage engine can
be set in the [couchdb] section [15], which defaults to the legacy
storage engine. When creating a database a user can also specify an
engine=short_name query string parameter, where short_name is a key
in the [couch_db_engines] configuration section.
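As a sketch, such a configuration might look like the following (the
key and module names here are my guesses based on the description
above, not confirmed values):

```ini
; Illustrative only -- short engine names map to callback modules.
[couch_db_engines]
couch = couch_bt_engine
ngen = couch_ngen

[couchdb]
; hypothetical key name for the default-engine setting
default_engine = couch
```

With a configuration like that, creating a database with
?engine=ngen would select the couch_ngen module for that database,
while databases created without the parameter would use the default.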

Currently, once a storage engine is selected it can't be changed. I
would like to add that in the future, but it wasn't something I
prioritized before getting this out for feedback.

After that the storage engine should be basically transparent to the
user. Internally we just detect which engine is responsible for a
given database by querying each engine and using the first one that
has the database available.
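A minimal sketch of that detection logic (Python standing in for the
Erlang; the engine list shape and the exists() callback are
assumptions based on the description above, not the real API):

```python
# Illustrative sketch: find the engine responsible for an existing
# database by asking each configured engine, in order, whether it
# has the database available. Engine API here is invented.

def choose_engine(db_name, engines):
    """engines: list of (short_name, engine) pairs in config order."""
    for short_name, engine in engines:
        if engine.exists(db_name):
            return short_name
    return None  # no engine has this database

class FakeEngine:
    """Toy stand-in for a storage engine callback module."""
    def __init__(self, dbs):
        self.dbs = set(dbs)
    def exists(self, db_name):
        return db_name in self.dbs

engines = [("couch", FakeEngine({"users"})), ("ngen", FakeEngine({"foo"}))]
print(choose_engine("foo", engines))    # -> ngen
print(choose_engine("users", engines))  # -> couch
```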

Other than that there's no outside change in behavior (which was
precisely the point). One of my goals was to make sure this didn't
turn into too much of a matrix game where some storage engines don't
support various features, both because the fragmentation made me a
bit queasy and because it simplifies everything internally: we don't
have to add some sort of capabilities API and then feature-detect
every random thing.

However, on the subject of supporting features, there are two issues
that I haven't solved to my liking. First, the attachments API makes
some awfully specific assumptions about behavior. An abstraction/API
for the current scheme is included and it works, but I think we may
want to think harder about this and perhaps separate it out. I did
make it optional to implement, but that just happens by throwing an
error, so it wouldn't be very pretty for users. Also, storage engines
that don't implement it will end up being bad players in the
replication game.

The only other hard corner in the API is the count_changes_since
function, because it relies on a storage engine being able to count
rows between two keys, which is an uncommon feature (at least
uncommon to implement efficiently). It's currently only used for
status notifications, so alternative engines can approximate it with
the difference between the current update sequence and the given
sequence, though in pathological cases that could be off by orders
of magnitude.
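The trade-off can be sketched like this (Python, with made-up
sequence numbers; the index shape is invented for illustration). An
exact count walks the seq index between two keys, while the
approximation just subtracts sequence numbers, which over-counts when
documents are updated repeatedly, since each re-update advances the
sequence but leaves only one live seq entry per document:

```python
# Illustrative only: exact changes count vs the "difference of
# update sequences" approximation described above.

def exact_count(seq_index, since):
    # seq_index: the live update sequences, one per document
    return sum(1 for seq in seq_index if seq > since)

def approx_count(current_seq, since):
    # Cheap approximation: pretends every sequence after `since`
    # still corresponds to a live index entry.
    return current_seq - since

# Pathological case: one doc updated 1000 times, so only its latest
# sequence (1000) is live in the seq index.
seq_index = [1000]
print(exact_count(seq_index, 0))  # -> 1
print(approx_count(1000, 0))      # -> 1000, off by three orders of magnitude
```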

Lastly, I've also written an alternative storage engine, couch_ngen
[16], that, while fairly close to the legacy engine, differs in a few
ways. First, instead of a single file it uses three files (data,
indexes, and commits), which was done to avoid the make_blocks dance
we do in couch_file. Second, it uses a NIF [17] for all file IO
(which uses dirty schedulers, so it requires Erlang 17.something).
And third, it only writes #full_doc_info{} records once and stores a
disk pointer in both the id_tree and seq_tree. Some rough micro
benchmarks have shown couch_ngen to be significantly faster than the
legacy engine, but that's only been tested at the storage API level.
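That third difference can be sketched like this (Python; the record
and file layout here are invented for illustration, and the real
engine uses append-only files and b+trees rather than dicts): the
#full_doc_info{} record is appended to the data file once, and both
indexes store the same offset instead of each embedding a copy.

```python
# Illustrative sketch of writing #full_doc_info{} once and keeping a
# disk pointer to it in both the id_tree and the seq_tree.

data_file = []  # stands in for the append-only data file
id_tree = {}    # doc id -> offset into data_file
seq_tree = {}   # update seq -> offset into data_file

def write_doc_info(doc_id, update_seq, body):
    offset = len(data_file)
    data_file.append({"id": doc_id, "seq": update_seq, "body": body})
    id_tree[doc_id] = offset       # both indexes point at the same
    seq_tree[update_seq] = offset  # record, so it is written only once

write_doc_info("foo", 1, {"value": 42})
assert id_tree["foo"] == seq_tree[1]  # one copy, two pointers
print(data_file[id_tree["foo"]])      # -> {'id': 'foo', 'seq': 1, 'body': {'value': 42}}
```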

Also, for the testability I mentioned above, this includes a fairly
thorough test suite for storage engines that can be reused across all
implementations, so that we can make sure storage engines all provide
the same behavior. There are still some open questions here about
specific behaviors (e.g., read-only snapshots) that I haven't decided
whether to enforce. The more behavior we require, the more we'd end
up just spec'ing out exactly our append-only b+tree, which is roughly
the antithesis of what I was hoping to accomplish.
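The idea of a reusable suite can be sketched like this (Python; the
real suite is Erlang/EUnit, and the engine API here is invented): the
same behavioral tests run against every engine implementation, so any
engine that passes is known to provide the same observable behavior.

```python
# Illustrative: one shared behavioral suite applied to every storage
# engine implementation. DictEngine is a toy backend, not a real one.

class DictEngine:
    def __init__(self):
        self.docs = {}
    def write(self, doc_id, body):
        self.docs[doc_id] = body
    def read(self, doc_id):
        return self.docs.get(doc_id)
    def doc_count(self):
        return len(self.docs)

def run_engine_suite(make_engine):
    # Behavioral checks every engine must satisfy.
    e = make_engine()
    assert e.doc_count() == 0
    e.write("a", {"x": 1})
    assert e.read("a") == {"x": 1}
    assert e.doc_count() == 1
    e.write("a", {"x": 2})  # overwrite: count must not change
    assert e.doc_count() == 1
    return True

print(run_engine_suite(DictEngine))  # -> True
```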

For those of you interested in poking around and trying to kick the
tires, here's the minimum number of steps to get up and running:

    $ git clone
    $ cd couchdb
    $ git checkout 45918-pluggable-storage-engines
    $ ./configure --disable-fauxton --disable-docs
    $ make
    $ ./dev/run

And a quick demo that isn't very interesting:

    $ curl -X PUT
    $ curl
        "compact_running": false,
        "db_name": "foo",
        "disk_format_version": 6,
        "doc_count": 0,
        "doc_del_count": 0,
        "instance_start_time": "0",
        "purge_seq": 0,
        "sizes": {
            "active": 0,
            "external": 0,
            "file": 104
    $ curl -X PUT -d '{}'
    $ curl
    $ find ./dev/lib/ -name "*.ngen"

So that's basically that. I'd appreciate any feedback on the API
(couch_db_engine.erl) and any thoughts in general. So far this seems
to have turned out a lot cleaner than I expected, other than the
attachments and count_changes_since questions.

Also, I should note that this obviously isn't a candidate for 2.0,
given that it's a big change and we have the feature freeze. So
there's plenty of time for people to take their time looking into
this. Now that I've got this out for review I'll be trying to focus
my dev time on the blockers list.

Any and all feedback appreciated.


