couchdb-dev mailing list archives

From: Robert Newson
Subject: Re: Next-generation attachment storage.
Date: Wed, 26 Jan 2011 14:47:31 GMT
Luwak looks very interesting, thanks!

As I noted originally, the harder part of the work is integrating
with couchdb and/or replacing the current attachment code entirely
(which is my preference), so I went with the simplest approach to
externalizing attachments (one attachment per file).

The issue of synchronizing the data between the two storage systems
needs some careful thought. My current approach is to put data into
the attachment store (whether haystack, luwak or custom) with a
'provisional' marker. After we write_and_commit, we go back and mark
it as final. We do something similar for removal ('provisionally
removed' -> 'removed'). This will allow us, in most circumstances, to
know the status of an item in the attachment store without
cross-referencing it with couchdb. This will be important when
compacting the attachment storage files (necessary in haystack, no
clue yet for luwak).
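
A minimal sketch of the provisional-marker scheme described above. It is an illustration only and not the code on the branch: the `AttachmentStore` class, its method names and the `commit_doc` callback (standing in for write_and_commit) are all hypothetical.

```python
from enum import Enum


class State(Enum):
    PROVISIONAL = "provisional"
    FINAL = "final"
    PROVISIONALLY_REMOVED = "provisionally removed"
    REMOVED = "removed"


class AttachmentStore:
    """Hypothetical in-memory stand-in for haystack, luwak or a custom store."""

    def __init__(self):
        self.items = {}  # key -> [state, data]

    def put_provisional(self, key, data):
        self.items[key] = [State.PROVISIONAL, data]

    def set_state(self, key, state):
        self.items[key][0] = state

    def state(self, key):
        return self.items[key][0]

    def needs_cross_check(self, key):
        # Only items caught between the two steps need a lookup in couchdb.
        return self.state(key) in (State.PROVISIONAL, State.PROVISIONALLY_REMOVED)


def save_attachment(store, commit_doc, key, data):
    store.put_provisional(key, data)    # step 1: data lands, marked provisional
    commit_doc(key)                     # step 2: stands in for write_and_commit
    store.set_state(key, State.FINAL)   # step 3: go back and mark it final


def delete_attachment(store, commit_doc, key):
    store.set_state(key, State.PROVISIONALLY_REMOVED)
    commit_doc(key)
    store.set_state(key, State.REMOVED)
```

If the commit step raises, the item stays in its provisional state, which is the only case where compaction would have to consult couchdb before deciding whether to keep it.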


On Wed, Jan 26, 2011 at 2:35 PM, Benoit Chesneau wrote:
> On Wed, Jan 26, 2011 at 2:20 PM, Robert Newson wrote:
>> All,
>> Most of you know that I'm currently working on 'external attachments'.
>> I've spent quite some time reading and modifying the current code and
>> have tried several approaches to the problem. I've implemented one
>> version fairly completely
>> ( which
>> places any attachment over a threshold (defaulting to 256 KB) into a
>> separate file (and all files that are sent chunked). This branch works
>> for PUT/GET/DELETE, local and remote replication and compaction.
>> External attachments do not support compression or ranges yet.
>> At this point, I'd like to get some feedback. I don't believe
>> file-per-attachment is a solution that works for everyone but it was
>> necessary to make a choice in order to understand how to integrate any
>> kind of external attachment into couchdb.
>> So, here's my real proposal for CouchDB 1.2 (or 2.0?):
>> Attachments are stored contiguously in compound files following a
>> simplified form of Haystack
>> ( I won't
>> describe Haystack in detail as the article covers it, and it's not
>> exactly what we need (the indexes, for example, are pointless, given
>> we have a database). The basic idea is we have a small number of files
>> that we append to, the limit of concurrency being the number of files
>> (i.e., we will not interleave attachments in these files).
>> There are several consequences to this:
>> Pro
>> 1) we can remove the 4k blocking in .couch files.
>> 2) .couch files are smaller, improving all i/o operations (especially
>> compaction).
>> 3) we can use more efficient primitives (like sendfile) to fetch attachments.
>> Con
>> 1) haystack files need compaction (though this involves no seeking so
>> should be far better than .couch compaction)
>> 2) more file descriptors
>> 3) .couch files are no longer self-contained (complicating backup
>> schemes, migration)
>> I had originally planned for each database to have exclusive access to
>> N haystack files (N is configurable, of course) since this aids with
>> backups. However, another compelling option is to have N haystack
>> files for all databases. This reduces the number of file descriptors
>> needed, but complicates backup (we'd probably have to write a tool to
>> extract matching attachments).
> I would go for one file / db, so we could remove attachments at the
> same time we delete a db.
> The con of that is that we can't share attachments between dbs if
> their signatures are the same. Another way would be to maintain an
> index of attachments / dbs so we could remove them if they don't
> appear in any other db after one has been removed.
>> I've rushed through that rather breezily, I apologize. I've been
>> thinking about this for quite some time so I likely have answers to
>> most questions on this.
>> B.
> That's a good idea anyway. Also, did you have a look at luwak from basho?
> I know the implementation is different but I like the idea of
> reusing the db to store attachments / chunks. So we could imagine
> dispatching chunks as we do for docs in cluster solutions. We could also
> imagine handling metadata.
> - benoit
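
The append-only compound files from the quoted proposal can be sketched roughly as follows. This is an illustration under stated assumptions, not Haystack's on-disk format or anything from the branch: the `CompoundStore` class, the `attachments.N` file names, the round-robin file choice and the `(file index, offset, length)` pointer are all hypothetical.

```python
import os
import threading


class CompoundStore:
    """Hypothetical store that appends attachments to a small set of files."""

    def __init__(self, directory, n_files=2):
        self.paths = [os.path.join(directory, f"attachments.{i}")
                      for i in range(n_files)]
        # One lock per file: at most one writer appends to a file at a time,
        # so attachments are never interleaved and concurrency is bounded
        # by the number of files.
        self.locks = [threading.Lock() for _ in self.paths]
        for path in self.paths:
            open(path, "ab").close()  # make sure every file exists
        self._next = 0
        self._next_lock = threading.Lock()

    def append(self, data):
        # Pick the target file round-robin (an arbitrary choice for this sketch).
        with self._next_lock:
            index = self._next
            self._next = (self._next + 1) % len(self.paths)
        with self.locks[index]:
            with open(self.paths[index], "ab") as f:
                offset = f.seek(0, os.SEEK_END)
                f.write(data)  # the attachment is stored contiguously
        # The pointer is what the database would keep in place of the body.
        return index, offset, len(data)

    def read(self, pointer):
        index, offset, length = pointer
        with open(self.paths[index], "rb") as f:
            f.seek(offset)
            return f.read(length)
```

Because each attachment occupies one contiguous byte range, a read is a single seek plus a single read, which is the property that would let a primitive like sendfile serve it.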
