couchdb-dev mailing list archives

From Benoit Chesneau <>
Subject Re: Next-generation attachment storage.
Date Wed, 26 Jan 2011 14:35:29 GMT
On Wed, Jan 26, 2011 at 2:20 PM, Robert Newson <> wrote:
> All,
> Most of you know that I'm currently working on 'external attachments'.
> I've spent quite some time reading and modifying the current code and
> have tried several approaches to the problem. I've implemented one
> version fairly completely
> ( which
> places any attachment over a threshold (defaulting to 256 kb) into a
> separate file (and all files that are sent chunked). This branch works
> for PUT/GET/DELETE, local and remote replication and compaction.
> External attachments do not support compression or ranges yet.
> At this point, I'd like to get some feedback. I don't believe
> file-per-attachment is a solution that works for everyone but it was
> necessary to make a choice in order to understand how to integrate any
> kind of external attachment into couchdb.
> So, here's my real proposal for CouchDB 1.2 (or 2.0?);
> Attachments are stored contiguously in compound files following a
> simplified form of Haystack
> ( I won't
> describe Haystack in detail as the article covers it, and it's not
> exactly what we need (the indexes, for example, are pointless, given
> we have a database). The basic idea is we have a small number of files
> that we append to, the limit of concurrency being the number of files
> (i.e., we will not interleave attachments in these files).
> There are several consequences to this;
> Pro
> 1) we can remove the 4k blocking in .couch files.
> 2) .couch files are smaller, improving all i/o operations (especially
> compaction).
> 3) we can use more efficient primitives (like sendfile) to fetch attachments.
> Con
> 1) haystack files need compaction (though this involves no seeking so
> should be far better than .couch compaction)
> 2) more file descriptors
> 3) .couch files are no longer self-contained (complicating backup
> schemes, migration)
> I had originally planned for each database to have exclusive access to
> N haystack files (N is configurable, of course) since this aids with
> backups. However, another compelling option is to have N haystack
> files for all databases. This reduces the number of file descriptors
> needed, but complicates backup (we'd probably have to write a tool to
> extract matching attachments).
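
To make the append-only compound file idea quoted above concrete, here is a minimal sketch. It is not taken from Robert's branch or from Haystack itself: the function names, the file name and the (offset, length) pair stored alongside the document are all assumptions made for illustration.

```python
import os
import tempfile


def append_attachment(path, data):
    """Append data to a compound file and return (offset, length).

    The returned pair is what a document would need to store in order
    to find the attachment again (hypothetical layout).
    """
    with open(path, "ab") as f:
        f.seek(0, os.SEEK_END)  # appends always land at the current end
        offset = f.tell()
        f.write(data)
    return offset, len(data)


def read_attachment(path, offset, length):
    """Read one attachment back with a single seek and a contiguous read."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Two attachments written one after the other, never interleaved.
path = os.path.join(tempfile.mkdtemp(), "attachments.0")
loc_a = append_attachment(path, b"hello")
loc_b = append_attachment(path, b"world!")
second = read_attachment(path, *loc_b)
```

Because each attachment is one contiguous byte range, a primitive such as sendfile could serve it directly from the (offset, length) pair.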

I would go for one file per db, so we could remove the attachments at
the same time we delete a db.

The downside is that we can't share attachments between dbs when
their signatures are the same. Another way would be to maintain an
index of attachments / dbs, so we could remove them if they no longer
appear in any other db after one has been removed.
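
A rough sketch of such an index, assuming attachments are identified by a signature string and that everything fits in memory (the class and method names are hypothetical, not CouchDB code):

```python
from collections import defaultdict


class AttachmentIndex:
    """Hypothetical index: attachment signature -> names of dbs using it."""

    def __init__(self):
        self._refs = defaultdict(set)

    def add(self, db_name, signature):
        """Record that db_name references the attachment with this signature."""
        self._refs[signature].add(db_name)

    def delete_db(self, db_name):
        """Drop db_name from the index.

        Returns the sorted signatures that no remaining db references,
        i.e. the attachments that can now be removed from storage.
        """
        orphaned = []
        for signature in list(self._refs):
            dbs = self._refs[signature]
            dbs.discard(db_name)
            if not dbs:
                orphaned.append(signature)
                del self._refs[signature]
        return sorted(orphaned)


index = AttachmentIndex()
index.add("db_a", "sig1")
index.add("db_b", "sig1")  # same signature, shared between two dbs
index.add("db_a", "sig2")
removed_first = index.delete_db("db_a")   # sig1 is still used by db_b
removed_second = index.delete_db("db_b")  # nothing references sig1 now
```

A real index would have to be persisted and kept consistent with the dbs, which this sketch ignores.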

> I've rushed through that rather breezily, I apologize. I've been
> thinking about this for quite some time so I likely have answers to
> most questions on this.
> B.

That's a good idea anyway. Also, have you had a look at Luwak from Basho?

I know the implementation is different, but I like the idea of
reusing the db to store attachments / chunks. We could then imagine
dispatching chunks the way we do for docs in clustered setups. We
could also imagine handling metadata.
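
To illustrate the chunk idea, here is a minimal sketch in which a plain dict stands in for the db. This is not how Luwak or CouchDB actually store data: the chunk size, the SHA-256 keys and the function names are assumptions for the example.

```python
import hashlib

CHUNK_SIZE = 4  # tiny, for illustration only; a real value would be far larger


def put_attachment(store, data, chunk_size=CHUNK_SIZE):
    """Split data into chunks, store each under its digest, return the manifest.

    The manifest (ordered list of chunk keys) is what the document would keep.
    """
    manifest = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        key = hashlib.sha256(chunk).hexdigest()
        store.setdefault(key, chunk)  # identical chunks are stored only once
        manifest.append(key)
    return manifest


def get_attachment(store, manifest):
    """Reassemble an attachment from its manifest."""
    return b"".join(store[key] for key in manifest)


store = {}
data = b"abcdabcdxy"
manifest = put_attachment(store, data)
restored = get_attachment(store, manifest)
```

Since each chunk is an ordinary keyed entry, chunks could in principle be dispatched across nodes the same way docs are.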

- benoit
