couchdb-dev mailing list archives

From Robert Newson <>
Subject Re: Next-generation attachment storage.
Date Wed, 26 Jan 2011 15:34:52 GMT
Agree completely that commingled attachment files would not be an
appropriate default. However, managing a fixed number of very large
(e.g., 200 GiB) files full of attachment data would work well in a
hosted service. Obviously the code would have to be solid to prevent
the kind of data disclosure problems you mention.

The Haystack paper covers this, btw. Each entry has a random cookie
value stored with it; you need to present the same value for the read
to succeed. The cookie could be stored in the #att record. Obviously
it still requires the code to verify the cookie and restrict the read
only to the bytes covered by that item, but that's a code quality
thing and should be easy enough to review.
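A minimal Python sketch of the cookie check described above. It assumes a single shared append-only file where a 16-byte random cookie is written directly in front of each attachment's bytes and a copy is kept in the attachment's metadata (`AttRef` is a hypothetical stand-in for the #att record). A read fails unless the presented cookie matches, and it returns only the bytes recorded for that entry.

```python
import os
import secrets
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class AttRef:
    """Hypothetical stand-in for what an #att record might carry."""
    offset: int    # where this entry starts in the shared file
    length: int    # number of payload bytes that belong to this entry
    cookie: bytes  # random value generated when the entry was appended


class SharedAttachmentFile:
    """Append-only file shared by many attachments (illustrative, not CouchDB code)."""

    COOKIE_LEN = 16

    def __init__(self, path: str):
        self.path = path
        open(path, "ab").close()  # create the file if it does not exist yet

    def append(self, data: bytes) -> AttRef:
        cookie = secrets.token_bytes(self.COOKIE_LEN)
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(cookie)  # cookie sits directly in front of the payload
            f.write(data)
        return AttRef(offset=offset, length=len(data), cookie=cookie)

    def read(self, ref: AttRef) -> bytes:
        with open(self.path, "rb") as f:
            f.seek(ref.offset)
            stored = f.read(self.COOKIE_LEN)
            # Refuse the read unless the caller presents the matching cookie.
            if not secrets.compare_digest(stored, ref.cookie):
                raise PermissionError("cookie mismatch")
            # Read exactly this entry's bytes and nothing beyond them.
            data = f.read(ref.length)
        if len(data) != ref.length:
            raise OSError("entry is truncated")
        return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = SharedAttachmentFile(os.path.join(d, "haystack.0"))
        ref = store.append(b"hello")
        print(store.read(ref))
```

The constant-time comparison (`secrets.compare_digest`) is a choice made for this sketch; the thread does not say how the comparison would be done.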


On Wed, Jan 26, 2011 at 3:23 PM, Paul Davis <> wrote:
> On Wed, Jan 26, 2011 at 9:35 AM, Benoit Chesneau <> wrote:
>> On Wed, Jan 26, 2011 at 2:20 PM, Robert Newson <> wrote:
>>> All,
>>> Most of you know that I'm currently working on 'external attachments'.
>>> I've spent quite some time reading and modifying the current code and
>>> have tried several approaches to the problem. I've implemented one
>>> version fairly completely
>>> ( which
>>> places any attachment over a threshold (defaulting to 256 KB) into a
>>> separate file (as well as all attachments that are sent chunked). This branch works
>>> for PUT/GET/DELETE, local and remote replication and compaction.
>>> External attachments do not support compression or ranges yet.
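The placement rule described for that branch can be sketched as a small decision function. This is an illustration in Python, not code from the branch: `placement` and `IncomingAttachment` are hypothetical, "256 kb" is read as 256 * 1024 bytes, and a chunked upload is modeled as one whose length is unknown (`None`).

```python
from dataclasses import dataclass
from typing import Optional

# Assumed reading of the default threshold mentioned above: 256 * 1024 bytes.
DEFAULT_THRESHOLD = 256 * 1024


@dataclass
class IncomingAttachment:
    """Hypothetical view of an attachment arriving in a PUT."""
    name: str
    length: Optional[int]  # None when the body is sent chunked (size not known up front)


def placement(att: IncomingAttachment, threshold: int = DEFAULT_THRESHOLD) -> str:
    """Return 'external' or 'inline' for an incoming attachment (hypothetical helper)."""
    if att.length is None:
        # Chunked uploads have no declared length, so they always go external.
        return "external"
    # Only attachments strictly over the threshold leave the .couch file.
    return "external" if att.length > threshold else "inline"
```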
>>> At this point, I'd like to get some feedback. I don't believe
>>> file-per-attachment is a solution that works for everyone but it was
>>> necessary to make a choice in order to understand how to integrate any
>>> kind of external attachment into couchdb.
>>> So, here's my real proposal for CouchDB 1.2 (or 2.0?):
>>> Attachments are stored contiguously in compound files following a
>>> simplified form of Haystack
>>> ( I won't
>>> describe Haystack in detail as the article covers it, and it's not
>>> exactly what we need (the indexes, for example, are pointless, given
>>> we have a database). The basic idea is we have a small number of files
>>> that we append to, the limit of concurrency being the number of files
>>> (i.e., we will not interleave attachments in these files).
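A rough Python sketch of the "small number of files, no interleaving" idea: a pool of N append-only files where a writer checks out one file exclusively, appends the whole attachment contiguously, then returns the file to the pool, so at most N attachment writes proceed at once. The class name, the `haystack.<i>` file naming and the in-process `queue.Queue` used for checkout are all assumptions for illustration.

```python
import os
import queue
from contextlib import contextmanager


class HaystackPool:
    """N append-only files; a writer holds one file exclusively while it appends.

    Illustrative sketch only: the class and its methods are hypothetical.
    """

    def __init__(self, directory: str, n_files: int):
        self.paths = [os.path.join(directory, f"haystack.{i}") for i in range(n_files)]
        self._free = queue.Queue()
        for i, path in enumerate(self.paths):
            open(path, "ab").close()  # make sure every file exists
            self._free.put(i)

    @contextmanager
    def _checkout(self):
        # Blocks until some file is free, so at most n_files writes run at once.
        i = self._free.get()
        try:
            with open(self.paths[i], "ab") as f:
                f.seek(0, os.SEEK_END)
                yield i, f
            # The file is closed (and flushed) before it goes back to the pool.
        finally:
            self._free.put(i)

    def write_attachment(self, chunks):
        """Append every chunk back to back; return (file index, offset, length)."""
        with self._checkout() as (i, f):
            offset = f.tell()
            length = 0
            for chunk in chunks:
                f.write(chunk)
                length += len(chunk)
        return i, offset, length

    def read(self, i: int, offset: int, length: int) -> bytes:
        with open(self.paths[i], "rb") as f:
            f.seek(offset)
            return f.read(length)
```

Because each attachment lands as one contiguous extent, a reader needs only (file, offset, length), which is also what makes primitives like sendfile attractive.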
>>> There are several consequences to this:
>>> Pro
>>> 1) we can remove the 4k blocking in .couch files.
>>> 2) .couch files are smaller, improving all i/o operations (especially
>>> compaction).
>>> 3) we can use more efficient primitives (like sendfile) to fetch attachments.
>>> Con
>>> 1) haystack files need compaction (though this involves no seeking so
>>> should be far better than .couch compaction)
>>> 2) more file descriptors
>>> 3) .couch files are no longer self-contained (complicating backup
>>> schemes, migration)
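Con 1 says haystack compaction should be far cheaper than .couch compaction. A hedged Python sketch of why: if the live entries are copied in offset order into a new file, the old file is read front to back (the only seeks skip forward past dead entries) and the new file is written strictly sequentially. The function name and the `live` mapping (attachment id to offset and length, which in practice would come from the database's own records) are hypothetical.

```python
def compact(src_path: str, dst_path: str, live: dict, bufsize: int = 64 * 1024) -> dict:
    """Copy only live entries from src_path into a fresh dst_path (hypothetical helper).

    `live` maps an attachment id to its (offset, length) in src_path; entries
    are assumed not to overlap. Returns each id's new (offset, length).
    """
    relocated = {}
    # Visiting entries in offset order means the source is read front to back;
    # the only seeks skip forward over dead (deleted) entries.
    order = sorted(live.items(), key=lambda item: item[1][0])
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for att_id, (offset, length) in order:
            src.seek(offset)
            new_offset = dst.tell()
            remaining = length
            while remaining:
                buf = src.read(min(remaining, bufsize))
                if not buf:
                    raise OSError(f"{att_id}: source file is truncated")
                dst.write(buf)
                remaining -= len(buf)
            relocated[att_id] = (new_offset, length)
    return relocated
```

The returned offsets would then have to be written back into the referencing attachment records before the old file could be removed.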
>>> I had originally planned for each database to have exclusive access to
>>> N haystack files (N is configurable, of course) since this aids with
>>> backups. However, another compelling option is to have N haystack
>>> files for all databases. This reduces the number of file descriptors
>>> needed, but complicates backup (we'd probably have to write a tool to
>>> extract matching attachments).
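If N haystack files were shared by all databases, the backup tool mentioned above would have to scan them and pull out the entries that belong to one database. A Python sketch, assuming a made-up entry layout in which each entry is prefixed by the owning database's name; CouchDB defines no such format, and the header layout and function names are hypothetical.

```python
import struct

# Hypothetical on-disk entry layout (an assumption, not a CouchDB format):
#   u16 db-name length | u64 payload length | db name (utf-8) | payload
HEADER = struct.Struct(">HQ")


def write_entry(f, db: str, payload: bytes) -> None:
    """Append one entry tagged with its owning database."""
    name = db.encode("utf-8")
    f.write(HEADER.pack(len(name), len(payload)))
    f.write(name)
    f.write(payload)


def extract_db(src, dst, db: str) -> int:
    """Sequentially scan src and copy every entry owned by `db` to dst; return the count."""
    wanted = db.encode("utf-8")
    copied = 0
    while True:
        header = src.read(HEADER.size)
        if not header:
            break  # clean end of file
        if len(header) < HEADER.size:
            raise OSError("truncated header")
        name_len, payload_len = HEADER.unpack(header)
        name = src.read(name_len)
        payload = src.read(payload_len)
        if len(name) != name_len or len(payload) != payload_len:
            raise OSError("truncated entry")
        if name == wanted:
            dst.write(header + name + payload)
            copied += 1
    return copied
```

Entries for other databases are skipped but still read, so the cost of a per-database backup grows with the size of the shared files rather than of that one database.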
>> I would go for one file per db, so we could remove attachments at the
>> same time we delete a db.
>> The con of that is that we can't share attachments between dbs even if
>> their signatures are the same. Another way would be to maintain an
>> index of attachments / dbs so we could remove them if they don't
>> appear in any other db after one has been removed.
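Benoit's alternative, sharing identical attachments across databases and deleting one only when no database still references it, amounts to reference counting keyed by signature. A Python sketch with an in-memory dict standing in for storage; SHA-256 as the signature and the `SharedAttachmentIndex` name are assumptions.

```python
import hashlib
from collections import defaultdict


class SharedAttachmentIndex:
    """Tracks which databases reference each attachment signature (hypothetical sketch)."""

    def __init__(self):
        self.blobs = {}               # signature -> bytes (stand-in for on-disk storage)
        self.refs = defaultdict(set)  # signature -> names of dbs that use it

    def put(self, db: str, data: bytes) -> str:
        sig = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(sig, data)  # identical content is stored only once
        self.refs[sig].add(db)
        return sig

    def drop_db(self, db: str) -> list:
        """Forget `db`; delete blobs no other db references. Returns deleted signatures."""
        deleted = []
        for sig in list(self.refs):
            self.refs[sig].discard(db)
            if not self.refs[sig]:
                del self.refs[sig]
                del self.blobs[sig]
                deleted.append(sig)
        return deleted
```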
>>> I've rushed through that rather breezily; I apologize. I've been
>>> thinking about this for quite some time so I likely have answers to
>>> most questions on this.
>>> B.
>> That's a good idea anyway. Also, did you have a look at Luwak from Basho?
>> I know the implementation is different, but I like the idea of
>> reusing the db to store attachments / chunks. We could then imagine
>> dispatching chunks as we do for docs in cluster solutions. We could also
>> imagine handling metadata.
>> - benoit
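The chunks-as-documents idea can be sketched as splitting an attachment into fixed-size chunk docs plus a manifest doc (which could also carry metadata), then placing each doc on a node by hashing its id. This Python sketch is not Luwak's design or API; the chunk size, the content-addressed chunk ids and the modulo placement are illustrative assumptions (a real cluster would reuse its existing document partitioning).

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # illustrative chunk size, not a CouchDB or Luwak setting


def chunk_docs(att_name: str, data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split an attachment into chunk 'docs' plus a manifest doc that lists them."""
    chunk_ids, docs = [], []
    for start in range(0, len(data), chunk_size):
        piece = data[start:start + chunk_size]
        # Content-addressed ids: identical chunks share one id (and one doc).
        cid = "chunk-" + hashlib.sha256(piece).hexdigest()
        chunk_ids.append(cid)
        docs.append({"_id": cid, "data": piece})
    manifest = {"_id": att_name, "length": len(data), "chunks": chunk_ids}
    return manifest, docs


def node_for(doc_id: str, nodes: list) -> str:
    """Place a doc on a node by hashing its id (plain modulo placement)."""
    h = int.from_bytes(hashlib.sha256(doc_id.encode("utf-8")).digest()[:8], "big")
    return nodes[h % len(nodes)]


def reassemble(manifest: dict, fetch) -> bytes:
    """Rebuild the attachment; `fetch` maps a chunk id to its bytes."""
    data = b"".join(fetch(cid) for cid in manifest["chunks"])
    if len(data) != manifest["length"]:
        raise ValueError("reassembled length does not match the manifest")
    return data
```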
> Another bit that Bob2 didn't mention was the idea of making this a
> pluggable API so that we can have a couple of implementations that are
> configurable. For instance, Benoit's idea for a single file of
> interleaved attachments or the haystack approach with multiple files
> that keep attachments in contiguous chunks.
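A Python sketch of what such a pluggable interface might look like, with one backend that allows interleaving and one that keeps each attachment contiguous, selected by a config key. Everything here is hypothetical: the interface, both classes (in-memory stand-ins for real files) and the `attachment_store` key name.

```python
from abc import ABC, abstractmethod


class AttachmentStore(ABC):
    """Hypothetical pluggable interface; method names are illustrative only."""

    @abstractmethod
    def write(self, att_id: str, chunks) -> None: ...

    @abstractmethod
    def read(self, att_id: str) -> bytes: ...


class InterleavedStore(AttachmentStore):
    """One log; chunks of different attachments may end up interleaved."""

    def __init__(self):
        self.log = []  # list of (att_id, chunk) in append order

    def write(self, att_id, chunks):
        for chunk in chunks:
            self.log.append((att_id, chunk))

    def read(self, att_id):
        return b"".join(c for a, c in self.log if a == att_id)


class ContiguousStore(AttachmentStore):
    """Haystack-style: each attachment lands as one contiguous extent in one of N files."""

    def __init__(self, n_files=4):
        self.files = [bytearray() for _ in range(n_files)]
        self.index = {}  # att_id -> (file number, offset, length)
        self.next = 0

    def write(self, att_id, chunks):
        data = b"".join(chunks)
        i = self.next % len(self.files)  # round-robin over the N files
        self.next += 1
        f = self.files[i]
        self.index[att_id] = (i, len(f), len(data))
        f.extend(data)

    def read(self, att_id):
        i, off, n = self.index[att_id]
        return bytes(self.files[i][off:off + n])


BACKENDS = {"interleaved": InterleavedStore, "haystack": ContiguousStore}


def make_store(config: dict) -> AttachmentStore:
    # e.g. config = {"attachment_store": "haystack"}; the key name is made up.
    return BACKENDS[config.get("attachment_store", "haystack")]()
```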
> As to sharing attachments between dbs, I would be hugely, hugely
> against shipping that as part of an actual release, as there are a
> *lot* of downsides in how that would open us up to bad failure
> conditions, e.g., things like sending attachments from different dbs by
> accident or whatnot. Also, in shared-tenant situations it seems
> like it'd be a prime suspect for information leakage and so forth.
> But I digress.
