couchdb-dev mailing list archives

From Paul Davis <>
Subject Re: Next-generation attachment storage.
Date Wed, 26 Jan 2011 16:37:12 GMT
On Wed, Jan 26, 2011 at 10:34 AM, Robert Newson <> wrote:
> Agree completely that commingled attachment files would not be an
> appropriate default. However, managing a fixed number of very large
> (e.g., 200 GiB) files full of attachment data would work well in a
> hosted service. Obviously the code would have to be solid to prevent
> the kind of data disclosure problems you mention.

No, not just a default, I'm saying "not in the release tarball or in
any method, shape, or form signaled as supported by Apache CouchDB".
If hosting groups want to write and implement this I think that'd be
just fine.

> The haystack paper covers this btw. Each entry has a random cookie
> value stored with it, you need to present the same value for the read
> to succeed. The cookie could be stored in the #att record. Obviously
> it still requires the code to verify the cookie and restrict the read
> only to the bytes covered by that item, but that's a code quality
> thing and should be easy enough to review.
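
The cookie check described in the quoted paragraph above could look roughly like the following. This is a minimal sketch in Python rather than CouchDB's Erlang. The entry layout (8-byte cookie, 4-byte length, then the data) and the names `append_entry` and `read_entry` are assumptions made for illustration; they are not the Haystack on-disk format or any CouchDB API.

```python
import hmac
import io
import os
import struct

# Hypothetical entry layout: 8-byte random cookie, 4-byte big-endian length, data.
HEADER = struct.Struct(">8sI")


def append_entry(f, data):
    """Append data to the compound file; return (offset, length, cookie)."""
    cookie = os.urandom(8)
    f.seek(0, io.SEEK_END)
    offset = f.tell()
    f.write(HEADER.pack(cookie, len(data)))
    f.write(data)
    return offset, len(data), cookie


def read_entry(f, offset, length, cookie):
    """Read one entry, refusing if the cookie or the length does not match."""
    f.seek(offset)
    header = f.read(HEADER.size)
    if len(header) != HEADER.size:
        raise ValueError("truncated header")
    stored_cookie, stored_length = HEADER.unpack(header)
    # Compare cookies without short-circuiting on the first differing byte.
    if not hmac.compare_digest(stored_cookie, cookie):
        raise PermissionError("cookie mismatch")
    if stored_length != length:
        raise ValueError("length mismatch")
    # Read only the bytes covered by this entry, never past its end.
    data = f.read(stored_length)
    if len(data) != stored_length:
        raise ValueError("truncated entry")
    return data
```

In this sketch the caller would keep `(offset, length, cookie)` alongside the attachment metadata (the quoted text suggests the #att record), so a read with a stale or wrong offset fails instead of returning another entry's bytes.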

The issue here is that I just assume that there will be a bug in the
code that leaks information across databases. So the question is if we
make the bet that we can prevent it from happening for the next 15
years until some whippersnapper db comes and replaces us. The reason
I'd be against including multi-tenant files is that I see that as
requiring the same amount of effort as if it were the only supported
option. It's just not ok for db's to have the leakage as a possible
failure condition IMO.

There's also the part about information leakage using timing attacks
and so forth that I don't see as surmountable.

> B.
> On Wed, Jan 26, 2011 at 3:23 PM, Paul Davis <> wrote:
>> On Wed, Jan 26, 2011 at 9:35 AM, Benoit Chesneau <> wrote:
>>> On Wed, Jan 26, 2011 at 2:20 PM, Robert Newson <> wrote:
>>>> All,
>>>> Most of you know that I'm currently working on 'external attachments'.
>>>> I've spent quite some time reading and modifying the current code and
>>>> have tried several approaches to the problem. I've implemented one
>>>> version fairly completely
>>>> ( which
>>>> places any attachment over a threshold (defaulting to 256 kb) into a
>>>> separate file (and all files that are sent chunked). This branch works
>>>> for PUT/GET/DELETE, local and remote replication and compaction.
>>>> External attachments do not support compression or ranges yet.
>>>> At this point, I'd like to get some feedback. I don't believe
>>>> file-per-attachment is a solution that works for everyone but it was
>>>> necessary to make a choice in order to understand how to integrate any
>>>> kind of external attachment into couchdb.
>>>> So, here's my real proposal for CouchDB 1.2 (or 2.0?);
>>>> Attachments are stored contiguously in compound files following a
>>>> simplified form of Haystack
>>>> ( I won't
>>>> describe Haystack in detail as the article covers it, and it's not
>>>> exactly what we need (the indexes, for example, are pointless, given
>>>> we have a database). The basic idea is we have a small number of files
>>>> that we append to, the limit of concurrency being the number of files
>>>> (i.e., we will not interleave attachments in these files).
>>>> There are several consequences to this;
>>>> Pro
>>>> 1) we can remove the 4k blocking in .couch files.
>>>> 2) .couch files are smaller, improving all i/o operations (especially
>>>> compaction).
>>>> 3) we can use more efficient primitives (like sendfile) to fetch attachments.
>>>> Con
>>>> 1) haystack files need compaction (though this involves no seeking so
>>>> should be far better than .couch compaction)
>>>> 2) more file descriptors
>>>> 3) .couch files are no longer self-contained (complicating backup
>>>> schemes, migration)
>>>> I had originally planned for each database to have exclusive access to
>>>> N haystack files (N is configurable, of course) since this aids with
>>>> backups. However, another compelling option is to have N haystack
>>>> files for all databases. This reduces the number of file descriptors
>>>> needed, but complicates backup (we'd probably have to write a tool to
>>>> extract matching attachments).
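
The basic idea in the proposal above (a small number of append-only files, no interleaving, concurrency limited by the number of files) can be sketched as follows. This is an illustrative Python sketch, not code from the branch under discussion. The class name `AppendFilePool` and its methods are hypothetical.

```python
import queue
from contextlib import contextmanager


class AppendFilePool:
    """Fixed set of append-only files; each file has at most one writer at a time."""

    def __init__(self, paths):
        self._idle = queue.Queue()
        for path in paths:
            self._idle.put(open(path, "ab"))

    @contextmanager
    def _checkout(self):
        f = self._idle.get()  # blocks while every file is busy
        try:
            yield f
        finally:
            self._idle.put(f)

    def append(self, chunks):
        """Write all chunks contiguously to one file; return (path, offset, length)."""
        with self._checkout() as f:
            offset = f.seek(0, 2)  # position at end of file
            length = 0
            for chunk in chunks:
                length += f.write(chunk)
            f.flush()
            return f.name, offset, length

    def close(self):
        """Close all idle files; call only when no appends are in flight."""
        while not self._idle.empty():
            self._idle.get().close()
```

Because a writer holds its file until the whole attachment is written, chunks from different attachments never interleave, and at most N appends run at once for N files.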
>>> I would go for one file / db, so we could remove attachments at the
>>> same time we delete a db.
>>> The con is that we can't share attachments between dbs if
>>> their signatures are the same. Another way would be to maintain an
>>> index of attachments / dbs so we could remove them if they don't
>>> appear in any other db after one has been removed.
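
The index of attachments / dbs suggested in the reply above could be sketched like this. It is a hypothetical Python illustration only. The class `SharedAttachmentIndex`, its methods, and the use of SHA-256 as the "signature" are all assumptions, and nothing here addresses the cross-database leakage concerns raised elsewhere in this thread.

```python
import hashlib


class SharedAttachmentIndex:
    """Maps an attachment signature to the set of dbs that reference it."""

    def __init__(self):
        self._dbs_by_digest = {}

    def add(self, db, data):
        """Record that db references this attachment; return its signature."""
        digest = hashlib.sha256(data).hexdigest()
        self._dbs_by_digest.setdefault(digest, set()).add(db)
        return digest

    def remove_db(self, db):
        """Drop db from the index; return signatures no other db references."""
        orphaned = []
        for digest, dbs in list(self._dbs_by_digest.items()):
            dbs.discard(db)
            if not dbs:
                del self._dbs_by_digest[digest]
                orphaned.append(digest)
        return sorted(orphaned)
```

Only the signatures returned by `remove_db` would be safe to delete from shared storage; any attachment still referenced by another db stays in place.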
>>>> I've rushed through that rather breezily, I apologize. I've been
>>>> thinking about this for quite some time so I likely have answers to
>>>> most questions on this.
>>>> B.
>>> That's a good idea anyway. Also, did you have a look at luwak from Basho?
>>> I know the implementation is different, but I like the idea of
>>> reusing the db to store attachments / chunks. So we could imagine
>>> dispatching chunks as we do for docs in cluster solutions. We could also
>>> imagine handling metadata.
>>> - benoit
>> Another bit that Bob2 didn't mention was the idea of making this a
>> pluggable API so that we can have a couple implementations that are
>> configurable. For instance, Benoit's idea for a single file of
>> interleaved attachments or the haystack approach with multiple files
>> that keep attachments in contiguous chunks.
>> As to sharing attachments between db's, I would be hugely hugely
>> against releasing that as part of an actual release as there are a
>> *lot* of downsides in how that would open us up for bad failure
>> conditions. Ie, things like sending attachments from different db's by
>> accident or whatnot. Also, in shared tenant situations it seems
>> like it'd be a prime suspect for information leakage and so forth.
>> But I digress.
