From: Paul Davis
Date: Wed, 26 Jan 2011 11:37:12 -0500
Subject: Re: Next-generation attachment storage.
To: dev@couchdb.apache.org

On Wed, Jan 26, 2011 at 10:34 AM, Robert Newson wrote:
> Agree completely that commingled attachment files would not be an
> appropriate default. However, managing a fixed number of very large
> (e.g., 200 GiB) files full of attachment data would work well in a
> hosted service. Obviously the code would have to be solid to prevent
> the kind of data disclosure problems you mention.

No, not just a default, I'm saying "not in the release tarball or in any
way, shape, or form signaled as supported by Apache CouchDB". If hosting
groups want to write and implement this I think that'd be just fine.

> The haystack paper covers this btw. Each entry has a random cookie
> value stored with it, you need to present the same value for the read
> to succeed. The cookie could be stored in the #att record. Obviously
> it still requires the code to verify the cookie and restrict the read
> only to the bytes covered by that item, but that's a code quality
> thing and should be easy enough to review.
>

The issue here is that I just assume that there will be a bug in the
code that leaks information across databases. So the question is
whether we make the bet that we can prevent it from happening for the
next 15 years until some whippersnapper db comes and replaces us.

The reason I'd be against including multi-tenant files is that I see
that as requiring the same amount of effort as if it were the only
supported option. It's just not OK for dbs to have leakage as a
possible failure condition IMO. There's also the part about information
leakage via timing attacks and so forth that I don't see as
surmountable.

> B.
>
> On Wed, Jan 26, 2011 at 3:23 PM, Paul Davis wrote:
>> On Wed, Jan 26, 2011 at 9:35 AM, Benoit Chesneau wrote:
>>> On Wed, Jan 26, 2011 at 2:20 PM, Robert Newson wrote:
>>>> All,
>>>>
>>>> Most of you know that I'm currently working on 'external attachments'.
>>>> I've spent quite some time reading and modifying the current code and
>>>> have tried several approaches to the problem. I've implemented one
>>>> version fairly completely
>>>> (https://github.com/rnewson/couchdb/tree/external_attachments) which
>>>> places any attachment over a threshold (defaulting to 256 KB) into a
>>>> separate file (and all files that are sent chunked). This branch works
>>>> for PUT/GET/DELETE, local and remote replication and compaction.
>>>> External attachments do not support compression or ranges yet.
>>>>
>>>> At this point, I'd like to get some feedback. I don't believe
>>>> file-per-attachment is a solution that works for everyone but it was
>>>> necessary to make a choice in order to understand how to integrate any
>>>> kind of external attachment into couchdb.
>>>>
>>>> So, here's my real proposal for CouchDB 1.2 (or 2.0?):
>>>>
>>>> Attachments are stored contiguously in compound files following a
>>>> simplified form of Haystack
>>>> (http://www.facebook.com/note.php?note_id=76191543919). I won't
>>>> describe Haystack in detail as the article covers it, and it's not
>>>> exactly what we need (the indexes, for example, are pointless, given
>>>> we have a database).
>>>> The basic idea is we have a small number of files
>>>> that we append to, the limit of concurrency being the number of files
>>>> (i.e., we will not interleave attachments in these files).
>>>>
>>>> There are several consequences to this:
>>>>
>>>> Pro
>>>> 1) we can remove the 4k blocking in .couch files.
>>>> 2) .couch files are smaller, improving all i/o operations (especially
>>>> compaction).
>>>
>>>> 3) we can use more efficient primitives (like sendfile) to fetch attachments.
>>>>
>>>> Con
>>>> 1) haystack files need compaction (though this involves no seeking so
>>>> should be far better than .couch compaction)
>>>> 2) more file descriptors
>>>> 3) .couch files are no longer self-contained (complicating backup
>>>> schemes, migration)
>>>>
>>>> I had originally planned for each database to have exclusive access to
>>>> N haystack files (N is configurable, of course) since this aids with
>>>> backups. However, another compelling option is to have N haystack
>>>> files for all databases. This reduces the number of file descriptors
>>>> needed, but complicates backup (we'd probably have to write a tool to
>>>> extract matching attachments).
>>>>
>>>
>>> I would go for one file / db, so we could remove attachments at the
>>> same time we delete a db.
>>>
>>> The con is that we can't share attachments between dbs if
>>> their signatures are the same. Another way would be to maintain an
>>> index of attachments / dbs so we could remove them if they don't
>>> appear in any other db after one has been removed.
>>>
>>>> I've rushed through that rather breezily, I apologize. I've been
>>>> thinking about this for quite some time so I likely have answers to
>>>> most questions on this.
>>>>
>>>> B.
>>>>
>>>
>>> That's a good idea anyway. Also, did you have a look at luwak from Basho?
>>> https://github.com/basho/luwak
>>>
>>> I know that the implementation is different but I like the idea of
>>> reusing the db to store attachments / chunks.
>>> So we could imagine
>>> dispatching chunks as we do for docs in cluster solutions. We could
>>> also imagine handling metadata.
>>>
>>> - benoit
>>>
>>
>> Another bit that Bob2 didn't mention was the idea of making this a
>> pluggable API so that we can have a couple of implementations that are
>> configurable. For instance, Benoit's idea for a single file of
>> interleaved attachments or the haystack approach with multiple files
>> that keep attachments in contiguous chunks.
>>
>> As to sharing attachments between dbs, I would be hugely hugely
>> against releasing that as part of an actual release as there are a
>> *lot* of downsides in how that would open us up for bad failure
>> conditions. I.e., things like sending attachments from different dbs by
>> accident or whatnot. Also, in shared tenant situations it seems
>> like it'd be a prime suspect for information leakage and so forth.
>> But I digress.
>>
>
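
The cookie check described in the thread (each entry in an append-only
file carries a random cookie, and a read succeeds only if the caller
presents the same value and stays within that entry's bytes) can be
sketched roughly as follows. This is an illustration in Python, not
CouchDB code and not the Haystack format: the entry layout (8-byte
cookie, 4-byte length, payload) and the names HEADER, append_attachment
and read_attachment are assumptions made up for this example.

```python
import os
import secrets
import struct

# Hypothetical entry layout: 8-byte random cookie, 4-byte big-endian
# payload length, then the payload bytes stored contiguously.
HEADER = struct.Struct(">8sI")


def append_attachment(path, data):
    """Append one attachment as a single contiguous entry.

    Returns (offset, cookie); a caller would keep both alongside the
    attachment metadata (the thread suggests the #att record).
    """
    cookie = secrets.token_bytes(8)
    with open(path, "ab") as f:
        f.seek(0, os.SEEK_END)
        offset = f.tell()
        f.write(HEADER.pack(cookie, len(data)))
        f.write(data)
    return offset, cookie


def read_attachment(path, offset, cookie):
    """Read the entry at offset, but only if the cookie matches.

    The read is limited to the length recorded in the entry header, so
    it cannot run into a neighbouring entry.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        header = f.read(HEADER.size)
        if len(header) != HEADER.size:
            raise ValueError("no entry at this offset")
        stored_cookie, length = HEADER.unpack(header)
        # Constant-time comparison of the presented and stored cookies.
        if not secrets.compare_digest(stored_cookie, cookie):
            raise PermissionError("cookie mismatch")
        data = f.read(length)
        if len(data) != length:
            raise ValueError("truncated entry")
        return data
```

A caller would store the returned offset and cookie with the document
and pass both back on read. The sketch says nothing about the concerns
raised above: a bug in code like this would still leak data across
databases sharing a file, and it does not address timing side channels
beyond the cookie comparison itself.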