From: Robert Newson
To: user@couchdb.apache.org
Date: Fri, 28 Oct 2011 14:08:37 +0100
Subject: Re: same attachment across documents / databases

To be a solid implementation, the reference counting would need to happen in the core database layer, I think. It's the same idea as hard links in filesystems.

B.
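Robert's hard-link analogy can be made concrete with a small in-memory model. This is not CouchDB code: the class `RefCountedBlobStore` and its methods are hypothetical, and the sketch assumes attachments are deduplicated by their SHA-256 digest. Each distinct body is kept once, every document that uses it adds a reference, and the bytes are freed only when the last reference is released.

```python
import hashlib


class RefCountedBlobStore:
    """Hypothetical content-addressed store with reference counts.

    Identical attachment bodies share one stored copy (keyed by SHA-256),
    much like several hard links pointing at one inode.
    """

    def __init__(self):
        self._blobs = {}     # digest -> attachment body
        self._refcount = {}  # digest -> number of live references

    def put(self, data: bytes) -> str:
        """Store data, or reuse an identical copy, and add one reference."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = data
            self._refcount[digest] = 0
        self._refcount[digest] += 1
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def release(self, digest: str) -> None:
        """Drop one reference; delete the body once nothing points at it."""
        self._refcount[digest] -= 1
        if self._refcount[digest] == 0:
            del self._refcount[digest]
            del self._blobs[digest]

    def stored_bytes(self) -> int:
        """Total size of the distinct bodies actually kept."""
        return sum(len(body) for body in self._blobs.values())


if __name__ == "__main__":
    store = RefCountedBlobStore()
    attachment = bytes(range(256)) * 4  # a 1024-byte stand-in attachment
    # The same attachment referenced from two documents (or databases).
    ref_a = store.put(attachment)
    ref_b = store.put(attachment)
    print(ref_a == ref_b)        # True
    print(store.stored_bytes())  # 1024: one copy, not two
    store.release(ref_a)
    print(store.stored_bytes())  # 1024: still referenced once
    store.release(ref_b)
    print(store.stored_bytes())  # 0
```

A real implementation inside the database layer would also have to keep these counts consistent across crashes and compaction, which this sketch ignores.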
On 28 October 2011 13:31, Benoit Chesneau wrote:
> On Fri, Oct 28, 2011 at 1:25 PM, Robert Newson wrote:
>> The approach would be to teach CouchDB how to deduplicate
>> byte-identical attachments (or chunks thereof) within a file. Sounds a
>> bit tricky but not impossible.
>>
>> B.
>
> Another way would be to save attachments in one place and check their
> signature to detect duplicates. At least per db it could work,
> couldn't it?
>
> - benoit
>
>>
>> On 28 October 2011 12:22, Gregor Martynus wrote:
>>> Thanks for your responses!
>>>
>>> I'm not sure if there is any approach to minimize the disadvantage of
>>> replicated attachments eating up space and performance; if there is,
>>> please let me know.
>>>
>>> My approach would be to set up a backend server that listens for new
>>> attachments, transfers them to an external store like S3, and then
>>> replaces the doc attachment in the DB with some kind of pointer to the
>>> new location of the attachments.
>>>
>>> Not sure if that makes sense; I'm open to suggestions.
>>>
>>> And once more, thanks for your help!
>>>
>>> On Fri, Oct 28, 2011 at 1:14 PM, CGS wrote:
>>>
>>>> Hi Gregor,
>>>>
>>>> I might be wrong because I am no expert in that field. But from the
>>>> documentation, one can deduce that all the attachments are stored inside
>>>> the document rather than pointing to a physical file (quite logical if
>>>> you consider the main purpose of CouchDB: a web-oriented database). As
>>>> the replication mechanism is the same for local replication and
>>>> replication over the network (just transferring the data from the source
>>>> file to the target file), my guess is that your attachment is copied into
>>>> every physical file to which a replication was applied.
>>>>
>>>> However, depending on your project's requirements, instead of an
>>>> attachment you can store a pointer, which you can then use in shows (at
>>>> the user's end). The limitations of such a method are imposed by
>>>> cross-domain restrictions (if you use AJAX).
>>>>
>>>> I hope this answer will help you in designing your project, and if
>>>> somebody notices any mistake in my answer, please correct me.
>>>>
>>>> Cheers,
>>>> CGS
>>>>
>>>> On 10/28/2011 12:32 PM, Gregor Martynus wrote:
>>>>
>>>>> I wonder how CouchDB stores document attachments internally. In
>>>>> particular, I'd like to know: if I replicate a document with
>>>>> attachments from one database to another, will the attachments be
>>>>> stored twice internally, or will CouchDB be smart enough to understand
>>>>> that the attachment already exists and only needs to link to it?
>>>>>
>>>>> I hope my question is clear. In my case, each account has its own
>>>>> database with its own documents. Documents can be shared between
>>>>> accounts, which will be done using replication. But if attachments get
>>>>> stored multiple times although they are exactly the same, I fear that
>>>>> it would use up too much space and eventually slow down replication.
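Gregor's plan (move each attachment to an external store such as S3 and leave a pointer in the document) combines naturally with Benoit's idea of spotting duplicates by signature. The sketch below assumes inline attachments in CouchDB's `_attachments` form with base64 `data`; the function `offload_attachments`, the plain dict `blob_store` standing in for S3, and the `external_attachments` pointer field are all invented for illustration. Keying each body by its SHA-256 digest means two documents carrying the same file store it only once.

```python
import base64
import hashlib


def offload_attachments(doc, blob_store):
    """Return a copy of doc whose inline attachments live in blob_store.

    Hypothetical sketch: blob_store is a dict standing in for S3, and
    "external_attachments" is an invented pointer field. Each attachment
    is assumed to be inline, i.e. to carry base64 "data" (not a stub).
    """
    slim = dict(doc)  # shallow copy; the caller's doc is left untouched
    attachments = slim.pop("_attachments", {})
    pointers = {}
    for name, att in attachments.items():
        body = base64.b64decode(att["data"])
        key = "sha256-" + hashlib.sha256(body).hexdigest()
        # Same bytes -> same key, so a duplicate body is stored only once.
        blob_store.setdefault(key, body)
        pointers[name] = {
            "content_type": att.get("content_type", "application/octet-stream"),
            "length": len(body),
            "key": key,
        }
    if pointers:
        slim["external_attachments"] = pointers
    return slim


if __name__ == "__main__":
    store = {}
    payload = base64.b64encode(b"same bytes").decode("ascii")
    doc_a = {"_id": "a", "_attachments": {
        "f.txt": {"content_type": "text/plain", "data": payload}}}
    doc_b = {"_id": "b", "_attachments": {
        "g.txt": {"content_type": "text/plain", "data": payload}}}
    offload_attachments(doc_a, store)
    offload_attachments(doc_b, store)
    print(len(store))  # 1: the shared body is stored once
```

Because the pointer carries the digest, replicating the slimmed document copies only a few bytes of metadata rather than the file itself; the cost is that clients must fetch the file from another origin, subject to the cross-domain limits CGS mentions if they use AJAX.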