From: CGS
To: user@couchdb.apache.org
Date: Fri, 28 Oct 2011 13:37:56 +0200
Subject: Re: same attachment across documents / databases
Message-ID: <4EAA9414.8010908@gmail.com>
References: <4EAA8E96.8030405@gmail.com>

Gregor, your approach makes perfect sense, only that you need some work
to do because:
1. the attachments are encoded in CouchDB;
2. you will need a document scanner.

I don't know about your project, but I would go with attachments on a
different pipe managed by a web server and pointed in CouchDB documents
(with maximum a document per attachment to manage the attachment
description and use of include_docs). Now, it's up to you because you
know better your project requests.

CGS

On 10/28/2011 01:25 PM, Robert Newson wrote:
> The approach would be to teach couchdb how to deduplicate
> byte-identical attachments (or chunks thereof) with a file. Sounds a
> bit tricky but not impossible.
>
> B.
>
> On 28 October 2011 12:22, Gregor Martynus wrote:
>> Thanks for your responses!
>>
>> I'm not sure if there is any approach to go minimize the disadvantage of
>> replicated attachments eating up space and performance, if there is, please
>> let me know.
>>
>> My approach would be to setup a backend server that listens to new
>> attachments coming in, transferring these to an external store like S3 and
>> then replace the doc attachment in the DB with some kind of pointer to the
>> new location of the attachments.
>>
>> Not sure if that makes sense, I'm open for suggestions.
>>
>> And once more thanks for your help!
>>
>> On Fri, Oct 28, 2011 at 1:14 PM, CGS wrote:
>>
>>> Hi Gregor,
>>>
>>> I might be wrong because I am no expert in that field. But from the
>>> documentation, one can deduce that all the attachments are inserted into the
>>> document and not pointing toward a physical file (quite logic if you
>>> consider the main purpose of CouchDB: web-oriented database). As replication
>>> mechanism is the same for local replication and replication over the network
>>> (just transferring the content of data from source file to the target file),
>>> my guess is that your attachment is copied in all the physical files for
>>> which a replication operation was applied.
>>>
>>> However, depending on your project requests, instead of attachment you can
>>> use a pointer which you can use it in shows (at the user's end). The
>>> limitations of such a method are imposed by the cross-domain limitations (if
>>> you use AJAX).
>>>
>>> I hope this answer will help you in designing your project and if somebody
>>> notice any mistake in my answer, please, correct me.
>>>
>>> Cheers,
>>> CGS
>>>
>>> On 10/28/2011 12:32 PM, Gregor Martynus wrote:
>>>
>>>> I wonder how couchDB stores document attachments internally. In particular,
>>>> I'd like to know if I replicate a document with attachments from one
>>>> database to another, will the attachments be stored twice internally or will
>>>> the couchDB be smart enough to understand that the attachment does already
>>>> exist and only needs to link to it?
>>>>
>>>> I hope my question is clear. In my case, each account has an own database
>>>> with its own documents. Now documents can be shared between accounts which
>>>> will be done using replication. But when attachments would get stored
>>>> multiple times although they are exactly the same I fear that it would use
>>>> up too much space and eventually slow down replications etc?
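
For illustration only, the pointer idea discussed in this thread (attachments kept in an external store, with a small CouchDB document per attachment pointing at them) might be sketched roughly as below. Nothing here is CouchDB API: ExternalStore, make_pointer_doc, the field names and the example URL are all hypothetical, and the in-memory dict merely stands in for S3 or a web server directory. Keying the store by a SHA-256 digest is one assumed way to get the byte-identical deduplication mentioned above.

```python
import hashlib


class ExternalStore:
    """Hypothetical stand-in for S3 or a web server directory (in-memory)."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self.blobs = {}  # digest -> bytes

    def put(self, data):
        """Store the bytes under their SHA-256 digest and return the digest."""
        digest = hashlib.sha256(data).hexdigest()
        # Byte-identical content maps to the same key, so it is kept only once.
        self.blobs.setdefault(digest, data)
        return digest

    def url_for(self, digest):
        return f"{self.base_url}/{digest}"


def make_pointer_doc(store, name, content_type, data):
    """Upload the bytes and build the small JSON document to save in CouchDB."""
    digest = store.put(data)
    return {
        "_id": f"attachment:{digest}",  # assumed id scheme, one doc per attachment
        "type": "attachment_pointer",
        "name": name,
        "content_type": content_type,
        "length": len(data),
        "digest": f"sha256-{digest}",
        "url": store.url_for(digest),
    }


if __name__ == "__main__":
    store = ExternalStore("https://files.example.com/attachments")
    payload = b"same bytes shared by two accounts"
    doc_a = make_pointer_doc(store, "report.pdf", "application/pdf", payload)
    doc_b = make_pointer_doc(store, "copy-of-report.pdf", "application/pdf", payload)
    # Both documents point at the same stored blob.
    print(doc_a["url"] == doc_b["url"], len(store.blobs))
```

With something along these lines, replication between account databases would copy only the small pointer documents, and a view over them could be queried with include_docs=true to fetch the attachment descriptions. Fetching the bytes from another host in the browser would still be subject to the cross-domain limits noted earlier in the thread.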