incubator-couchdb-user mailing list archives

From Filipe David Manana <>
Subject Re: same attachment across documents / databases
Date Fri, 28 Oct 2011 11:31:19 GMT
You can also use a filesystem that does block-level deduplication on
the fly (there is also dedicated hardware for that).
Example filesystem:

There are, of course, tradeoffs, such as speed versus space savings.
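To make the idea concrete, here is a toy model of fixed-size block deduplication. It does not describe how any particular filesystem works (the 4096-byte block size and SHA-256 hashing are assumptions). It only shows why two database files holding the same attachment bytes can cost roughly one copy of the shared blocks.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real filesystems pick their own


class DedupBlockStore:
    """Toy fixed-size block store: identical blocks are stored only once."""

    def __init__(self, block_size: int = BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # SHA-256 hex digest -> block bytes

    def write(self, data: bytes) -> list:
        """Split data into blocks, keep unseen ones, return the digest list."""
        recipe = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Only new content consumes space; a repeat just adds a reference.
            self.blocks.setdefault(digest, block)
            recipe.append(digest)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Reassemble a file from its list of block digests."""
        return b"".join(self.blocks[d] for d in recipe)

    def stored_bytes(self) -> int:
        """Physical bytes actually kept, after deduplication."""
        return sum(len(b) for b in self.blocks.values())
```

Fixed-size blocks only match when identical data starts at the same block offset in both files, which is not guaranteed inside CouchDB's append-only database files. That, together with hashing every write, is part of the speed versus space tradeoff.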

On Fri, Oct 28, 2011 at 12:25 PM, Robert Newson <> wrote:
> The approach would be to teach CouchDB to deduplicate
> byte-identical attachments (or chunks thereof) within a file. That sounds a
> bit tricky but not impossible.
> B.
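Robert's suggestion is essentially content-addressed storage: key each attachment body by a hash of its bytes and keep a reference count per body. The following is a minimal in-memory sketch of that idea, assuming SHA-256 as the hash and a hypothetical store shared by several databases. It is not an existing CouchDB feature.

```python
import hashlib


class AttachmentStore:
    """Hypothetical content-addressed attachment store shared by databases.

    Byte-identical attachments are kept once and reference-counted; each
    (database, document, attachment name) entry only records the digest.
    """

    def __init__(self):
        self.blobs = {}     # digest -> attachment bytes
        self.refcount = {}  # digest -> number of entries pointing at it
        self.index = {}     # (db, doc_id, name) -> digest

    def put(self, db: str, doc_id: str, name: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if (db, doc_id, name) in self.index:
            # Replacing an attachment drops the reference to the old body.
            self.delete(db, doc_id, name)
        if digest not in self.blobs:
            self.blobs[digest] = data
            self.refcount[digest] = 0
        self.refcount[digest] += 1
        self.index[(db, doc_id, name)] = digest
        return digest

    def get(self, db: str, doc_id: str, name: str) -> bytes:
        return self.blobs[self.index[(db, doc_id, name)]]

    def delete(self, db: str, doc_id: str, name: str) -> None:
        digest = self.index.pop((db, doc_id, name))
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            # Last reference is gone, so the body's space can be reclaimed.
            del self.refcount[digest]
            del self.blobs[digest]
```

Replicating a shared document into a second account database would then add a reference rather than a second copy of the bytes.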
> On 28 October 2011 12:22, Gregor Martynus <> wrote:
>> Thanks for your responses!
>> I'm not sure if there is any approach to minimize the disadvantage of
>> replicated attachments eating up space and performance; if there is, please
>> let me know.
>> My approach would be to set up a backend server that listens for new
>> attachments coming in, transfers them to an external store like S3 and
>> then replaces the attachment in the DB document with some kind of pointer to
>> the new location of the attachment.
>> Not sure if that makes sense; I'm open to suggestions.
>> And once more thanks for your help!
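Gregor's plan comes down to a document transformation that the backend worker would apply once the attachment bodies have been uploaded. The sketch below covers only that transformation. Watching for new attachments, the S3 upload itself, the `external_attachments` field name and the URL layout are all assumptions made up for illustration.

```python
import copy
from urllib.parse import quote

# Hypothetical public base URL of the external store (e.g. an S3 bucket).
EXTERNAL_BASE_URL = "https://example-bucket.s3.amazonaws.com"


def externalize_attachments(doc: dict, base_url: str = EXTERNAL_BASE_URL) -> dict:
    """Return a copy of `doc` with its inline attachments replaced by pointers.

    Assumes each attachment body was already uploaded to
    <base_url>/<doc _id>/<attachment name>; that key scheme and the
    'external_attachments' field name are hypothetical conventions.
    """
    new_doc = copy.deepcopy(doc)  # leave the caller's document untouched
    attachments = new_doc.pop("_attachments", {})
    pointers = new_doc.setdefault("external_attachments", {})
    for name, meta in attachments.items():
        pointers[name] = {
            "url": f"{base_url}/{quote(doc['_id'], safe='')}/{quote(name)}",
            "content_type": meta.get("content_type"),
            "length": meta.get("length"),
        }
    return new_doc
```

The worker would save the returned document back (it still carries the current `_rev`), creating a new revision without attachments; the old bytes remain in earlier revisions until the database is compacted. Keying the external object by `_id` means a document replicated into several account databases points at a single stored copy, though a content hash would be safer if unrelated databases can reuse the same `_id`.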
>> On Fri, Oct 28, 2011 at 1:14 PM, CGS <> wrote:
>>> Hi Gregor,
>>> I might be wrong because I am no expert in that field. But from the
>>> documentation, one can deduce that all attachments are stored inside the
>>> document rather than pointing to a physical file (quite logical if you
>>> consider the main purpose of CouchDB: a web-oriented database). As the
>>> replication mechanism is the same for local replication and replication over
>>> the network (it just transfers the data from the source file to the target
>>> file), my guess is that your attachment is copied into every physical file
>>> to which a replication was applied.
>>> However, depending on your project's requirements, instead of an attachment
>>> you can use a pointer, which you can then resolve in shows (at the user's
>>> end). The limitations of such a method are imposed by cross-domain
>>> restrictions (if you use AJAX).
>>> I hope this answer helps you in designing your project, and if somebody
>>> notices any mistake in my answer, please correct me.
>>> Cheers,
>>> CGS
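On the reading side, CGS's pointer approach means the client must decide where to fetch each file from. A minimal sketch, assuming the hypothetical `external_attachments` pointer field: it prefers the pointer and otherwise falls back to CouchDB's standard `/{db}/{docid}/{attachment}` path. When the pointer lives on another domain, plain links and `<img>` tags still work, while AJAX requests run into the cross-domain restrictions CGS mentions.

```python
from urllib.parse import quote


def attachment_url(couch_base: str, db: str, doc: dict, name: str) -> str:
    """Return the URL a client should fetch for attachment `name`.

    Prefers a pointer in the (hypothetical) 'external_attachments' field;
    otherwise uses CouchDB's /{db}/{docid}/{attachment} path.
    """
    pointer = doc.get("external_attachments", {}).get(name)
    if pointer is not None:
        return pointer["url"]
    if name in doc.get("_attachments", {}):
        return f"{couch_base}/{quote(db, safe='')}/{quote(doc['_id'], safe='')}/{quote(name)}"
    raise KeyError(name)
```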
>>> On 10/28/2011 12:32 PM, Gregor Martynus wrote:
>>>> I wonder how CouchDB stores document attachments internally. In particular,
>>>> I'd like to know: if I replicate a document with attachments from one
>>>> database to another, will the attachments be stored twice internally, or
>>>> will CouchDB be smart enough to recognize that the attachment already
>>>> exists and only needs to link to it?
>>>> I hope my question is clear. In my case, each account has its own database
>>>> with its own documents. Documents can be shared between accounts, which
>>>> will be done using replication. But if attachments get stored multiple
>>>> times even though they are exactly the same, I fear that they would use up
>>>> too much space and eventually slow down replication, etc.

Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
