couchdb-user mailing list archives

From Brian Candler <B.Cand...@pobox.com>
Subject Re: Not-even-yet-newbie question
Date Sun, 19 Apr 2009 17:32:21 GMT
On Sun, Apr 19, 2009 at 09:35:40AM -0400, Jan Lehnardt wrote:
>> Does this really mean that if we have, for one customer, 100,000  
>> CouchDB "documents" consisting each of minimal meta-data, but with  
>> each an attachment that is on average a 3 MB PDF file, that this is  
>> all stored in one single 300 GB file ?
>>
>> 1a.
>> If yes, is that not uncomfortable/scary ?
>
> Yes, this is correct.
>
>> (I mean, even nowadays, moving a 300 GB file is not the easiest  
>> practical thing to do).

It would need some testing. Last time I tried rsync with a 2GB file it didn't
handle it very well, but that was several years ago. Since the file is
append-only (apart from the first 4KB changing), in principle it should be
possible to copy it incrementally fairly easily, writing a small tool to do
so if necessary.
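For what it's worth, here is a rough sketch of what such a tool could look
like (Python just for illustration, nothing CouchDB ships). It assumes the
destination already holds an older partial copy, that only the first 4KB
header is ever rewritten in place, and that everything else is a strict
append:

    # Rough sketch of an incremental copier for an append-only file.
    # Assumptions (mine, not CouchDB's): the destination holds a partial
    # copy, only the first 4KB header changes in place, and all other
    # writes are strict appends.

    import os
    import sys

    HEADER_SIZE = 4096          # region that may be rewritten in place
    CHUNK_SIZE = 1024 * 1024    # copy appended data 1MB at a time

    def incremental_copy(src_path, dst_path):
        src_size = os.path.getsize(src_path)
        dst_size = os.path.getsize(dst_path) if os.path.exists(dst_path) else 0

        mode = "r+b" if dst_size else "wb"
        with open(src_path, "rb") as src, open(dst_path, mode) as dst:
            # Always refresh the header, the only part that changes in place.
            dst.seek(0)
            dst.write(src.read(HEADER_SIZE))

            # Then copy only the bytes appended since the last run.
            offset = max(dst_size, HEADER_SIZE)
            src.seek(offset)
            dst.seek(offset)
            while offset < src_size:
                chunk = src.read(min(CHUNK_SIZE, src_size - offset))
                if not chunk:
                    break
                dst.write(chunk)
                offset += len(chunk)

    if __name__ == "__main__":
        incremental_copy(sys.argv[1], sys.argv[2])

You would still want to verify the result (compare sizes or checksums) before
trusting the copy.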

> Yes, if you get this use-case, you might not only want a DB per user but 
> a DB per user per day.

However, at that point you lose the ability to have a single view which
indexes all of the customer's documents. That is, after 365 days you would
need to do 365 queries just to locate a document by its metadata. I wouldn't
recommend that.

As others have suggested: you could, if you prefer, just store a pointer
(e.g. a filename) to where the file is kept on some other filesystem. Maybe
store them by SHA1, where the filename is xx/xx/xxxxxxxxxxxxxxxx
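A minimal sketch of that, again in Python, assuming a plain local filesystem;
the root directory and function name are made up for illustration:

    # Rough sketch: write the attachment to a content-addressed path and
    # hand back the digest for the CouchDB document to reference.

    import hashlib
    import os

    def store_blob(data, root="/var/blobs"):
        digest = hashlib.sha1(data).hexdigest()
        # The xx/xx/rest split keeps any single directory from growing huge.
        path = os.path.join(root, digest[0:2], digest[2:4], digest[4:])
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
        return digest, path

The CouchDB document then only needs to carry the metadata plus the returned
digest or path, rather than the 3MB attachment itself.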

As an extension of this idea, you could have 257 CouchDB databases: one for
the metadata indexes, and the others each storing 1/256th of the SHA1 space.
That probably doesn't buy you much over just using the filesystem natively,
though.
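Something like this, where the database naming scheme is purely my own
illustration, not anything CouchDB defines:

    # Illustrative only: route each attachment to one of 256 databases by
    # the first byte of its SHA1, alongside a single metadata database
    # that holds the queryable metadata.

    import hashlib

    META_DB = "meta"

    def shard_db_name(data, prefix="blobs_"):
        digest = hashlib.sha1(data).hexdigest()
        return prefix + digest[:2], digest   # e.g. ("blobs_a3", "a3f1...")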

Regards,

Brian.
