couchdb-user mailing list archives

From Damien Katz
Subject Re: how much stuff can one stuff into a CouchDB?
Date Wed, 29 Apr 2009 22:22:36 GMT
I think the total size of the data isn't a problem, but each access
will require loading the whole document into memory, so you'd load 25 megs
of data just to read or update a small bit of it, which can be quite
inefficient. Views will also be slow to update on each document change,
since the whole document has to be loaded into memory and serialized to
the view engine, etc.

If the documents can be broken up into smaller updatable units, then
things will generally work more smoothly, though building views will
still be somewhat slow. If the data can be stored as binary attachments,
with just some metadata about the files stored in the JSON, views will be
much more efficient.
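
As a rough sketch of the two options above (an illustration only, not tested against a live server): the first helper splits one large record list into small documents, and the other two build a small metadata document plus the request path for uploading the bulk data as a standalone attachment (PUT /db/docid/attachment?rev=...). The database name "traffic", the attachment name "data.bin", the field names, and the "file:chunk" id scheme are all made up for the example.

```python
from urllib.parse import quote


def split_into_documents(file_id, records, chunk_size):
    """Break one large record list into small, independently updatable documents."""
    docs = []
    for start in range(0, len(records), chunk_size):
        docs.append({
            # Hypothetical id scheme: "<file id>:<zero-padded chunk number>"
            "_id": f"{file_id}:{start // chunk_size:06d}",
            "source_file": file_id,
            "records": records[start:start + chunk_size],
        })
    return docs


def metadata_document(file_id, payload, content_type="application/octet-stream"):
    """Small JSON document describing data that is stored as an attachment.

    Only this metadata would be serialized to the view engine; the field
    names here are assumptions chosen for illustration.
    """
    return {
        "_id": file_id,
        "attachment_name": "data.bin",
        "content_type": content_type,
        "length_bytes": len(payload),
    }


def attachment_path(db, doc_id, name, rev):
    """Request path for a standalone attachment upload to an existing revision."""
    parts = [quote(p, safe="") for p in (db, doc_id, name)]
    return "/" + "/".join(parts) + "?rev=" + quote(rev, safe="")


if __name__ == "__main__":
    docs = split_into_documents("file-0001", list(range(10)), chunk_size=4)
    print([d["_id"] for d in docs])
    print(metadata_document("file-0001", b"\x00" * 1024))
    print(attachment_path("traffic", "file-0001", "data.bin", "1-abc"))
```

With the second approach, a view's map function only ever sees the small metadata document, so a change to one file does not push megabytes of data through the view engine.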


On Apr 29, 2009, at 3:11 PM, James Marca wrote:

> Hi All,
> On the Wiki, the FAQ says:
> Q: How Much Stuff can I Store in CouchDB?
> A: With node partitioning, virtually unlimited. For a single database
>   instance, the practical scaling limits aren't yet known.
> Is there some more recent guidance on this?  I read the wiki pages
> "Configuring distributed systems", "Partitioning proposal", and "HTTP
> Bulk Document API", but as far as I can tell, node partitioning isn't
> implemented yet (right? things are moving really fast around here!).
> I have about 70G of gzipped files (about 3,000 files) that I need to
> unzip, convert to JSON, and store.  Unzipping explodes each file by
> about a factor of 7.  I expect that adding the JSON structure will
> increase the data size even more.  I read in an earlier posting that
> compacting the database will compress it back down significantly, but
> still, that's a big database file.
> I also have the option to break up the data into 9 logical chunks, but
> if I don't have to do it, I'd rather not.
> Anybody have any advice or experience with really big databases?
> Regards,
> James
> -- 
> James E. Marca
> Researcher
> Institute of Transportation Studies
> AIRB Suite 4000
> University of California
> Irvine, CA 92697-3600
