couchdb-user mailing list archives

From J Chris Anderson <jch...@apache.org>
Subject Re: Some stats about couch DB
Date Sat, 24 Jul 2010 01:42:05 GMT

On Jul 23, 2010, at 5:01 PM, Talib Sharif wrote:

> Hi All,
> 
> As I am playing more and more with couchdb (it is relaxing and fun), I am trying
> to understand the limits and the expectations in a large production system environment.
> 
> Right now I have about 100K documents and about 10 different views; one of the
> views does about 100 emits per document.
> 
> Building the view indexes is taking about 7-8 hours.
> 

This is about right for 10 million rows. That works out to about 350 rows per second (maybe
more depending on what your other views are doing), which is a bit slower than I'm used to
seeing, but it depends on the size of your emitted keys and values. If you can shrink the
keys or the values you should see some speedup (marginal, not an order of magnitude).
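
As a rough check of that figure, here is a minimal Python sketch of the arithmetic. The inputs come from the numbers quoted in this thread (100K documents, about 100 emits per document, 7-8 hours); the function name is just for illustration, and it treats the one heavy view as the whole workload, so it ignores the other nine views.

```python
def rows_per_second(docs, emits_per_doc, build_hours):
    """Average index rows written per second over a full view build."""
    total_rows = docs * emits_per_doc        # 100K docs x 100 emits = 10 million rows
    return total_rows / (build_hours * 3600)


if __name__ == "__main__":
    for hours in (7, 8):
        rate = rows_per_second(100_000, 100, hours)
        print(f"{hours} hours -> about {rate:.0f} rows/sec")
```

An 8-hour build comes out just under 350 rows per second and a 7-hour build just under 400.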

Because view generation is incremental, in production the 7-8 hours isn't the big issue; what
matters is whether view generation can keep up with the insert rate. So if you are generating
fewer than a few documents per second (x 100 emitted rows) then you should be able to keep the
indexes current. If the indexes start to fall behind I'd suggest either upgrading hardware or
moving to a clustered solution like CouchDB-Lounge.
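
To make the "keep up with the insert rate" test concrete, here is a small Python sketch. It assumes the roughly 350 rows per second seen during the full build also holds for incremental updates, which is an assumption and not something measured here; the function names are illustrative.

```python
def sustainable_docs_per_second(index_rows_per_sec, emits_per_doc):
    """Highest insert rate (docs/sec) the indexer can absorb without falling behind."""
    return index_rows_per_sec / emits_per_doc


def indexes_keep_up(insert_docs_per_sec, index_rows_per_sec, emits_per_doc):
    """True if the rows produced by new documents fit within the indexing throughput."""
    return insert_docs_per_sec * emits_per_doc <= index_rows_per_sec


if __name__ == "__main__":
    # Assumed throughput: ~350 rows/sec, with ~100 emitted rows per document.
    print(sustainable_docs_per_second(350, 100))   # docs/sec the indexer can absorb
    print(indexes_keep_up(2, 350, 100))            # 2 docs/sec: 200 rows/sec needed
    print(indexes_keep_up(5, 350, 100))            # 5 docs/sec: 500 rows/sec needed
```

With those numbers the break-even point is about 3.5 documents per second, which is where "a few documents per second" comes from.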

For purposes of prototyping, you will probably be happier working on a subset of the documents.


> I would like to know how other people are using it.
> Is 7-8 or even 24 hours of checkpointing view generation typical?
> How many documents do people have?
> What is other people's experience generating a view on, let's say, 1 million documents?
> 
> I have switched to the native _sum function for reduce. What else is taking long? Is
> it the map function written in JavaScript? Is it the index that's getting too big?
> 


Using an Erlang view function could potentially speed things up (but my guess is you are more
likely disk-IO bound, not CPU bound, so it may not make much difference).


> Is the view generation linear or does it get worse when you have more documents?
> 


The btree should get slower at roughly O(log n), where n is the number of rows. The base of
the log is pretty big, too. Once you get up to the billion-rows territory you'll probably
want to look more closely at CouchDB-Lounge or the Cloudant clustering.
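
To illustrate why a large log base matters, here is a minimal Python sketch that counts how many levels a btree needs for a given number of rows. The fan-out of 100 is a made-up illustrative value, not CouchDB's actual node size (which depends on key and value sizes), and the function name is hypothetical.

```python
def btree_levels(rows, fanout):
    """Smallest number of levels such that fanout ** levels >= rows."""
    levels = 1
    capacity = fanout
    while capacity < rows:
        capacity *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    ASSUMED_FANOUT = 100  # illustrative only, not a CouchDB constant
    for rows in (10_000_000, 1_000_000_000):
        print(f"{rows:>13,} rows -> {btree_levels(rows, ASSUMED_FANOUT)} levels")
```

Under that assumption, going from 10 million rows to a billion rows adds only one level to the tree, so per-row cost grows slowly even as the index gets much larger.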

> I would greatly appreciate help in answering or discussing these questions.
> 
> Thanks in advance,
> Talib
> 

