Mailing-List: contact user-help@couchdb.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@couchdb.apache.org
Message-Id: <04BDA5F8-F209-4258-A5A1-52DFCF3784B0@mymedify.com>
From: Talib Sharif <tsharif@mymedify.com>
To: user@couchdb.apache.org
In-Reply-To: <673E2CFF-D6F7-4077-BB43-450B09EC089A@apache.org>
Subject: Re: Some stats about couch DB
Date: Fri, 23 Jul 2010 22:51:22 -0700
References: <E0E0E896-D0E5-46AD-9269-D84710AEDFBA@mymedify.com>
 <673E2CFF-D6F7-4077-BB43-450B09EC089A@apache.org>

Thanks Chris,

This is extremely helpful.

-Talib

On Jul 23, 2010, at 6:42 PM, J Chris Anderson wrote:

>
> On Jul 23, 2010, at 5:01 PM, Talib Sharif wrote:
>
>> Hi All,
>>
>> As I play more and more with CouchDB (it is relaxing and fun), I
>> am trying to understand its limits and what to expect in a large
>> production environment.
>>
>> Right now I have about 100K documents and about 10 different
>> views; one of the views does about 100 emits per document.
>>
>> Building the view indexes is taking about 7-8 hours.
>>
>
> This is about right for 10 million rows (100K documents x 100
> emits). That works out to about 350 rows per second (maybe more,
> depending on what your other views are doing), which is a bit
> slower than I'm used to seeing, but it depends on the size of your
> emitted keys and values. If you can shrink the keys or the values
> you should see some speedup (marginal, not an order of magnitude).
>
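A quick back-of-envelope sketch of the arithmetic above, assuming the
figures from this thread (100K documents, 100 emitted rows per
document, a 7-8 hour build). It counts only the large view, so the
real per-second work is somewhat higher once the other views are
included.

```python
# Back-of-envelope indexing rate, using the numbers quoted in this thread.
# Only the ~100-emits-per-document view is counted; the other views add work.

def rows_per_second(num_docs: int, emits_per_doc: int, build_hours: float) -> float:
    """Average number of view rows indexed per second over a full build."""
    total_rows = num_docs * emits_per_doc
    return total_rows / (build_hours * 3600)


if __name__ == "__main__":
    for hours in (7, 8):
        rate = rows_per_second(100_000, 100, hours)
        print(f"{hours} h build -> about {rate:.0f} rows/s")
```

An 8-hour build comes out near 347 rows/s and a 7-hour build near 397
rows/s, which brackets the "about 350" figure.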
> Because view generation is incremental, the 7-8 hours isn't the big
> issue in production; what matters is whether view generation can
> keep up with the insert rate. At roughly 350 rows per second and 100
> emitted rows per document, that means writing no more than a few
> documents per second (about 3.5) if you want to keep the indexes
> current. If the indexes start to fall behind, I'd suggest either
> upgrading hardware or moving to a clustered solution like
> CouchDB-Lounge.
>
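A minimal sketch of that keep-up check, assuming the rate measured
during the full build (about 350 rows/s) also holds for incremental
updates, which in practice will vary with hardware and document
shape. The function names are hypothetical helpers, not part of any
CouchDB API.

```python
# Can incremental indexing keep up with a given document insert rate?
# ASSUMPTION: the full-build indexing rate also applies to incremental updates.

def max_doc_insert_rate(index_rows_per_sec: float, emits_per_doc: int) -> float:
    """Documents per second the indexer can absorb without falling behind."""
    return index_rows_per_sec / emits_per_doc


def keeps_up(insert_docs_per_sec: float, index_rows_per_sec: float, emits_per_doc: int) -> bool:
    return insert_docs_per_sec <= max_doc_insert_rate(index_rows_per_sec, emits_per_doc)


def backlog_rows(insert_docs_per_sec: float, index_rows_per_sec: float,
                 emits_per_doc: int, hours: float) -> float:
    """Unindexed rows that pile up after `hours` if inserts outpace indexing."""
    excess_per_sec = insert_docs_per_sec * emits_per_doc - index_rows_per_sec
    return max(0, excess_per_sec) * 3600 * hours


if __name__ == "__main__":
    print(f"limit: about {max_doc_insert_rate(350, 100):.1f} docs/s")
    print(keeps_up(2, 350, 100), keeps_up(10, 350, 100))
    print(backlog_rows(10, 350, 100, 1), "rows behind after one hour at 10 docs/s")
```

At 10 documents per second the index would fall about 2.3 million rows
further behind every hour, which is the situation where more hardware
or a cluster becomes worth considering.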
> For prototyping purposes you will probably be happier working on a
> subset of the documents.
>
>
>> I would like to know how other people are using it.
>> Is 7-8 or even 24 hours of checkpointing view generation typical?
>> How many documents do people have?
>> What has other people's experience been generating a view on,
>> let's say, 1 million documents?
>>
>> I have switched to the native _sum function for reduce. What else
>> is taking so long? Is it the map function written in JavaScript? Is
>> it the index getting too big?
>>
>
>
> Using an Erlang view function could potentially speed things up
> (but my guess is you are more likely disk-I/O bound than CPU bound,
> so it may not make much difference).
>
>
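For reference, a sketch of a design document that pairs a JavaScript
map function with the built-in `_sum` reduce mentioned in the question.
The design document id, the view name `tag_counts`, and the `tags`
document field are hypothetical. Emitting a small value (`1`) follows
the advice above about keeping emitted keys and values small.

```python
import json

# Hypothetical design document: "_design/stats", "tag_counts" and the
# doc.tags field are illustrative only. The map function is JavaScript
# source stored as a string; "_sum" selects CouchDB's built-in reduce,
# which runs inside CouchDB rather than in the JavaScript view server.
design_doc = {
    "_id": "_design/stats",
    "language": "javascript",
    "views": {
        "tag_counts": {
            "map": (
                "function (doc) {\n"
                "  (doc.tags || []).forEach(function (tag) {\n"
                "    emit(tag, 1);\n"
                "  });\n"
                "}"
            ),
            "reduce": "_sum",
        }
    },
}

if __name__ == "__main__":
    # JSON body that could be PUT to the database as the design document.
    print(json.dumps(design_doc, indent=2))
```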
>> Is view generation linear, or does it get worse as you add more
>> documents?
>>
>
>
> B-tree updates should get slower at roughly O(log n), where n is
> the number of rows. The base of the log (the tree's branching
> factor) is pretty big, too, so the tree stays shallow. Once you get
> up into billion-row territory you'll probably want to look more
> closely at CouchDB Lounge or Cloudant's clustering.
>
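A toy illustration of why O(log n) with a large base grows so slowly:
it computes how many levels a tree needs to hold n rows for a given
fanout. The fanout values 100 and 1000 are assumptions chosen for
illustration, not CouchDB's actual node sizes. Because each row costs
about one tree descent, a full build in this model is roughly
O(n log n), only slightly worse than linear.

```python
# Levels needed for a B-tree of the given fanout to hold num_rows rows.
# ASSUMPTION: fanout values are illustrative, not CouchDB's real node sizes.

def btree_depth(num_rows: int, fanout: int) -> int:
    """Smallest depth d with fanout**d >= num_rows (exact integer arithmetic)."""
    if num_rows <= 1:
        return 1
    depth, capacity = 0, 1
    while capacity < num_rows:
        capacity *= fanout
        depth += 1
    return depth


if __name__ == "__main__":
    for rows in (10**5, 10**7, 10**9):
        print(f"{rows:>13,} rows: depth {btree_depth(rows, 100)} (fanout 100), "
              f"{btree_depth(rows, 1000)} (fanout 1000)")
```

In this model, going from 10 million to 1 billion rows (100x more)
adds only one level at fanout 100.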
>> I would greatly appreciate help answering or discussing these
>> questions.
>>
>> Thanks in advance,
>> Talib
>>
>