couchdb-user mailing list archives

From J Chris Anderson <jch...@gmail.com>
Subject Re: performance issues
Date Mon, 05 Apr 2010 20:23:17 GMT

On Apr 5, 2010, at 12:19 PM, Julian Moritz wrote:

> Hi J Chris,
> 
> J Chris Anderson wrote:
>> On Apr 5, 2010, at 11:52 AM, Julian Moritz wrote:
>> 
>>> Hi,
>>> 
>>> Julian Moritz wrote:
>>>> Hi,
>>>> 
>>> I've just found this via Google:
>>> 
>>>>> We don't parallelize view index creation yet, so this is not an
>>>>> additional problem for you. You can however build two views in
>>>>> parallel and make use of two cores that way.
>>> If this is (still) true, view index creation is the bottleneck of my
>>> application. Since I'm only playing around so far and am already using
>>> 100% of one core, I cannot use CouchDB anymore.
>>> 
>> 
>> We rarely see view generation that is actually limited by view-function
>> execution speed. The majority of the time the actual bottleneck is disk IO.
>> To parallelize view generation, the best option is to run a CouchDB-Lounge
>> cluster.
>> 
> 
> Hm, at the moment I have access to two computers. This isn't what you
> mean with a "couchdb-lounge cluster", right?
> 
>> It looks like you might be better off removing your reduce function,
>> which might also speed things up.
>> 
> 
> But I need it for making my list unique. This is an important feature
> for my application.

This probably explains the slowness. When you do a group=true query, CouchDB has to run
the reduce function once for each unique key (serializing all the rows for that key to the JS
process and parsing the results).
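
To make that cost concrete, here is a rough conceptual model of what a group=true query does with the sorted rows of a view. It is not CouchDB's actual implementation, which reduces over its B-tree and may also call the reduce function with rereduce=true; the helper name `groupReduce` and the hash values in the sample rows are made up for illustration. The point it shows is that the reduce function is invoked once per distinct key, and per the explanation above each of those invocations means a round trip to the JS process.

```javascript
// Conceptual model of group=true: one reduce call per distinct key.
// Rows must already be sorted by key, as CouchDB view rows are.
// (Real CouchDB reduces over its B-tree and can also rereduce.)
function groupReduce(sortedRows, reduceFn) {
  const result = [];
  let i = 0;
  while (i < sortedRows.length) {
    const keyJson = JSON.stringify(sortedRows[i].key);
    const keys = [];   // [key, docId] pairs, the shape CouchDB passes in
    const values = [];
    // Collect the run of adjacent rows that share exactly this key.
    while (i < sortedRows.length &&
           JSON.stringify(sortedRows[i].key) === keyJson) {
      keys.push([sortedRows[i].key, sortedRows[i].id]);
      values.push(sortedRows[i].value);
      i++;
    }
    // Exactly one reduce invocation for the whole group.
    result.push({ key: JSON.parse(keyJson), value: reduceFn(keys, values, false) });
  }
  return result;
}

// Sample rows shaped like Julian's view: key = [hash, word] (hashes made up).
const rows = [
  { id: "d1", key: [17, "apple"], value: null },
  { id: "d2", key: [17, "apple"], value: null },
  { id: "d3", key: [42, "pear"], value: null }
];
const unique = groupReduce(rows, function (key, values, rereduce) { return true; });
// unique: [{ key: [17, "apple"], value: true }, { key: [42, "pear"], value: true }]
```

With many distinct words, the number of reduce calls grows with the number of unique keys, which matches the slowness described in this thread.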

I haven't tested this, but you might get better throughput by dropping the reduce
function and using a _list function that sends only one row of output each time the key
changes. This will avoid some additional Erlang processing of the result.
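
An untested sketch of such a _list function, assuming the reduce function has been removed so the view is map-only and its keys are the `[hash, word]` arrays from Julian's map function. `start()`, `getRow()` and `send()` are provided by CouchDB's JavaScript query server; the variable name `uniqueWords` is hypothetical and only exists so the snippet can run standalone, since a design document stores just the function expression.

```javascript
// Sketch of a _list function that de-duplicates a map-only view.
// start(), getRow() and send() come from CouchDB's JS query server.
const uniqueWords = function (head, req) {
  start({ headers: { "Content-Type": "application/json" } });
  var lastKey = null;  // JSON form of the previous row's key
  var first = true;
  var row;
  send("[");
  while ((row = getRow())) {
    var key = JSON.stringify(row.key);
    // Rows arrive sorted by key, so duplicates are adjacent:
    // emit a word only when the key differs from the previous one.
    if (key !== lastKey) {
      send((first ? "" : ",") + JSON.stringify(row.key[1]));
      first = false;
      lastKey = key;
    }
  }
  send("]");
};
```

Stored under a hypothetical name such as `unique_words` in a design document `_design/app` next to a view called `words`, it would be queried as `GET /db/_design/app/_list/unique_words/words`. Because duplicates are always adjacent in key order, remembering only the previous key is enough, so the function's memory use does not grow with the size of the view.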

Some documentation for _list is here:

http://books.couchdb.org/relax/design-documents/lists

> 
> Thanks, I'll think about how to set up a CouchDB cluster and do more
> testing.
> 
> Regards
> Julian
> 
>> Chris
>> 
>> 
>>> Regards
>>> Julian
>>> 
>>>> I've developed a (in my eyes) simple view. I have a word list that does
>>>> not contain unique words. I want to store it in a view, with every word
>>>> occurring once and in random order. Therefore I have a simple map
>>>> function:
>>>> 
>>>> function(doc){
>>>>   // hash() is not a CouchDB built-in; it must be defined in this function
>>>>   emit([hash(doc.word), doc.word], null);
>>>> }
>>>> 
>>>> and a simple reduce:
>>>> 
>>>> function(key, values, rereduce){
>>>>   // The value is irrelevant: grouping by key is what removes duplicates.
>>>>   return true;
>>>> }
>>>> 
>>>> Calling that view with group=true does what I want.
>>>> 
>>>> When I store lots of words in the database, one of my two CPU cores is
>>>> used completely by couchjs.
>>>> 
>>>> Isn't the view built using two (or all) CPU cores? I thought (obviously
>>>> wrongly) that it would be calculated in parallel and that using a
>>>> quad-core CPU (or more cores) would make storing faster.
>>>> 
>>>> Is there a solution for that? Should I use another query server?
>>>> 
>>>> Regards
>>>> Julian
>>>> 
>> 
>> 

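Julian's map function calls a `hash()` helper, which CouchDB's JavaScript view server does not provide, so it has to be defined inside the map function itself. A minimal sketch, assuming a 32-bit FNV-1a hash (chosen here for illustration, not necessarily what Julian used) is good enough to give a stable pseudo-random order; the names `mapWords` and `fnv1a` are hypothetical, and `emit()` is supplied by CouchDB at indexing time.

```javascript
// Map function with a hypothetical FNV-1a helper defined inside it,
// because CouchDB's view server has no built-in hash().
const mapWords = function (doc) {
  // 32-bit FNV-1a over UTF-16 code units; the shifts implement the
  // multiply by the FNV prime (2^24 + 2^8 + 0x93) without Math.imul.
  function fnv1a(str) {
    var h = 0x811c9dc5;  // FNV-1a 32-bit offset basis
    for (var i = 0; i < str.length; i++) {
      h ^= str.charCodeAt(i);
      h += (h << 1) + (h << 4) + (h << 7) + (h << 8) + (h << 24);
    }
    return h >>> 0;  // force an unsigned 32-bit result
  }
  if (typeof doc.word === "string") {
    emit([fnv1a(doc.word), doc.word], null);
  }
};
```

Because the hash depends only on the word, every copy of a word gets the same key, which is what lets group=true (or a _list function) collapse the duplicates, while the hash spreads the words in an order unrelated to the alphabet.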
