couchdb-user mailing list archives

From Damien Katz <dam...@apache.org>
Subject Re: Reduce is Really Slow!
Date Wed, 20 Aug 2008 00:41:51 GMT
You can return arrays and objects, whatever JSON allows. But if the
object keeps getting bigger the more rows it reduces, then it simply
won't work.
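
As a sketch (not from the original message, and assuming the three-argument
keys/values/rereduce reduce signature), a reduce that falls into this trap
looks something like:

function (keys, values, rereduce) {
  if (rereduce) {
    // values is a list of previously reduced arrays; flattening them
    // keeps every original row, so the result never stops growing.
    return [].concat.apply([], values);
  }
  // First pass: just hand back all the mapped values.
  return values;
}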

The exception is that the size of the reduce value can grow logarithmically
with respect to the number of rows. The simplest example of logarithmic
growth is summing a row value: with Erlang's bignums, the size on disk is
log2(Sum(Rows)), which is perfectly acceptable growth.
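
A minimal sketch of a reduce that stays fixed-size in this way (assuming
each mapped value is a number; the rereduce case works because the partial
results are themselves numbers):

function (keys, values, rereduce) {
  // Whether summing mapped values or re-reducing partial sums, the
  // result is always a single number, so its size stays bounded.
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}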

-Damien

On Aug 19, 2008, at 8:14 PM, Nicholas Retallack wrote:

> Oh!  I didn't realize that was a rule.  I had used 'return values' in an
> attempt to run the simplest test possible on my data.  But hey, values is
> an array.  Does that mean you're not allowed to return objects like arrays
> from reduce at all?  Because I was kind of hoping I could.  I was able to
> do it with smaller amounts of data, after all.  Perhaps this is due to
> re-reduce kicking in?
>
> For the record, CouchDB is still working on this query I started hours ago,
> and chewing up all my CPU.  I am going to have to kill it so I can get some
> work done.
>
> On Tue, Aug 19, 2008 at 4:21 PM, Damien Katz <damien@apache.org> wrote:
>
>> I think the problem with your reduce is that it looks like it's not
>> actually reducing to a single value, but instead using reduce for grouping
>> data. That will cause severe performance problems.
>>
>> For reduce to work properly, you should end up with a fixed-size data
>> structure regardless of the number of values being reduced (not strictly
>> true, but that's the general rule).
>>
>> -Damien
>>
>>
>> On Aug 19, 2008, at 6:55 PM, Nicholas Retallack wrote:
>>
>>> Okay, I got it built on Gentoo instead, but I'm still having performance
>>> issues with reduce.
>>>
>>> Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [async-threads:0]
>>> couchdb - Apache CouchDB 0.8.1-incubating
>>>
>>> Here's a query I tried to do:
>>>
>>> I freshly imported about 191MB of data in 155399 documents.  29090 are not
>>> discarded by map.  Map produces one row with 5 fields for each of these
>>> documents.  After grouping, each group should have four rows.  Reduce is a
>>> simple function(keys,values){return values}.
>>>
>>> Here's the query call:
>>> time curl -X GET 'http://localhost:5984/clickfund/_view/offers/index?count=1&group=true&group_level=1'
>>>
>>> This is running on a 512MB slicehost account.  http://www.slicehost.com/
>>>
>>> I'd love to give you this command's execution time, since I ran it last
>>> night before I went to bed, but it must have taken over an hour because my
>>> laptop went to sleep and severed the connection.  Trying it again.
>>>
>>> Considering it's blazing fast without the reduce function, I can only
>>> assume what's taking all this time is overhead setting up and tearing down
>>> the simple function(keys,values){return values}.
>>>
>>> I can give you guys the Python source to set up this database so you can
>>> try it yourself if you like.
>>>
>>
>>

