couchdb-user mailing list archives

From "Nicholas Retallack" <nickretall...@gmail.com>
Subject Re: Reduce is Really Slow!
Date Wed, 20 Aug 2008 20:32:33 GMT
Replacing 'return values' with 'return values.length' shows you're
right.  4 minutes for the first query, milliseconds afterward, as
opposed to forever.
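
For reference, here's the shape the working reduce ended up with.  The
rereduce branch is my own addition; my understanding is that on
rereduce the incoming values are already counts, so they should be
summed rather than counted again:

  function(keys, values, rereduce) {
    if (rereduce) {
      // values are partial counts from earlier reduce passes; add them
      var total = 0;
      for (var i = 0; i < values.length; i++) total += values[i];
      return total;
    }
    // first pass: values are the mapped rows themselves, so count them
    return values.length;
  }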

I guess I was expecting reduce to do things it wasn't designed to do.
I notice ?group=true&group_level=1 is ignored unless a reduce function
of some sort exists, though.  Is there any way to get this grouping
behavior without reduce having to shrink the result so drastically,
and without the performance hit?
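
To be concrete, the grouped query I mean is something like the
following (database and view names are from my earlier message below);
without a reduce function defined on the view, the group parameters
are simply ignored:

  curl -X GET 'http://localhost:5984/clickfund/_view/offers/index?group=true&group_level=1'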

The view I was using here (http://www.friendpaste.com/2AHz3ahr) was
designed to simply take each document with the same name and merge
them into one document, turning same-named fields into lists (here's a
more general version: http://www.friendpaste.com/Ud6ELaXC).  This only
shrinks the result by whatever overhead the repeated field names would
have added, but since the fields I was reducing contained nothing but
integers, the field names were most of the bulk, so it did shrink the
documents by quite a bit.  It was pretty handy, but the query took 25
seconds to return one result, even when called repeatedly.
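
In case the friendpaste links rot, the reduce was shaped roughly like
this (a sketch from memory; the field names are illustrative, and it
ignores rereduce entirely):

  function(keys, values) {
    // merge all rows that share a key into one object, turning
    // same-named fields into lists; note the result keeps growing
    // with the number of rows, which is the pattern Damien warns
    // about below
    var merged = {};
    for (var i = 0; i < values.length; i++) {
      for (var field in values[i]) {
        if (!merged[field]) merged[field] = [];
        merged[field].push(values[i][field]);
      }
    }
    return merged;
  }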

Is there some technical reason for this limitation?

I had assumed reduce was just an ordinary post-processing step that I
could run once and have something akin to a brand new generated table
to query on, so I wrote my views to transform my data to fit the
various ways I wanted to view it.  It worked fine for small amounts of
data in little experiments, but as soon as I used it on my real
database, I hit this wall.

Are there plans to make reduce work for these more general
data-mangling tasks?  Or should I be approaching the problem a
different way?  Perhaps write my map calls differently so they produce
more rows for reduce to compact?  Or do something special when the
third parameter to reduce (the rereduce flag) is true?
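
For contrast, my understanding of a reduce that does obey the
fixed-size rule is a plain sum over one field ('amount' here is a
made-up name); the reduce value stays essentially constant-size no
matter how many rows feed into it:

  function(keys, values, rereduce) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
      // on rereduce, values are partial sums; otherwise, mapped rows
      total += rereduce ? values[i] : values[i].amount;
    }
    return total;
  }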

On Tue, Aug 19, 2008 at 5:41 PM, Damien Katz <damien@apache.org> wrote:
> You can return arrays and objects, whatever json allows. But if the object
> keeps getting bigger the more rows it reduces, then it simply won't work.
>
> The exception is that the size of the reduce value can be logarithmic with
> respect to the rows. The simplest example of logarithmic growth is the
> summing of a row value. With Erlang's bignums, the size on disk is
> Log2(Sum(Rows)), which is perfectly acceptable growth.
>
> -Damien
>
> On Aug 19, 2008, at 8:14 PM, Nicholas Retallack wrote:
>
>> Oh!  I didn't realize that was a rule.  I had used 'return values' in
>> an attempt to run the simplest test possible on my data.  But hey,
>> values is an array.  Does that mean you're not allowed to return
>> objects like arrays from reduce at all?  Because I was kind of hoping
>> I could.  I was able to do it with smaller amounts of data, after all.
>> Perhaps this is due to re-reduce kicking in?
>>
>> For the record, couchdb is still working on this query I started
>> hours ago, and chewing up all my cpu.  I am going to have to kill it
>> so I can get some work done.
>>
>> On Tue, Aug 19, 2008 at 4:21 PM, Damien Katz <damien@apache.org> wrote:
>>
>>> I think the problem with your reduce is that it looks like it's not
>>> actually reducing to a single value, but instead using reduce for
>>> grouping data.  That will cause severe performance problems.
>>>
>>> For reduce to work properly, you should end up with a fixed-size data
>>> structure regardless of the number of values being reduced (not
>>> strictly true, but that's the general rule).
>>>
>>> -Damien
>>>
>>>
>>> On Aug 19, 2008, at 6:55 PM, Nicholas Retallack wrote:
>>>
>>>> Okay, I got it built on gentoo instead, but I'm still having
>>>> performance issues with reduce.
>>>>
>>>> Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [async-threads:0]
>>>> couchdb - Apache CouchDB 0.8.1-incubating
>>>>
>>>> Here's a query I tried to do:
>>>>
>>>> I freshly imported about 191MB of data in 155399 documents.  29090
>>>> are not discarded by map.  Map produces one row with 5 fields for
>>>> each of these documents.  After grouping, each group should have
>>>> four rows.  Reduce is a simple function(keys,values){return values}.
>>>>
>>>> Here's the query call:
>>>>
>>>> time curl -X GET 'http://localhost:5984/clickfund/_view/offers/index?count=1&group=true&group_level=1'
>>>>
>>>> This is running on a 512MB slicehost account.  http://www.slicehost.com/
>>>>
>>>> I'd love to give you this command's execution time, since I ran it
>>>> last night before I went to bed, but it must have taken over an hour
>>>> because my laptop went to sleep and severed the connection.  Trying
>>>> it again.
>>>>
>>>> Considering it's blazing fast without the reduce function, I can
>>>> only assume what's taking all this time is overhead setting up and
>>>> tearing down the simple function(keys,values){return values}.
>>>>
>>>> I can give you guys the python source to set up this database so
>>>> you can try it yourself if you like.
