couchdb-user mailing list archives

From "Chris Van Pelt" <vanp...@gmail.com>
Subject Re: Slooooow views
Date Thu, 08 Jan 2009 00:31:30 GMT
I chose couch because I needed a way to take arbitrary hashes and
combine them, performing various operations on dynamic key/value
pairs.  Seeing that couch would eventually be able to do this in a
distributed manner seemed like a great fit.

My impression was that the reduce step was incremental once the
functions were defined...  Given that my reduce function is referentially
transparent, I don't understand the performance impact incurred by the
large dynamic hash it returns.  Can you think of a better fit for my
needs in another solution?
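
For reference, my reduce is roughly the shape of the sketch below --
heavily simplified, with placeholder field handling (the real map,
reduce, and a sample doc are in the gist further down):

function (keys, values, rereduce) {
  // Merge every judgment's fields into one hash.  The output grows with
  // the number of distinct fields/values, i.e. it is not a small,
  // fixed-size value.  (rereduce handling is omitted in this sketch.)
  var acc = {};
  for (var i = 0; i < values.length; i++) {
    for (var field in values[i]) {
      if (!acc[field]) acc[field] = [];
      acc[field].push(values[i][field]);
    }
  }
  return acc;
}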

Chris

On Wed, Jan 7, 2009 at 4:00 PM, Damien Katz <damien@apache.org> wrote:
> In CouchDB, your reductions must compute to smallish, fixed-size data. The
> problem is your reduce function: it builds up and returns a map of values,
> and as it computes the index, it will actually compute the reduction of
> every value in the view. Every time the index is updated, it does this.
>
> -Damien
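
If I'm reading this right, the difference is that a reduce whose output
stays small and fixed-size no matter how many rows it covers -- a plain
count, say -- doesn't run into this.  A minimal sketch of what I mean:

function (keys, values, rereduce) {
  // The result is always a single number, whether we're combining
  // mapped rows or partial reductions.
  if (rereduce) {
    return sum(values);   // values are partial counts here
  }
  return values.length;   // one count per mapped row
}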
>
>
> On Jan 7, 2009, at 6:38 PM, Chris Van Pelt wrote:
>
>> Ok, so I created a gist with the map, reduce, and a document:
>> http://gist.github.com/44497
>>
>> The purpose of this view is to combine multiple judgments (the data
>> attribute of the doc) for a single unit_id.  The "fields"
>> attribute tells couch how to aggregate the data (averaging numbers,
>> choosing the most common item, etc.).
>>
>> I do use group=true, along with skip and count when querying this
>> view.  I understand that skip can slow things down, but the request is
>> still slow when skip is 0.
>>
>> Another strange thing is that even when I query one of my "count"
>> views (a simple sum() reduce step) I experience the same lag.  Could
>> this be because my count views are a part of the same design document?
>>
>> Also, are there better ways to debug this?  I've set my log level to
>> debug, but it doesn't give me details about where the processing time
>> is going, and I can only gauge response times to the
>> second...
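
One crude option, per the log() use Chris A. mentions further down: a
log() call inside the reduce ends up in couch's log, which at least
shows how many times the reduce runs for a single request.  A sketch,
using the same simple sum() reduce:

function (keys, values, rereduce) {
  // Temporary instrumentation -- watch couch's log while issuing the
  // query to see how often this fires and with what input sizes.
  log("reduce: rereduce=" + rereduce + ", values=" + values.length);
  return sum(values);
}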
>>
>> Chris
>>
>> On Wed, Jan 7, 2009 at 3:12 PM, Chris Anderson <jchris@gmail.com> wrote:
>>>
>>> On Wed, Jan 7, 2009 at 3:07 PM, Jeremy Wall <jwall@google.com> wrote:
>>>>
>>>> Maybe someone else could chime in on when you get the hit for reduction?
>>>>
>>>
>>> Based on my use of log() in the reduce function, it looks like for
>>> each reduce query, the reduce function is run once, to obtain the
>>> final reduce value.
>>>
>>> When you run a group=true, or group_level reduce query, which returns
>>> values for many keys, you'll end up running the final reduction once
>>> per returned value. I think this could be optimized to avoid running
>>> final reduces if they've already been run for those key-ranges. I'm
>>> not sure how much work that would be.
>>>
>>> --
>>> Chris Anderson
>>> http://jchris.mfdz.com
>>>
>
>
