couchdb-user mailing list archives

From Zachary Zolton <zachary.zol...@gmail.com>
Subject Re: Reduce limitations
Date Tue, 02 Jun 2009 19:23:34 GMT
I think it's worth putting in writing (perhaps on the wiki?) that
reduce functions are useful for *computations* over your data, not
for trying to emulate SQL-style joins!
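
To make that concrete, here's a minimal sketch (the field names are
made up, not from this thread) of the kind of reduce CouchDB is built
for: an aggregate whose result stays small no matter how many map
rows feed into it.

    // map: one row per sale, keyed by customer (hypothetical schema)
    function (doc) {
      if (doc.type === "sale") {
        emit(doc.customer_id, doc.amount);
      }
    }

    // reduce: sum() is provided by CouchDB's JavaScript view server;
    // the same body works for the rereduce pass, since the partial
    // results are numbers just like the mapped values
    function (keys, values, rereduce) {
      return sum(values);
    }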

This was a hard lesson for me: after running into the performance
problems in a production environment, I had to change my code
considerably and move more of that work to the client side.

Querying a reduce view with group=true causes considerable work,
since the work of the reduce steps has to be repeated on every
request. Keep your view code simple, especially if you're on EC2
small instances!
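
For example (a hypothetical view, not one from this thread), a
grouped query such as GET /mydb/_design/stats/_view/by_tag?group=true
has to produce a reduced value for every distinct key, so the reduce
body should be as cheap as possible. A count-style reduce is about as
simple as it gets:

    function (keys, values, rereduce) {
      if (rereduce) {
        // values are partial counts from lower reduce nodes
        return sum(values);
      }
      // values are the raw mapped values for this key group
      return values.length;
    }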

Cheers,
Zach

On Tue, Jun 2, 2009 at 4:04 AM, Brian Candler <B.Candler@pobox.com> wrote:
> On Thu, May 28, 2009 at 12:54:16PM -0700, Chris Anderson wrote:
>> The deal is that if your reduce function's output is the same size as
>> its input, the final reduce value will end up being as large as all
>> the map rows put together.
>>
>> If your reduce function's output is 1/2 the size of its input, you'll
>> also end up with quite a large amount of data in the final reduce. In
>> these cases each reduction stage actually accumulates more data, as it
>> is based on ever-increasing numbers of map rows.
>>
>> If the function reduces data fast enough, the intermediate reduction
>> values will stay relatively constant, even as each reduce stage
>> reflects logarithmically more map rows. This is the kind of reduce
>> function you want.
>
> So actually, the requirement is that the final (root) result of the reduce
> process should be of a moderate size, and so should all the intermediate
> reduce values which comprise it. That makes sense.
>
> Depending on the type of reduce function you use, the "growth" may or may
> not be related to the number of documents which have been reduced together
> to form the reduce value.
>
> For example: a reduce function which returns a map of {tag: count},
> where the number of unique tags is bounded, may return a fairly large
> object when reduced across a small number of docs, but the final root
> reduce is no larger. So all you need to do is keep the number of tags
> within a 'reasonable' range (e.g. tens rather than thousands).
>
> Regards,
>
> Brian.
>
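
For the record, here is a sketch of the {tag: count} reduce Brian
describes (the document shape is assumed: each doc carries a "tags"
array; the code is illustrative, not from the original thread). The
reduced value has one entry per distinct tag, so as long as the tag
vocabulary stays small, the root reduce value stays small too, no
matter how many docs feed into it.

    // map: one row per doc, carrying that doc's tag list
    function (doc) {
      if (doc.tags) {
        emit(doc._id, doc.tags);
      }
    }

    // reduce: fold tag arrays (or partial tallies, on rereduce) into
    // a single {tag: count} object
    function (keys, values, rereduce) {
      var counts = {};
      var add = function (tag, n) {
        counts[tag] = (counts[tag] || 0) + n;
      };
      if (rereduce) {
        // values are partial {tag: count} objects from lower nodes
        values.forEach(function (partial) {
          for (var t in partial) add(t, partial[t]);
        });
      } else {
        // values are the raw tag arrays emitted by the map step
        values.forEach(function (tags) {
          tags.forEach(function (t) { add(t, 1); });
        });
      }
      return counts;
    }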
