incubator-couchdb-user mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: Reduce Assumptions
Date Mon, 30 Mar 2009 17:39:22 GMT

On 30 Mar 2009, at 19:34, Adam Kocoloski wrote:

> Wow, very nice exposition.  Cheers,


Yeah, good job, Brian; this is almost worth putting into the wiki (well,
not even almost...)!

Cheers
Jan
--

>
> Adam
>
> On Mar 30, 2009, at 4:47 AM, Brian Candler wrote:
>
>> On Sat, Mar 28, 2009 at 07:38:24PM -0600, Tom McNulty wrote:
>>> my map function produces output like:
>>>
>>> [X, Y, 0]  -> Object_A
>>> [X, Y, 1]  -> Object_B1
>>> [X, Y, 1]  -> Object_B1
>>> [X, Y, 1]  -> Object_B1
>>> [Z, Q, 0] ....
>>>
>>> Here I apply group_level=2 and use a ranged query ([X, 0] to [X, []]),
>>> since Y >= 0.
>>
>> Aside: you can use [X,null] to [X,{}] instead, and then the value of Y
>> doesn't matter: null collates before, and {} after, any number, string
>> or array.
>>
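A minimal Python sketch of why the [X,null] to [X,{}] range brackets every Y. It assumes CouchDB's documented type ordering for view keys (null < false < true < numbers < strings < arrays < objects) and simplifies string comparison to code-point order, whereas real CouchDB uses ICU collation. `collate` and `in_range` are hypothetical helpers for illustration, not CouchDB APIs.

```python
from functools import cmp_to_key


def _rank(v):
    # Type order assumed from CouchDB view collation:
    # null < false < true < numbers < strings < arrays < objects
    if v is None:
        return 0
    if v is False:
        return 1
    if v is True:
        return 2
    if isinstance(v, (int, float)):
        return 3
    if isinstance(v, str):
        return 4
    if isinstance(v, list):
        return 5
    if isinstance(v, dict):
        return 6
    raise TypeError(f"not a JSON value: {v!r}")


def collate(a, b):
    """Return -1, 0 or 1 when comparing two JSON-like keys (simplified)."""
    ra, rb = _rank(a), _rank(b)
    if ra != rb:
        return -1 if ra < rb else 1
    if ra in (3, 4):
        # Simplification: plain comparison; CouchDB uses ICU for strings.
        return (a > b) - (a < b)
    if ra == 5:
        # Arrays compare element by element; a shorter prefix sorts first.
        for x, y in zip(a, b):
            c = collate(x, y)
            if c:
                return c
        return (len(a) > len(b)) - (len(a) < len(b))
    if ra == 6:
        # Objects compare as ordered lists of [key, value] pairs.
        return collate([[k, v] for k, v in a.items()],
                       [[k, v] for k, v in b.items()])
    return 0  # null/false/true are equal within their own rank


def in_range(key, startkey, endkey):
    """True if startkey <= key <= endkey under the collation above."""
    return collate(startkey, key) <= 0 and collate(key, endkey) <= 0


# Keys shaped like Tom's [X, Y, n]:
print(in_range(["x", 3, 1], ["x", None], ["x", {}]))  # True
print(in_range(["x", -2, 0], ["x", 0], ["x", []]))    # False: Y < 0 excluded
print(in_range(["y", 0, 0], ["x", None], ["x", {}]))  # False: different X
print(sorted([["x", {}], ["x", "a"], ["x", None], ["x", 1]],
             key=cmp_to_key(collate)))
# [['x', None], ['x', 1], ['x', 'a'], ['x', {}]]
```

With [X,null] as the start key, the Y >= 0 assumption is no longer needed: a negative Y would also fall inside the range.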
>>> Now during the reduce phase, I combine each Object_A with its associated
>>> Object_Bs. Can I assume that the first of the values sent to 'reduce' is
>>> Object_A?
>>
>> I think not, because on a large database the objects to be reduced will be
>> sent to your reduce function in batches, and these batches are split on
>> B-tree node boundaries, which may fall in arbitrary places. For example,
>> your reduce function may receive
>>
>>  [Object_A, Object_B1]
>>
>> and then in a separate invocation
>>
>>  [Object_B1, Object_B1]
>>
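To illustrate the batching point, here is a Python sketch that mimics the shape of CouchDB's reduce(keys, values, rereduce) contract. The row values and the functions `safe_reduce`, `naive_reduce` and `reduce_in_batches` are hypothetical; keys are passed as None for brevity (CouchDB itself passes [key, docid] pairs, and null on rereduce). An order-independent reduce gives the same answer however the rows are split, while one that assumes values[0] is Object_A breaks as soon as a batch starts with a B.

```python
# Hypothetical rows for one [X, Y] group: Object_A first, then its Object_Bs.
ROWS = [{"type": "A", "name": "Object_A"}] + [{"type": "B", "name": "Object_B1"}] * 3


def safe_reduce(keys, values, rereduce):
    """Order-independent: counts rows by type, so any batching gives the same result."""
    if rereduce:
        # values are earlier outputs of safe_reduce, in no guaranteed order
        return {"a": sum(v["a"] for v in values), "b": sum(v["b"] for v in values)}
    return {
        "a": sum(1 for v in values if v["type"] == "A"),
        "b": sum(1 for v in values if v["type"] == "B"),
    }


def naive_reduce(keys, values, rereduce):
    """WRONG: assumes the first value in every call is Object_A."""
    return values[0]["name"]


def reduce_in_batches(fn, values, split_at):
    """Mimic rows split at a B-tree node boundary, then combined by a rereduce."""
    partials = [fn(None, values[:split_at], False), fn(None, values[split_at:], False)]
    return fn(None, partials, True)


whole = safe_reduce(None, ROWS, False)
for split_at in range(1, len(ROWS)):
    assert reduce_in_batches(safe_reduce, ROWS, split_at) == whole
print(whole)                                # {'a': 1, 'b': 3}
print(naive_reduce(None, ROWS[2:], False))  # Object_B1, not Object_A
```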
>> Furthermore, because of reduce optimisations, you may receive only some
>> of the blocks to be reduced. For example, take these three B-tree nodes:
>>
>>    [a b c d e f g] [h i j k l m n] [o p q r s t u]
>>           R1              R2              R3
>>
>> The reduce value of all the items in each B-tree node is stored within
>> that node, e.g. [a b c d e f g] reduces to R1. Now suppose someone asks
>> for a reduce value across a key range:
>>
>>                     key range
>>             <----------------------------->
>>    [a b c d e f g] [h i j k l m n] [o p q r s t u]
>>
>> As I understand it, CouchDB will call your reduce function to calculate
>> values for [e f g] and for [o p q r], but will use the existing
>> stored/calculated value R2 for the middle block.
>>
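A toy Python model of the range reduction described above, under loudly stated assumptions: the three leaf nodes, their cached reductions and `range_reduce` are hypothetical and much simpler than CouchDB's real B-tree, and each row's value is 1 so the reduce is just a count. It shows that only the partially covered nodes are handed to the reduce function, while the fully covered middle node contributes its cached R2 directly.

```python
# Toy model (hypothetical): three leaf nodes, each caching the reduction of its rows.
NODES = [list("abcdefg"), list("hijklmn"), list("opqrstu")]
CALLS = []  # keys passed to each non-rereduce call, to see what was recomputed


def count_reduce(keys, values, rereduce):
    if rereduce:
        return sum(values)  # combine partial counts
    CALLS.append("".join(keys))
    return len(values)


# Cached per-node values R1, R2, R3 (each row's value is 1, so these are counts).
STORED = [count_reduce(node, [1] * len(node), False) for node in NODES]
CALLS.clear()


def range_reduce(start, end):
    partials = []
    for node, stored in zip(NODES, STORED):
        inside = [k for k in node if start <= k <= end]
        if not inside:
            continue
        if len(inside) == len(node):
            partials.append(stored)  # whole node in range: reuse the cached value
        else:
            partials.append(count_reduce(inside, [1] * len(inside), False))
    return count_reduce(None, partials, True)


print(range_reduce("e", "r"))  # 14
print(CALLS)                   # ['efg', 'opqr']: R2 was never recomputed
```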
>> Therefore, it is wrong to attempt to maintain any sort of state in your
>> reduce function between invocations. And because B-tree node boundaries
>> can fall anywhere, it is also wrong to attempt to cross-reference
>> adjacent documents.
>>
>> So I believe this sort of processing needs to take place in the client,
>> not in a reduce function.
>>
>> Regards,
>>
>> Brian.
>
>
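Following Brian's suggestion, a minimal client-side sketch in Python. It assumes the client has already fetched the view rows with the query parameter reduce=false (plus startkey/endkey as above); the HTTP call is not shown, and `combine_rows` and the sample rows are hypothetical. Because it groups by (X, Y) rather than relying on row order, it does not care whether Object_A arrives first.

```python
from collections import defaultdict


def combine_rows(rows):
    """Group view rows by (X, Y); key[2] == 0 marks Object_A, 1 marks an Object_B."""
    groups = defaultdict(lambda: {"a": None, "bs": []})
    for row in rows:
        x, y, kind = row["key"]
        group = groups[(x, y)]
        if kind == 0:
            group["a"] = row["value"]
        else:
            group["bs"].append(row["value"])
    return dict(groups)


# Hypothetical rows, shaped like the "rows" array of a view response.
rows = [
    {"id": "doc1", "key": ["X", 1, 0], "value": "Object_A"},
    {"id": "doc2", "key": ["X", 1, 1], "value": "Object_B1"},
    {"id": "doc3", "key": ["X", 1, 1], "value": "Object_B1"},
    {"id": "doc4", "key": ["X", 1, 1], "value": "Object_B1"},
]
print(combine_rows(rows))
# {('X', 1): {'a': 'Object_A', 'bs': ['Object_B1', 'Object_B1', 'Object_B1']}}
```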

