incubator-couchdb-user mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: Reduce Assumptions
Date Mon, 30 Mar 2009 17:34:13 GMT
Wow, very nice exposition.  Cheers,

Adam

On Mar 30, 2009, at 4:47 AM, Brian Candler wrote:

> On Sat, Mar 28, 2009 at 07:38:24PM -0600, Tom McNulty wrote:
>> my map function produces output like:
>>
>> [X, Y, 0]  -> Object_A
>> [X, Y, 1]  -> Object_B1
>> [X, Y, 1]  -> Object_B1
>> [X, Y, 1]  -> Object_B1
>> [Z, Q, 0] ....
>>
>> Here I apply group_level=2, and use a ranged query ( [X, 0] to [X, [] ] )
>> since Y >= 0
>
> Aside: you can use [X,null] to [X,{}] and then it doesn't matter about the
> value of Y
>
>> Now during the reduce phase, I combine together Object_A's and
>> associated Object_B's. Can I assume that the first of the values sent to
>> 'reduce' is Object_A?
>
> I think not, because on a large database objects to be reduced will be sent
> to your reduce function in batches, and these batches will be broken up on
> B-tree boundaries, which may occur in arbitrary places. For example, your
> reduce function may receive
>
>   [Object_A, Object_B1]
>
> and then in a separate invocation
>
>   [Object_B1, Object_B1]
>
> Furthermore: due to reduce optimisations, you may only receive some of the
> blocks to be reduced. Example: take these three B-tree nodes:
>
>     [a b c d e f g] [h i j k l m n] [o p q r s t u]
>            R1              R2              R3
>
> The reduce value of all the items in each B-tree node is stored within each
> node, e.g. [a b c d e f g] reduces to R1. Now suppose someone asks for a
> reduce value across a key range:
>
>                      key range
>              <----------------------------->
>     [a b c d e f g] [h i j k l m n] [o p q r s t u]
>
> As I understand it, CouchDB will call your reduce function to calculate a
> value for [e f g] and for [o p q r], but will use the existing
> stored/calculated value of R2 across the middle block.
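
The range query described above can be sketched as a toy model in Python. This
is not CouchDB's actual code: the node layout, the count-style reduce, and the
names NODES, STORED, count_reduce and range_reduce are all assumptions made for
illustration. It only shows the idea that fully covered nodes contribute their
stored value while partially covered nodes are reduced afresh.

```python
import string

# Toy B-tree leaf level: three nodes holding the keys 'a'..'u' (assumed layout).
LETTERS = string.ascii_lowercase[:21]
NODES = [list(LETTERS[i:i + 7]) for i in range(0, 21, 7)]


def count_reduce(values, rereduce):
    """Hypothetical reduce: count items, or sum partial counts on rereduce."""
    return sum(values) if rereduce else len(values)


# Reduction stored alongside each node (R1, R2, R3 in the diagram).
STORED = [count_reduce(node, False) for node in NODES]


def range_reduce(start, end):
    """Reduce over start..end, reusing stored values for fully covered nodes."""
    partials, log = [], []
    for node, stored in zip(NODES, STORED):
        inside = [key for key in node if start <= key <= end]
        if not inside:
            continue  # node lies entirely outside the key range
        if len(inside) == len(node):
            partials.append(stored)  # whole node covered: no reduce call
            log.append(("stored", stored))
        else:
            partials.append(count_reduce(inside, False))  # partial node
            log.append(("reduced", inside))
    return count_reduce(partials, True), log


total, log = range_reduce("e", "r")
```

In this model the reduce function only ever sees [e f g] and [o p q r]
directly; the middle node enters the result solely through its stored value.
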
>
> Therefore, it is wrong to attempt to maintain any sort of state in your
> reduce function between invocations. And because the B-tree node boundaries
> can appear in any place, it is wrong to attempt to cross-reference adjacent
> documents too.
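
A small Python simulation may make the boundary problem concrete. It is only a
sketch: CouchDB reduce functions are written in JavaScript and take
(keys, values, rereduce), whereas the two-argument functions below, the "A"/"B"
stand-in values, and the names split_batches, fragile_reduce, safe_reduce and
run are simplifications assumed for illustration.

```python
# Stand-ins for the emitted values: "A" marks Object_A, "B" marks an Object_B.
ROWS = ["A", "B", "B", "B"]


def split_batches(rows, boundary):
    """Split rows at an arbitrary index, imitating a B-tree node boundary."""
    return [batch for batch in (rows[:boundary], rows[boundary:]) if batch]


def fragile_reduce(values, rereduce):
    """Assumes values[0] is always Object_A, which is the unsafe assumption."""
    if rereduce:
        return values[0]  # keeps only the first partial result
    return {"head": values[0], "tail_count": len(values) - 1}


def safe_reduce(values, rereduce):
    """Counts objects by kind; does not depend on where batches are split."""
    totals = {"A": 0, "B": 0}
    for value in values:
        if rereduce:
            totals["A"] += value["A"]
            totals["B"] += value["B"]
        else:
            totals[value] += 1
    return totals


def run(reduce_fn, rows, boundary):
    """Reduce each batch, then combine the partials with rereduce=True."""
    batches = split_batches(rows, boundary)
    partials = [reduce_fn(batch, False) for batch in batches]
    return reduce_fn(partials, True)


one_batch = run(fragile_reduce, ROWS, 4)    # everything in a single batch
two_batches = run(fragile_reduce, ROWS, 2)  # boundary falls after the first B
```

With the boundary after the first Object_B, the second batch starts with an
Object_B, so fragile_reduce gives a different answer than it does for a single
batch. safe_reduce returns the same totals wherever the boundary falls.
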
>
> So I believe this sort of processing needs to take place in the client, not
> in a reduce function.
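
One way the client-side combination could look, as a hedged sketch in Python:
it assumes the view is queried without reduce, so the client receives rows in
key order with "key" and "value" fields. The function name combine_rows, the
"parent"/"children" output shape, and the sample value Object_C are
hypothetical and not taken from the thread.

```python
from itertools import groupby


def combine_rows(rows):
    """Group key-sorted view rows by their [X, Y] prefix.

    Within each group, the row whose third key element is 0 is treated as
    Object_A and every other row as an associated Object_B.
    """
    combined = []
    for prefix, group in groupby(rows, key=lambda row: tuple(row["key"][:2])):
        parent, children = None, []
        for row in group:
            if row["key"][2] == 0:
                parent = row["value"]
            else:
                children.append(row["value"])
        combined.append(
            {"key": list(prefix), "parent": parent, "children": children}
        )
    return combined


# Sample rows in the shape of the map output from the original question.
rows = [
    {"key": ["X", "Y", 0], "value": "Object_A"},
    {"key": ["X", "Y", 1], "value": "Object_B1"},
    {"key": ["X", "Y", 1], "value": "Object_B1"},
    {"key": ["Z", "Q", 0], "value": "Object_C"},  # hypothetical second group
]
result = combine_rows(rows)
```

Because the client sees the whole sorted row list at once, it can rely on the
ordering within each [X, Y] group, which a reduce function cannot.
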
>
> Regards,
>
> Brian.

