couchdb-user mailing list archives

From Brian Candler <B.Cand...@pobox.com>
Subject Re: Reduce Assumptions
Date Mon, 30 Mar 2009 08:47:28 GMT
On Sat, Mar 28, 2009 at 07:38:24PM -0600, Tom McNulty wrote:
> my map function produces output like:
>
> [X, Y, 0]  -> Object_A
> [X, Y, 1]  -> Object_B1
> [X, Y, 1]  -> Object_B1
> [X, Y, 1]  -> Object_B1
> [Z, Q, 0] ....
>
> Here I apply group_level=2, and use a ranged query ( [X, 0] to [X, [] ] ) 
>  since Y >= 0

Aside: you can use [X,null] to [X,{}] instead, and then the value of Y doesn't
matter: null collates before any other value, and {} after any number, string
or array.
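The model below shows why the [X,null] to [X,{}] range brackets every Y. It is a simplified Python version of CouchDB's view key collation order (null, false, true, numbers, strings, arrays, objects), not CouchDB's own code. Real string comparison uses Unicode/ICU collation rather than Python's code-point order. The function names `type_rank`, `collate` and `in_range` are hypothetical.

```python
from typing import Any

def type_rank(v: Any) -> int:
    """Rank of a JSON value's type in (simplified) CouchDB collation order."""
    if v is None:
        return 0
    if v is False:
        return 1
    if v is True:
        return 2
    if isinstance(v, (int, float)):
        return 3
    if isinstance(v, str):
        return 4
    if isinstance(v, list):
        return 5
    if isinstance(v, dict):
        return 6
    raise TypeError(f"not a JSON value: {v!r}")

def collate(a: Any, b: Any) -> int:
    """Return -1, 0 or 1 when comparing two view keys."""
    ra, rb = type_rank(a), type_rank(b)
    if ra != rb:
        return -1 if ra < rb else 1
    if ra <= 2:  # null, false, true: equal within their own rank
        return 0
    if ra in (3, 4):  # numbers; strings by code point (a simplification of ICU)
        return (a > b) - (a < b)
    if ra == 6:  # objects: compare as lists of [key, value] pairs
        a = [[k, x] for k, x in a.items()]
        b = [[k, x] for k, x in b.items()]
    for x, y in zip(a, b):  # arrays: element by element
        c = collate(x, y)
        if c:
            return c
    return (len(a) > len(b)) - (len(a) < len(b))  # a shorter prefix sorts first

def in_range(key: Any, startkey: Any, endkey: Any) -> bool:
    """True if startkey <= key <= endkey (inclusive end, the view default)."""
    return collate(startkey, key) <= 0 and collate(key, endkey) <= 0
```

In this model, a query with startkey=["X",null] and endkey=["X",{}] matches every emitted [X, Y, n] key, including keys where Y is negative or not a number. The original [X, 0] to [X, []] range only covers numeric Y >= 0.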

> Now during the reduce phase, I combine together Object_A's and  
> associated Object_B's. Can I assume that the first of the values sent to 
> 'reduce' is Object_A?

I think not, because on a large database objects to be reduced will be sent
to your reduce function in batches, and these batches will be broken up on
B-tree boundaries, which may occur in arbitrary places. For example, your reduce
function may receive

   [Object_A, Object_B1]

and then in a separate invocation

   [Object_B1, Object_B1]

Furthermore, because of reduce optimisations, your function may only be called
on some of the blocks to be reduced. Example: take these three Btree nodes:

     [a b c d e f g] [h i j k l m n] [o p q r s t u]
            R1              R2              R3

The reduce value of all the items in each Btree node is stored within each
node, e.g. [a b c d e f g] reduces to R1. Now suppose someone asks for a
reduce value across a key range:

                      key range
              <----------------------------->
     [a b c d e f g] [h i j k l m n] [o p q r s t u]

As I understand it, CouchDB will call your reduce function to calculate a
value for [e f g] and for [o p q r], but will use the existing
stored/calculated value of R2 across the middle block.
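Here is a toy Python model of that range query, with the same "as I understand it" caveat. It is not how CouchDB's Erlang B-tree code is written, and all names are hypothetical. Each node's reduction is computed once, when the index is built. A range query then calls the reduce function only on the partially covered edge nodes and combines the pieces with a rereduce step.

```python
from typing import Callable, List

# Three leaf nodes as in the diagram; each stores the reduction of its keys.
NODES: List[List[str]] = [list("abcdefg"), list("hijklmn"), list("opqrstu")]

calls: List[List[str]] = []  # records every batch handed to the reduce function

def count_reduce(keys: List[str]) -> int:
    calls.append(list(keys))
    return len(keys)

def count_rereduce(partials: List[int]) -> int:
    return sum(partials)

def range_reduce(nodes, start, end, reduce_fn: Callable, rereduce_fn: Callable, stored):
    """Reduce keys in [start, end], reusing the stored value of fully covered nodes."""
    partials = []
    for node, node_value in zip(nodes, stored):
        inside = [k for k in node if start <= k <= end]
        if not inside:
            continue
        if len(inside) == len(node):
            partials.append(node_value)         # whole node in range: no reduce call
        else:
            partials.append(reduce_fn(inside))  # edge node: reduce just the slice
    return rereduce_fn(partials)

stored = [count_reduce(node) for node in NODES]  # done once, at index time
calls.clear()                                     # watch only the query from here on
total = range_reduce(NODES, "e", "r", count_reduce, count_rereduce, stored)
print(total)  # 14
print(calls)  # [['e', 'f', 'g'], ['o', 'p', 'q', 'r']]
```

The middle node's keys never reach the reduce function during the query. So any logic that expects to "see" h..n, or to continue from g to h, will silently get it wrong.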

Therefore, it is wrong to attempt to maintain any sort of state in your
reduce function between invocations. And because the Btree node boundaries
can appear in any place, it is wrong to attempt to cross-reference adjacent
documents too.

So I believe this sort of processing needs to take place in the client, not
in a reduce function.
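As a client-side sketch, assume the rows come from the same view queried with reduce=false, i.e. the "rows" array of the JSON response, where each row has "key" and "value". The client then sees every row in the range and can pair each Object_A with its Object_Bs without relying on batch order. The helper `combine_rows` is hypothetical. It also assumes X and Y are scalars (hashable in Python) and that the third key element is 0 for Object_A and 1 for Object_B.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

def combine_rows(rows: List[Dict[str, Any]]) -> Dict[Tuple, Dict[str, Any]]:
    """Pair each [X, Y, 0] Object_A with its [X, Y, 1] Object_Bs, in any row order."""
    groups: Dict[Tuple, Dict[str, Any]] = defaultdict(lambda: {"a": None, "bs": []})
    for row in rows:
        x, y, tag = row["key"]
        group = groups[(x, y)]  # same grouping as group_level=2
        if tag == 0:
            group["a"] = row["value"]
        else:
            group["bs"].append(row["value"])
    return dict(groups)

sample = [
    {"key": ["X", 1, 1], "value": "B1"},  # deliberately out of order
    {"key": ["X", 1, 0], "value": "A"},
]
print(combine_rows(sample)[("X", 1)])  # {'a': 'A', 'bs': ['B1']}
```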

Regards,

Brian.
