incubator-couchdb-user mailing list archives

From Adam Wolff <awo...@gmail.com>
Subject Re: Reduce Assumptions
Date Sat, 04 Apr 2009 02:55:45 GMT
Thanks for this clear response. A related question: given a view like this:
map: function(doc){
    emit(doc.refId, doc.id);
},
reduce: function(keys, values, rereduce){
    return values.join("");
}

Can I make any assumptions about the order of the values passed to the
reduce function? Is that order stable for a given set of keys? That
is, would this reduce function always return this value for the same
map output, or is it liable to vary?

Thanks again,
Adam

On Mon, Mar 30, 2009 at 1:47 AM, Brian Candler <B.Candler@pobox.com> wrote:
> On Sat, Mar 28, 2009 at 07:38:24PM -0600, Tom McNulty wrote:
>> my map function produces output like:
>>
>> [X, Y, 0]  -> Object_A
>> [X, Y, 1]  -> Object_B1
>> [X, Y, 1]  -> Object_B1
>> [X, Y, 1]  -> Object_B1
>> [Z, Q, 0] ....
>>
>> Here I apply group_level=2, and use a ranged query ( [X, 0] to [X, [] ] )
>>  since Y >= 0
>
> Aside: you can use [X,null] to [X,{}] and then it doesn't matter about the
> value of Y
>
>> Now during the reduce phase, I combine together Object_A's and
>> associated Object_B's. Can I assume that the first of the values sent to
>> 'reduce' is Object_A?
>
> I think not, because on a large database objects to be reduced will be sent
> to your reduce function in batches, and these batches will be broken up on
> B-tree boundaries, which may occur in arbitrary places. e.g. your reduce
> function may receive
>
>   [Object_A, Object_B1]
>
> and then in a separate invocation
>
>   [Object_B1, Object_B1]
>
> Furthermore: due to reduce optimisations, you may only receive some of the
> blocks to be reduced. Example: take these three Btree nodes:
>
>     [a b c d e f g] [h i j k l m n] [o p q r s t u]
>            R1              R2              R3
>
> The reduce value of all the items in each Btree node is stored within each
> node, e.g. [a b c d e f g] reduces to R1. Now suppose someone asks for a
> reduce value across a key range:
>
>                      key range
>              <----------------------------->
>     [a b c d e f g] [h i j k l m n] [o p q r s t u]
>
> As I understand it, CouchDB will call your reduce function to calculate a
> value for [e f g] and for [o p q r], but will use the existing
> stored/calculated value of R2 across the middle block.
>
> Therefore, it is wrong to attempt to maintain any sort of state in your
> reduce function between invocations. And because the Btree node boundaries
> can appear in any place, it is wrong to attempt to cross-reference adjacent
> documents too.
>
> So I believe this sort of processing needs to take place in the client, not
> in a reduce function.
>
> Regards,
>
> Brian.
>
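
The batching behaviour described in the quoted message can be sketched in
plain JavaScript, outside CouchDB. This is only an illustration under stated
assumptions: naiveReduce and countReduce are hypothetical names, the batches
are written out by hand rather than produced by a real B-tree, and null is
passed for keys as a simplification.

```javascript
// Hypothetical reduce that (wrongly) assumes the first value is Object_A.
function naiveReduce(keys, values, rereduce) {
  return { head: values[0], count: values.length };
}

// The two separate invocations from the quoted example.
var first = naiveReduce(null, ["Object_A", "Object_B1"], false);
var second = naiveReduce(null, ["Object_B1", "Object_B1"], false);
// first.head is "Object_A", but second.head is "Object_B1":
// the assumption fails for the second batch.

// A reduce that does not depend on batch boundaries: it counts values,
// and on rereduce it sums the counts produced by earlier invocations.
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    return values.reduce(function (sum, n) { return sum + n; }, 0);
  }
  return values.length;
}

// Key range from the quoted diagram: [e f g], the whole middle node, [o p q r].
var storedR2 = countReduce(null, ["h", "i", "j", "k", "l", "m", "n"], false);
var left = countReduce(null, ["e", "f", "g"], false);
var right = countReduce(null, ["o", "p", "q", "r"], false);

// The stored value for the middle node is reused and combined via rereduce.
var total = countReduce(null, [left, storedR2, right], true);
```

Here total is the same however the 14 values are split into batches, whereas
the head field from naiveReduce changes with the split.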
