couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adam Wolff <awo...@gmail.com>
Subject Re: Reduce Assumptions
Date Mon, 06 Apr 2009 14:46:37 GMT
Brian,
Thanks for this very helpful response. I'm still not quite clear on
the meaning of this:
"reduce functions should not grow its output larger than log(n) where n is
the number of input rows"

How is the size of the output measured? length of the JSON string? are
the input rows
the size of the value emitted by the map? Does this mean it's wrong to
have an entry in
your reduce output for every map key? I think I *have* been using
reduce incorrectly - I
should probably stick with map.

Thanks again,
A

On Mon, Apr 6, 2009 at 1:51 AM, Brian Candler <B.Candler@pobox.com> wrote:
> On Fri, Apr 03, 2009 at 07:55:45PM -0700, Adam Wolff wrote:
>> Thanks for this clear response. A related question: given a view like this:
>> map: function(doc){
>>     emit(doc.refId, doc.id);
>> },
>> reduce :  function(keys, values, rereduce){
>>     return values.join("");
>> }
>
> A reduce function of the form values.join("") is not good. At the bottom of
> http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
> you can see:
>
> "reduce functions should not grow its output larger than log(n) where n is
> the number of input rows"
>
> Furthermore:
>
> "the Reduce function has the requirement that not only must it be
> referentially transparent, but it must also be commutative and associative
> for the array value input"
>
> The way I read that: for any ordering of input keys, the output value of
> your reduce function must be the same.
>
> Your function doesn't have this property, so you cannot use it.
>
> Note that
>
>   reduce(M0, M1, M2, M3)
>
> and
>
>   rereduce(reduce(M0, M1), reduce(M2, M3))
>
> and
>
>   rereduce(reduce(M2, M3), reduce(M0, M1))
>
> must all evaluate to be the same. Sorting the values in your (re)reduce
> function might help except that you still wouldn't meet the log(n)
> criterion.
>
> Basically - this is not how reduce functions are intended to be used. They
> must somehow summarise the data, not aggregate it.
>
> What you need, I believe, is a simple map:
>
>  // MAP
>  function(doc) {
>    emit(doc.refId, null);
>  }
>
> Then the client can query giving a startkey and endkey, and get back a list
> of documents which contain that reference or range of references.
>
> You can add a reduce function like this:
>
>  // REDUCE
>  function(ks, vs, co) {
>    if (co) {
>      return sum(vs);
>    } else {
>      return vs.length;
>    }
>  }
>
> Then querying with group=true will give you the refIds mapped to a count of
> documents containing that refId. Querying with reduce=false will give you
> the same as the map function by itself.
>
> Regards,
>
> Brian.
>

Mime
View raw message