couchdb-user mailing list archives

From Chris Anderson <jch...@apache.org>
Subject Re: reduce/rereduce confusion
Date Wed, 21 Jan 2009 05:13:29 GMT
On Tue, Jan 20, 2009 at 8:05 PM, Adam Wolff <awolff@gmail.com> wrote:
> After looking at this more, let me restate. I would totally get all of this
> if the signature of reduce was:
> reduce: function(key, values, rereduce)
>
> What I don't get is: why does reduce get called with an arbitrarily long
> list of keys? I thought reduce was precisely for reducing all of the mapped
> inputs that are indexed under the *same* key. I think if I can get that, the
> rest will come clear.

The thing that makes CouchDB's reduce different from, say, the Hadoop
implementation, is that it does not group by key at computation time.

Instead, a reduce function should aim to return a single value for the
entire view. E.g., 15,346, which could be the total number of posts in
your view.
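
To make that concrete, here is a minimal sketch of a counting reduce. It assumes the map function emits one row per post; the name countPosts and the sample keys and doc ids are made up for illustration. CouchDB calls reduce with (keys, values, rereduce), where keys is a list of [key, docid] pairs on a first pass and null on a rereduce pass.

```javascript
// Sketch of a reduce that counts rows. The name countPosts is hypothetical.
function countPosts(keys, values, rereduce) {
  if (rereduce) {
    // values holds partial counts returned by earlier reduce calls; keys is null.
    return values.reduce(function (a, b) { return a + b; }, 0);
  }
  // values holds the raw values emitted by map, one per row,
  // possibly spanning many different keys.
  return values.length;
}

// Simulate two chunks of map rows being reduced, then combined by rereduce.
var partialA = countPosts(
  [[["jchris", [2009, 0, 3]], "doc1"], [["jchris", [2009, 0, 9]], "doc2"]],
  [1, 1],
  false
);
var partialB = countPosts([[["jchris", [2009, 1, 2]], "doc3"]], [1], false);
var total = countPosts(null, [partialA, partialB], true);
console.log(total); // prints 3
```

Note that the function never looks at the keys: it only has to produce a correct value for whatever set of rows it is handed.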

CouchDB allows you to query for reduction values for any arbitrary key
range very efficiently. So depending on your key structure, if you
want the total number of posts by jchris in January, you could ask for
the reduce value for all keys between

["jchris",[2009,0]] and ["jchris",[2009,0,{}]]

and get a result of, say, 14.
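
As a rough sketch, the query string for that range could be built like this. startkey and endkey are the view query parameters and take JSON-encoded values; the view path and the helper name rangeQuery are assumptions for illustration, and the exact URL layout depends on your CouchDB version.

```javascript
// Build a view query for one key range. rangeQuery and the view path
// below are hypothetical; only startkey/endkey are CouchDB parameters.
function rangeQuery(viewPath, startkey, endkey) {
  return viewPath +
    "?startkey=" + encodeURIComponent(JSON.stringify(startkey)) +
    "&endkey=" + encodeURIComponent(JSON.stringify(endkey));
}

// All posts by jchris in January 2009 (months are zero-based here).
// {} collates after any number, string, or array, so [2009,0,{}] sorts
// after every [2009,0,<day>] key.
var url = rangeQuery(
  "/blog/_view/posts/by_author_and_date", // assumed path, adjust to your setup
  ["jchris", [2009, 0]],
  ["jchris", [2009, 0, {}]]
);
console.log(url);
```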

For details about specifying start and end keys see
http://wiki.apache.org/couchdb/View_collation

The group=true and group_level parameters may seem confusing at first,
but once you understand that they are just macros for running a series
of reduce queries (where CouchDB will pick key ranges for you), they
aren't so mysterious.
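
Here is a small simulation of that idea, not CouchDB's actual implementation. It assumes flat array keys such as ["jchris", 2009, 0] and rows already sorted in view order; groupReduce, countRows, and the sample rows are hypothetical. Each run of rows sharing the first `level` key elements is treated as one key range and reduced on its own.

```javascript
// Count rows; same shape as a CouchDB reduce function.
function countRows(keys, values, rereduce) {
  if (rereduce) {
    return values.reduce(function (a, b) { return a + b; }, 0);
  }
  return values.length;
}

// Simulate group_level: split sorted rows into runs that share the same
// key prefix of length `level`, then reduce each run separately.
function groupReduce(rows, level, reduceFn) {
  var groups = [];
  rows.forEach(function (row) {
    var prefix = row.key.slice(0, level);
    var id = JSON.stringify(prefix);
    var last = groups[groups.length - 1];
    if (!last || last.id !== id) {
      last = { id: id, key: prefix, keys: [], values: [] };
      groups.push(last);
    }
    last.keys.push([row.key, row.id]);
    last.values.push(row.value);
  });
  return groups.map(function (g) {
    return { key: g.key, value: reduceFn(g.keys, g.values, false) };
  });
}

// Hypothetical view rows, sorted by key as a view would return them.
var rows = [
  { id: "d1", key: ["adam", 2009, 0], value: 1 },
  { id: "d2", key: ["jchris", 2008, 11], value: 1 },
  { id: "d3", key: ["jchris", 2009, 0], value: 1 },
  { id: "d4", key: ["jchris", 2009, 0], value: 1 }
];

var byAuthor = groupReduce(rows, 1, countRows); // like group_level=1
var byMonth = groupReduce(rows, 3, countRows);  // full key, like group=true
console.log(JSON.stringify(byAuthor));
console.log(JSON.stringify(byMonth));
```

With level 1 the rows collapse to one count per author; with the full key length each distinct key gets its own count, which is what group=true gives you.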

>
> Thanks again,
> A
>
> On Tue, Jan 20, 2009 at 7:52 PM, Adam Wolff <awolff@gmail.com> wrote:
>
>> Thanks for the reply!
>> I'd seen all of this, though I re-read the wikipedia entry carefully.
>> Damien's blog entries don't appear to match the APIs in the version I'm
>> running, which is 0.8.1
>> The wikipedia entry suggests that reduce is called only with values that
>> match a single key. Using the log() function in CouchDB, I can see that's
>> not the case for its reduce function -- it's called with multiple different
>> keys, though it does appear that the input values are *ordered* by matching
>> keys.
>>
>> Anyway, I totally get how re-reduce (or "combine") works in conventional
>> map/reduce, but I'm hazy on the details w/r/t CouchDB. I'm starting to
>> understand the answer to #1, but I'm really unclear on #2 (how/why rereduce
>> is run.)
>>
>> Thanks again,
>> A
>>
>>
>> On Tue, Jan 20, 2009 at 6:50 PM, Jeff Hinrichs - DM&T <dundeemt@gmail.com> wrote:
>>
>>> On Tue, Jan 20, 2009 at 7:47 PM, Adam Wolff <awolff@gmail.com> wrote:
>>> > Hi everyone, I'm really excited about CouchDB and I've started playing with
>>> > it. I get all of it, except for reduce, and especially re-reduce.
>>> >
>>> > My first question is: how does CouchDB maintain all the separate output for
>>> > a given key from the map function? I mean: given a simple reduce that just
>>> > sums results, how does couch maintain separate results for each possible
>>> > key/key range that can be given as input to that view?
>>> >
>>> > My second question: when and why does rereduce get called? Is this simply to
>>> > allow the server to chunk the processing, or is there semantic meaning to
>>> > it? I had assumed the former -- it's just a way of limiting the size of the
>>> > input to the reduce function -- but then this really confused me: if I log
>>> > each time my reduce function gets called, I see that the last time it's
>>> > called, it's with rereduce=false. How is this possible? Don't all the
>>> > results have to be funneled through rereduce to produce a single result
>>> > value?
>>> >
>>> > Any help here would be much appreciated. If there's a resource on the web I
>>> > should look at, please send it my way. Thanks!
>>> >
>>> > A
>>> Being that I just went through the learning process on reduce, I'll
>>> point you here:
>>> http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
>>> "Reduce Functions"
>>>
>>> As a good place to start.
>>> Also, the mailing list is an excellent resource.
>>>
>>> http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%3c61B374C7-34D7-45C3-9F8B-F11EFD77303D@apache.org%3e
>>>
>>> along with:
>>> http://en.wikipedia.org/wiki/MapReduce
>>> http://labs.google.com/papers/mapreduce.html
>>> and
>>> http://damienkatz.net/2008/02/incremental_map.html
>>>
>>> Regards,
>>>
>>> Jeff
>>>
>>
>>
>



-- 
Chris Anderson
http://jchris.mfdz.com
