incubator-couchdb-user mailing list archives

From Adam Wolff <awo...@gmail.com>
Subject Re: reduce/rereduce confusion
Date Wed, 21 Jan 2009 04:05:18 GMT
After looking at this more, let me restate. I would totally get all of this
if the signature of reduce was:

reduce: function(key, values, rereduce)

What I don't get is: why does reduce get called with an arbitrarily long
list of keys? I thought reduce was precisely for reducing all of the mapped
inputs that are indexed under the *same* key. I think if I can get that, the
rest will come clear.

Thanks again,
A
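
The behaviour asked about above can be sketched in plain JavaScript. This is only an illustration, not CouchDB's actual implementation: `simulateView`, the fixed `chunkSize`, and the sample rows are hypothetical stand-ins for the server's internal batching. It assumes the documented `function(keys, values, rereduce)` signature, where a first-pass call receives `[key, docId]` pairs that may span several different keys, and a rereduce call receives `null` keys plus earlier reduce outputs.

```javascript
// Sum reduce in the CouchDB style. On a first pass (rereduce === false),
// `keys` is an array of [key, docId] pairs and `values` holds the emitted
// values. On a rereduce pass, `keys` is null and `values` holds the outputs
// of earlier reduce calls. A plain sum treats both cases the same way.
function sumReduce(keys, values, rereduce) {
  return values.reduce(function (total, v) { return total + v; }, 0);
}

// Hypothetical stand-in for the server: reduce fixed-size chunks of the
// key-sorted map rows, then combine the partial results with rereduce = true.
function simulateView(rows, reduceFn, chunkSize) {
  var partials = [];
  for (var i = 0; i < rows.length; i += chunkSize) {
    var chunk = rows.slice(i, i + chunkSize);
    var keys = chunk.map(function (r) { return [r.key, r.id]; });
    var values = chunk.map(function (r) { return r.value; });
    partials.push(reduceFn(keys, values, false));
  }
  // With a single chunk there is nothing to combine, so rereduce never runs.
  if (partials.length === 1) return partials[0];
  return reduceFn(null, partials, true);
}

// Illustrative map output, sorted by key; one chunk can hold keys "a" and "b".
var rows = [
  { key: "a", id: "doc1", value: 1 },
  { key: "a", id: "doc2", value: 2 },
  { key: "b", id: "doc3", value: 3 },
  { key: "c", id: "doc4", value: 4 }
];
var total = simulateView(rows, sumReduce, 3);
```

In this sketch a chunk is cut by position, not by key, which is why a single call can see several different keys. When all rows fit in one chunk, the only call made has rereduce set to false, which is one way the last logged call could be a non-rereduce call.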

On Tue, Jan 20, 2009 at 7:52 PM, Adam Wolff <awolff@gmail.com> wrote:

> Thanks for the reply!
> I'd seen all of this, though I re-read the Wikipedia entry carefully.
> Damien's blog entries don't appear to match the APIs in the version I'm
> running, which is 0.8.1.
> The Wikipedia entry suggests that reduce is called only with values that
> match a single key. Using the log() function in CouchDB, I can see that's
> not the case for its reduce function -- it's called with multiple different
> keys, though it does appear that the input values are *ordered* by matching
> keys.
>
> Anyway, I totally get how re-reduce (or "combine") works in conventional
> map/reduce, but I'm hazy on the details w/r/t CouchDB. I'm starting to
> understand the answer to #1, but I'm really unclear on #2 (how/why rereduce
> is run.)
>
> Thanks again,
> A
>
>
> On Tue, Jan 20, 2009 at 6:50 PM, Jeff Hinrichs - DM&T <dundeemt@gmail.com> wrote:
>
>> On Tue, Jan 20, 2009 at 7:47 PM, Adam Wolff <awolff@gmail.com> wrote:
>> > Hi everyone, I'm really excited about CouchDB and I've started playing with
>> > it. I get all of it, except for reduce, and especially re-reduce.
>> >
>> > My first question is: how does CouchDB maintain all the separate output for
>> > a given key from the map function? I mean: given a simple reduce that just
>> > sums results, how does couch maintain separate results for each possible
>> > key/key range that can be given as input to that view?
>> >
>> > My second question: when and why does rereduce get called? Is this simply to
>> > allow the server to chunk the processing, or is there semantic meaning to
>> > it? I had assumed the former -- it's just a way of limiting the size of the
>> > input to the reduce function -- but then this really confused me: if I log
>> > each time my reduce function gets called, I see that the last time it's
>> > called, it's with rereduce=false. How is this possible? Don't all the
>> > results have to be funneled through rereduce to produce a single result
>> > value?
>> >
>> > Any help here would be much appreciated. If there's a resource on the web I
>> > should look at, please send it my way. Thanks!
>> >
>> > A
>> Being that I just went through the learning process on reduce, I'll
>> point you here:
>> http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
>> "Reduce Functions"
>>
>> As a good place to start.
>> Also, the mailing list is an excellent resource.
>>
>> http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%3c61B374C7-34D7-45C3-9F8B-F11EFD77303D@apache.org%3e
>>
>> along with:
>> http://en.wikipedia.org/wiki/MapReduce
>> http://labs.google.com/papers/mapreduce.html
>> and
>> http://damienkatz.net/2008/02/incremental_map.html
>>
>> Regards,
>>
>> Jeff
>>
>
>
