couchdb-user mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: reduce/rereduce confusion
Date Wed, 21 Jan 2009 14:10:44 GMT

On 21 Jan 2009, at 05:05, Adam Wolff wrote:

> After looking at this more, let me restate. I would totally get all of this
> if the signature of reduce was:
> reduce: function(key, values, rereduce)
>
> What I don't get is: why does reduce get called with an arbitrarily long
> list of keys? I thought reduce was precisely for reducing all of the mapped
> inputs that are indexed under the *same* key. I think if I can get that, the
> rest will come clear.

See "Query processing" on http://horicky.blogspot.com/2008/10/couchdb-implementation.html
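
As a rough illustration of why one reduce call can see several different keys, here is a minimal sketch. It assumes the view index hands reduce a chunk of adjacent, key-sorted rows (for example, whatever sits in one index node) rather than one key's rows at a time, and that the partial results are then combined with rereduce=true. The names simulateReduce and chunkSize are hypothetical helpers for this sketch, not CouchDB APIs, and always finishing with a rereduce call is a simplification.

```javascript
// Sketch only: simulates how a view index might feed a reduce function.
// simulateReduce and chunkSize are hypothetical, not part of CouchDB.

// A sum reduce written against the (keys, values, rereduce) signature.
function sumReduce(keys, values, rereduce) {
  // First pass: values are map outputs. Rereduce pass: values are earlier
  // reduce outputs and keys is null. Summing works the same in both cases.
  return values.reduce(function (acc, v) { return acc + v; }, 0);
}

// Map output rows, already sorted by key as they would be in the index.
var rows = [
  { key: "a", id: "doc1", value: 1 },
  { key: "a", id: "doc2", value: 2 },
  { key: "b", id: "doc3", value: 3 },
  { key: "c", id: "doc4", value: 4 }
];

function simulateReduce(rows, reduceFn, chunkSize) {
  var partials = [];
  var calls = [];
  // Reduce each chunk of adjacent rows; a chunk may span several keys.
  for (var i = 0; i < rows.length; i += chunkSize) {
    var chunk = rows.slice(i, i + chunkSize);
    var keys = chunk.map(function (r) { return [r.key, r.id]; });
    var values = chunk.map(function (r) { return r.value; });
    calls.push({ keys: keys, rereduce: false });
    partials.push(reduceFn(keys, values, false));
  }
  // Combine the per-chunk results (simplification: always done here).
  calls.push({ keys: null, rereduce: true });
  var total = reduceFn(null, partials, true);
  return { total: total, partials: partials, calls: calls };
}

var result = simulateReduce(rows, sumReduce, 3);
// partials are [6, 4], total is 10
console.log(result.partials, result.total);
```

With a chunk size of 3, the first call receives rows for both "a" and "b", which matches the "multiple different keys, ordered by key" behaviour seen in the log() output. A reduce function written this way has to give the same answer however the rows are chunked.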

Cheers
Jan
--
>
>
> Thanks again,
> A
>
> On Tue, Jan 20, 2009 at 7:52 PM, Adam Wolff <awolff@gmail.com> wrote:
>
>> Thanks for the reply!
>> I'd seen all of this, though I re-read the Wikipedia entry carefully.
>> Damien's blog entries don't appear to match the APIs in the version I'm
>> running, which is 0.8.1.
>> The Wikipedia entry suggests that reduce is called only with values that
>> match a single key. Using the log() function in CouchDB, I can see that's
>> not the case for its reduce function -- it's called with multiple different
>> keys, though it does appear that the input values are *ordered* by matching
>> keys.
>>
>> Anyway, I totally get how re-reduce (or "combine") works in conventional
>> map/reduce, but I'm hazy on the details w/r/t CouchDB. I'm starting to
>> understand the answer to #1, but I'm really unclear on #2 (how/why rereduce
>> is run).
>>
>> Thanks again,
>> A
>>
>>
>> On Tue, Jan 20, 2009 at 6:50 PM, Jeff Hinrichs - DM&T <dundeemt@gmail.com> wrote:
>>
>>> On Tue, Jan 20, 2009 at 7:47 PM, Adam Wolff <awolff@gmail.com> wrote:
>>>> Hi everyone,
>>>> I'm really excited about CouchDB and I've started playing with
>>>> it. I get all of it, except for reduce, and especially re-reduce.
>>>>
>>>> My first question is: how does CouchDB maintain all the separate output for
>>>> a given key from the map function? I mean: given a simple reduce that just
>>>> sums results, how does couch maintain separate results for each possible
>>>> key/key range that can be given as input to that view?
>>>>
>>>> My second question: when and why does rereduce get called? Is this simply to
>>>> allow the server to chunk the processing, or is there semantic meaning to
>>>> it? I had assumed the former -- it's just a way of limiting the size of the
>>>> input to the reduce function -- but then this really confused me: if I log
>>>> each time my reduce function gets called, I see that the last time it's
>>>> called, it's with rereduce=false. How is this possible? Don't all the
>>>> results have to be funneled through rereduce to produce a single result
>>>> value?
>>>>
>>>> Any help here would be much appreciated. If there's a resource on the web I
>>>> should look at, please send it my way. Thanks!
>>>>
>>>> A
>>> Being that I just went through the learning process on reduce, I'll
>>> point you here:
>>> http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
>>> "Reduce Functions"
>>>
>>> As a good place to start.
>>> Also, the mailing list is an excellent resource:
>>>
>>> http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%3c61B374C7-34D7-45C3-9F8B-F11EFD77303D@apache.org%3e
>>>
>>> along with:
>>> http://en.wikipedia.org/wiki/MapReduce
>>> http://labs.google.com/papers/mapreduce.html
>>> and
>>> http://damienkatz.net/2008/02/incremental_map.html
>>>
>>> Regards,
>>>
>>> Jeff
>>>
>>
>>

