incubator-couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jan Lehnardt <...@apache.org>
Subject Re: Incremental map/reduce
Date Fri, 30 Jan 2009 10:34:23 GMT
Hi Brian,

http://damienkatz.net/2008/02/incremental_map.html
http://damienkatz.net/2008/02/incremental_map_1.html
http://horicky.blogspot.com/2008/10/couchdb-implementation.html

Cheers
Jan
--
On 30 Jan 2009, at 11:18, Brian Candler wrote:

> I'm trying to understand how couchdb does incremental updates to its
> map/reduce views. By this, I understand that it only has to read those
> documents which have changed since the view was last updated.
>
> I have an idea how it might work, so let me pose it as an example.  
> Here's a
> database with 6 documents and a very simple summing map/reduce  
> function.
>
> -------------------------------------------------------------
> #!/bin/sh
> set -xe
>
> HOST1=http://localhost:5984
> LOCAL1=maptest
> DB1="$HOST1/$LOCAL1"
>
> curl -X DELETE "$DB1"; echo
> curl -X PUT "$DB1"; echo
> curl -sX PUT -d '{"values":[1,2]}' "$DB1/doc1"
> curl -sX PUT -d '{"values":[]}' "$DB1/doc2"
> curl -sX PUT -d '{"values":[3,4,5]}' "$DB1/doc3"
> curl -sX PUT -d '{"values":[6,7]}' "$DB1/doc4"
> curl -sX PUT -d '{"values":[8]}' "$DB1/doc5"
> curl -sX PUT -d '{"values":[9]}' "$DB1/doc6"
> curl -sX PUT -T - "$DB1/_design/test" <<EOS
> {
>  "views": {
>    "sum": {
>      "map": "function(doc) { if (doc.values) { for (var i in  
> doc.values) { emit(null,doc.values[i]) } } }",
>      "reduce": "function(key, values) { return sum(values) }"
>    }
>  }
> }
> EOS
>
> echo -e "\n\nReading view..."
> curl "$DB1/_view/test/sum"
> -------------------------------------------------------------
>
> I am guessing there must be an upper bound to the number of keys and  
> values
> passed at once to the reduce function, so let me assume for now that  
> limit
> is 3. Then the process is:
>
>           doc1       doc3     doc4  doc5 doc6
> MAP         / \     /  |  \     / \    |   |
>           1   2   3   4   5   6   7   8   9
> REDUCE      \  |  /     \  |  /     \  |  /
>               6          15          24
> REDUCE          `--------. | ,--------'
>                          45
>
> Now, suppose I delete doc5. In order to update this sum  
> incrementally, I
> think I would have to:
> (1) Regenerate the map block(s) which included doc5
>    - so this would now be [7,9] instead of [7,8,9]
> (2) Reduce those block(s) - giving 16 instead of 24
> (3) Down the reduction tree, re-reduce those blocks which depend on  
> that
>    input
>
> Is that an accurate description of the process?
>
> In order to do this, couchdb would have to remember all the  
> intermediate
> reduce values, and also know that the intermediate value 24 was  
> derived from
> doc4, doc5 and doc6. Furthermore, it would have to re-map doc4 and  
> doc6 to
> generate the new value, even though they had not changed.
>
> Alternatively, it could keep all the map results materialised, each  
> tagged
> with the the docid where it came from, so it could simply remove the  
> 8 from
> the map when doc5 is deleted. Is that what it does? (If so, I think  
> it could
> be useful to expose that in the view, so that a single view could be  
> used
> as both map and map+reduce)
>
> Now, the other thing which I don't understand is group_level.
>
> I can understand that emit gives (key,value) pairs, so group=true  
> gives me
> the intermediate reduction values for rows with identical keys. For  
> example,
> if I change the above map function to
>
>      "map": "function(doc) { if (doc.values) { for (var i in  
> doc.values) { emit(doc._id,doc.values[i]) } } }",
>
> this gives the result I expect, and I imagine that the reduction  
> process
> is working something like this:
>
>           doc1       doc3     doc4  doc5 doc6
> MAP         / \     /  |  \     / \    |   |
>           1   2   3   4   5   6   7   8   9
> REDUCE      \ /     \  |  /     \ /    |   |
>             3        12        13     8   9   ==> group=true
> REDUCE        `------. | ,-----'        \ /
>                      28                17
> REDUCE                  `------. ,-----'
>                               45              ==> group=false
>
> Is that right? But in that case, what does group_level=2 do? Is this
> something which comes into play if emitted keys are structured as  
> arrays?
>
> Anyway, sorry for the long E-mail. I'm just trying to get all this  
> clear in
> my head :-)
>
> Many thanks,
>
> Brian.
>


Mime
View raw message