Return-Path: Delivered-To: apmail-couchdb-commits-archive@www.apache.org Received: (qmail 61006 invoked from network); 20 Jan 2010 16:46:56 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 20 Jan 2010 16:46:56 -0000 Received: (qmail 13919 invoked by uid 500); 20 Jan 2010 16:46:56 -0000 Delivered-To: apmail-couchdb-commits-archive@couchdb.apache.org Received: (qmail 13851 invoked by uid 500); 20 Jan 2010 16:46:55 -0000 Mailing-List: contact commits-help@couchdb.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@couchdb.apache.org Delivered-To: mailing list commits@couchdb.apache.org Received: (qmail 13842 invoked by uid 500); 20 Jan 2010 16:46:55 -0000 Delivered-To: apmail-incubator-couchdb-commits@incubator.apache.org Received: (qmail 13839 invoked by uid 99); 20 Jan 2010 16:46:55 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 20 Jan 2010 16:46:55 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.130] (HELO eos.apache.org) (140.211.11.130) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 20 Jan 2010 16:46:46 +0000 Received: from eos.apache.org (localhost [127.0.0.1]) by eos.apache.org (Postfix) with ESMTP id BD99A16E0A; Wed, 20 Jan 2010 16:46:25 +0000 (GMT) MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable From: Apache Wiki To: Apache Wiki Date: Wed, 20 Jan 2010 16:46:25 -0000 Message-ID: <20100120164625.6029.57510@eos.apache.org> Subject: =?utf-8?q?Introduction=5Fto=5FCouchDB=5Fviews_reverted_to_revision_27_on_?= =?utf-8?q?Couchdb_Wiki?= Dear wiki user, You have subscribed to a wiki page "Couchdb Wiki" for change notification. The page Introduction_to_CouchDB_views has been reverted to revision 27 by = BrianChamberlain. The comment on this change is: The GUI editor (or my stupidity, possibly bo= th) made all the text from my edit on down BOLD. Doh!. http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views?action=3Ddiff&= rev1=3D29&rev2=3D30 -------------------------------------------------- A simple introduction to CouchDB views. = =3D=3D Concept =3D=3D + = Views are the primary tool used for querying and reporting on CouchDB doc= uments. There are two different kinds of views: permanent and temporary vie= ws. = '''Permanent views''' are stored inside special documents called design d= ocuments, and can be accessed via an HTTP ''GET'' request to the URI ''/{db= name}/{docid}/{viewname}'', where ''{docid}'' has the prefix ''_design/'' s= o that CouchDB recognizes the document as a design document, and ''{viewnam= e}'' has the prefix ''_view/'' so that CouchDB recognizes it as a view. @@ -15, +16 @@ = Note that by default views are not created and updated when a document is= saved, but rather, when it is accessed. As a result, the first access migh= t take some time depending on the size of your data while CouchDB creates t= he view. If preferable the views can also be updated when a document is sav= ed using an external script that calls the views when updates have been mad= e. An example can be found here: RegeneratingViewsOnUpdate = - Note that all views in a single design document get updated when one of t= he views in that design document gets queried. + Note that all views in a single design document get updated when one of t= he views in that design document gets queried. = = Note on !JavaScript API change: Prior to Tue, 20 May 2008 (Subversion rev= ision r658405) the function to emit a row to the map index, was named "map"= . It has now been changed to "emit". = + = =3D=3D Basics =3D=3D + = =3D=3D=3D Map Functions =3D=3D=3D + = Here is the simplest example of a map function: = {{{ @@ -28, +32 @@ emit(null, doc); } }}} + = This function defines a table that contains all the documents in a CouchD= B database, with no particular key. = A view function should accept a single argument: the document object. To = produce results, it should call the implicitly available ''emit(key, value)= '' function. For every invocation of that function, a result row is added t= o the view (if neither the ''key'' nor the ''value'' are undefined). As doc= uments are added, edited and deleted, the rows in the computed table are up= dated automatically. @@ -41, +46 @@ } } }}} + = For each document in the database that has a Type field with the value ''= customer'', a row is created in the view. The ''value'' column of the view = contains the ''!LastName'', ''!FirstName'', and ''Address'' fields for each= document. The key for all the documents is null in this case. = To be able to filter or sort the view by some document property, you woul= d use that property for the key. For example, the following view would allo= w you to lookup customer documents by the ''!LastName'' or ''!FirstName'' f= ields: @@ -53, +59 @@ } } }}} + = Here is an example of the results of such a view: = {{{ @@ -84, +91 @@ ] } }}} + = ''This example output was reformatted for readability.'' = Keep in mind that emit works by just storing the key/value pairs in an ar= ray and then, when all views in the same _design document have been calcula= ted, returns all results at once. So if you use an object to make calculati= ons and do multiple emits on the same document, you must create a copy and = not emit the same object multiple times. For example: @@ -100, +108 @@ } } }}} - ''Note: For those unfamiliar with the convention 'eval(uneval(_obj_))', t= his simply clones _obj_. It is cleaner than traversing each element of _obj= _ and it will always be true that uneval(eval(uneval(x))) =3D=3D uneval(x)'= and 'eval(uneval(x)) =3D=3D deep_copy_of_x' . The actual method 'uneval(_o= bj_)' is a Spidermonkey specific (as of 1.7) extension that is not part of = ECMAScript.''''' '''[[http://www.thespanner.co.uk/2008/04/10/javascript-clo= ning-objects/|1]]'' [[https://developer.mozilla.org/en/SpiderMonkey/JSAPI_R= eference/JS_InitStandardClasses|2]] [[http://www.mozilla.org/rhino/rhino15R= 5.html|3]]'''''''' = + = - '''<>'''''' ''' + <> - = =3D=3D=3D Reduce Functions =3D=3D=3D + = - '''Reduce is a powerful feature of CouchDB but is often misused which lea= ds to performance problems. From 0.10 onwards, CouchDB uses a heuristic to = detect reduce functions that won't scale to give the developer an early war= ning. A reduce function must reduce the input values to a smaller output va= lue. If you are building a composite return structure in your reduce, or on= ly transforming the values field, rather than summarizing it, you might be = misusing this feature. See [[#reduced_value_sizes|Reduced Value Sizes]] for= more details. ''' + Reduce is a powerful feature of CouchDB but is often misused which leads = to performance problems. From 0.10 onwards, CouchDB uses a heuristic to det= ect reduce functions that won't scale to give the developer an early warnin= g. A reduce function must reduce the input values to a smaller output value= . If you are building a composite return structure in your reduce, or only = transforming the values field, rather than summarizing it, you might be mis= using this feature. See [[#reduced_value_sizes|Reduced Value Sizes]] for mo= re details. = - '''If a view has a reduce function, it is used to produce aggregate resul= ts for that view. A reduce function is passed a set of intermediate values = and combines them to a single value. Reduce functions must accept, as input= , results emitted by its corresponding map function '''as well as results r= eturned by the reduce function itself'''. The latter case is referred to as= a ''rereduce''. ''' + If a view has a reduce function, it is used to produce aggregate results = for that view. A reduce function is passed a set of intermediate values and= combines them to a single value. Reduce functions must accept, as input, r= esults emitted by its corresponding map function '''as well as results retu= rned by the reduce function itself'''. The latter case is referred to as a = ''rereduce''. = - '''Here is an example of a reduce function: ''' + Here is an example of a reduce function: = {{{ + function (key, values, rereduce) { + return sum(values); + } }}} + = - '''Reduce functions are passed three arguments in the order ''key'', ''va= lues'' and ''rereduce'' ''' + Reduce functions are passed three arguments in the order ''key'', ''value= s'' and ''rereduce'' = - '''Reduce functions must handle two cases: ''' + Reduce functions must handle two cases: = - '''1. When ''rereduce'' is ''false'': ''' + 1. When ''rereduce'' is ''false'': - = - * '''''key'' will be an array whose elements are arrays of the form ''[k= ey,id]'', where ''key'' is a key emitted by the map function and ''id'' is = that of the document from which the key was generated. ''' + * ''key'' will be an array whose elements are arrays of the form ''[key,= id]'', where ''key'' is a key emitted by the map function and ''id'' is tha= t of the document from which the key was generated. = - * '''''values'' will be an array of the values emitted for the respectiv= e elements in ''keys'' ''' + * ''values'' will be an array of the values emitted for the respective e= lements in ''keys'' - * '''i.e. {{{reduce([ [key1,id1], [key2,id2], [key3,id3] ], [value1,valu= e2,value3], false)}}} ''' + * i.e. {{{reduce([ [key1,id1], [key2,id2], [key3,id3] ], [value1,value2,= value3], false)}}} = - '''2. When ''rereduce'' is ''true'': ''' + 2. When ''rereduce'' is ''true'': - = - * '''''key'' will be ''null'' ''' + * ''key'' will be ''null'' - * '''''values'' will be an array of values returned by previous calls to= the reduce function ''' + * ''values'' will be an array of values returned by previous calls to th= e reduce function - * '''i.e. {{{reduce(null, [intermediate1,intermediate2,intermediate3], t= rue)}}} ''' + * i.e. {{{reduce(null, [intermediate1,intermediate2,intermediate3], true= )}}} = - '''Reduce functions should return a single value, suitable for both the '= 'value'' field of the final view and as a member of the ''values'' array pa= ssed to the reduce function. ''' + Reduce functions should return a single value, suitable for both the ''va= lue'' field of the final view and as a member of the ''values'' array passe= d to the reduce function. = - '''Often, reduce functions can be written to handle rereduce calls withou= t any extra code, like the summation function above. In that case, the ''re= reduce'' argument can be ignored and in JavaScript, it can be omitted from = the function definition entirely. ''' + Often, reduce functions can be written to handle rereduce calls without a= ny extra code, like the summation function above. In that case, the ''rered= uce'' argument can be ignored and in JavaScript, it can be omitted from the= function definition entirely. = =3D=3D=3D Reduce vs rereduce =3D=3D=3D + = - '''On a large database objects to be reduced will be sent to your reduce = function in batches. These batches will be broken up on B-tree boundaries, = which may occur in arbitrary places. ''' + On a large database objects to be reduced will be sent to your reduce fun= ction in batches. These batches will be broken up on B-tree boundaries, whi= ch may occur in arbitrary places. = - '''[[http://mail-archives.apache.org/mod_mbox/couchdb-user/200903.mbox/<2= 0090330084727.GA7913@uk.tiscali.com>|For example]], suppose you have a view= which emits key->value pairs like this: ''' + [[http://mail-archives.apache.org/mod_mbox/couchdb-user/200903.mbox/<2009= 0330084727.GA7913@uk.tiscali.com>|For example]], suppose you have a view wh= ich emits key->value pairs like this: = {{{ + [X, Y, 0] -> Object_A + [X, Y, 1] -> Object_B1 + [X, Y, 1] -> Object_B1 + [X, Y, 1] -> Object_B1 + [Z, Q, 0] .... }}} + = - '''Your reduce function may receive ''' + Your reduce function may receive = {{{ + [Object_A, Object_B1] }}} + = - '''and then in a separate invocation ''' + and then in a separate invocation = {{{ + [Object_B1, Object_B1] }}} + = - '''The outputs of these two reduce functions will then be passed to your = reduce function again with rereduce=3Dtrue to make the final answer. You ca= nnot rely on all four rows being passed to the initial reduce function. ''' + The outputs of these two reduce functions will then be passed to your red= uce function again with rereduce=3Dtrue to make the final answer. You canno= t rely on all four rows being passed to the initial reduce function. = - '''Furthermore: due to reduce optimisations, you may only receive some of= the blocks to be reduced. Example: take these three Btree nodes: ''' + Furthermore: due to reduce optimisations, you may only receive some of th= e blocks to be reduced. Example: take these three Btree nodes: = {{{ + [a b c d e f g] [h i j k l m n] [o p q r s t u] + R1 R2 R3 }}} + = - '''The reduce value of all the items in each Btree node is stored within = each node, e.g. {{{[a b c d e f g]}}} reduces to {{{R1}}}. Now suppose some= one asks for a reduce value across a key range: ''' + The reduce value of all the items in each Btree node is stored within eac= h node, e.g. {{{[a b c d e f g]}}} reduces to {{{R1}}}. Now suppose someone= asks for a reduce value across a key range: = {{{ + key range + <-----------------------------> + [a b c d e f g] [h i j k l m n] [o p q r s t u] }}} + = - '''CouchDB will call your reduce function to calculate a value for {{{[e = f g]}}} and for {{{[o p q r]}}}, but will use the existing stored/calculate= d value of R2 across the middle block. ''' + CouchDB will call your reduce function to calculate a value for {{{[e f g= ]}}} and for {{{[o p q r]}}}, but will use the existing stored/calculated v= alue of R2 across the middle block. = - '''Therefore, it is wrong to attempt to maintain any sort of state in you= r reduce function between invocations. And because the Btree node boundarie= s can appear in any place, it is wrong to attempt to cross-reference adjace= nt documents too. Any cross-referencing needs to take place in the client, = not in a reduce function. ''' + Therefore, it is wrong to attempt to maintain any sort of state in your r= educe function between invocations. And because the Btree node boundaries c= an appear in any place, it is wrong to attempt to cross-reference adjacent = documents too. Any cross-referencing needs to take place in the client, not= in a reduce function. = =3D=3D=3D Access Strategy =3D=3D=3D + = - '''For queries which are not meant to actually condense the amount of inf= ormation you often can live without a reduce function. A common strategy is= to get the data you are interested to select by in into the ''key'' part a= nd then use ''startkey'' and ''endkey'' on the result. ''' + For queries which are not meant to actually condense the amount of inform= ation you often can live without a reduce function. A common strategy is to= get the data you are interested to select by in into the ''key'' part and = then use ''startkey'' and ''endkey'' on the result. = =3D=3D Keys and values =3D=3D + = =3D=3D=3D Equal keys =3D=3D=3D + = - '''CouchDB actually stores the [key,docid] pair as the key in the btree. = This means that: ''' + CouchDB actually stores the [key,docid] pair as the key in the btree. Thi= s means that: - = - * '''you always know which document the key and value came from (it's ex= posed as the 'id' field in the view result) ''' + * you always know which document the key and value came from (it's expos= ed as the 'id' field in the view result) - * '''view rows with equal keys sort by increasing docid. ''' + * view rows with equal keys sort by increasing docid. = - '''If you assign your docids in a time-based way then you can get documen= ts with equal keys in a natural oldest-to-newest order. Couchdb has a featu= re to do this, which you can enable in local.ini: ''' + If you assign your docids in a time-based way then you can get documents = with equal keys in a natural oldest-to-newest order. Couchdb has a feature = to do this, which you can enable in local.ini: = {{{ + [uuids] + algorithm =3D utc_random }}} + = =3D=3D=3D Lookup Views =3D=3D=3D + = - '''The second parameter of the ''emit()'' function can be ''null''. Couch= DB then only stores the [key,docid] in the view. You can use the view as a = compact lookup mechanism and fetch the document's details, if needed, in su= bsequent requests or by adding parameter ''include_docs=3Dtrue'' ''' + The second parameter of the ''emit()'' function can be ''null''. CouchDB = then only stores the [key,docid] in the view. You can use the view as a com= pact lookup mechanism and fetch the document's details, if needed, in subse= quent requests or by adding parameter ''include_docs=3Dtrue'' = =3D=3D=3D Linked documents =3D=3D=3D + = - '''''This is a new feature in couchdb trunk / 0.11'' ''' + ''This is a new feature in couchdb trunk / 0.11'' = - '''If you emit an object value which has '''{'_id': XXX}''' then include_= docs will fetch the document with id XXX rather than the document which was= processed to emit the key/value pair. ''' + If you emit an object value which has '''{'_id': XXX}''' then include_doc= s will fetch the document with id XXX rather than the document which was pr= ocessed to emit the key/value pair. = - '''This means that if one document contains the ids of other documents, i= t can cause those documents to be fetched in the view too, adjacent to the = same key if required. ''' + This means that if one document contains the ids of other documents, it c= an cause those documents to be fetched in the view too, adjacent to the sam= e key if required. = - '''For example, if you have the following hierarchically-linked documents= : ''' + For example, if you have the following hierarchically-linked documents: = {{{ + [ + { "_id": "11111" }, + { "_id": "22222", "ancestors": ["11111"], "value": "hello" }, + { "_id": "33333", "ancestors": ["22222","11111"], "value": "world" } + ] }}} + = - '''you can emit the values with the ancestor documents adjacent to them i= n the view like this: ''' + you can emit the values with the ancestor documents adjacent to them in t= he view like this: = {{{ + function(doc) { + if (doc.value) { + emit([doc.value, 0], null); + if (doc.ancestors) { + for (var i in doc.ancestors) { + emit([doc.value, Number(i)+1], {_id: doc.ancestors[i]}); + } + } + } + } }}} + = - '''The result you get is: ''' + The result you get is: = {{{ + {"total_rows":5,"offset":0,"rows":[ + {"id":"22222","key":["hello",0],"value":null, + "doc":{"_id":"22222","_rev":"1-0eee81fecb5aa4f51e285c621271ff02","ances= tors":["11111"],"value":"hello"}}, + {"id":"22222","key":["hello",1],"value":{"_id":"11111"}, + "doc":{"_id":"11111","_rev":"1-967a00dff5e02add41819138abb3284d"}}, + {"id":"33333","key":["world",0],"value":null, + "doc":{"_id":"33333","_rev":"1-11e42b44fdb3d3784602eca7c0332a43","ances= tors":["22222","11111"],"value":"world"}}, + {"id":"33333","key":["world",1],"value":{"_id":"22222"}, + "doc":{"_id":"22222","_rev":"1-0eee81fecb5aa4f51e285c621271ff02","ances= tors":["11111"],"value":"hello"}}, + {"id":"33333","key":["world",2],"value":{"_id":"11111"}, + "doc":{"_id":"11111","_rev":"1-967a00dff5e02add41819138abb3284d"}} + ]} }}} + = - '''which makes it very cheap to fetch a document plus all its ancestors i= n one query. ''' + which makes it very cheap to fetch a document plus all its ancestors in o= ne query. = - '''Note that the "id" in the row is still that of the originating documen= t. The only difference is that include_docs fetches a different doc. ''' + Note that the "id" in the row is still that of the originating document. = The only difference is that include_docs fetches a different doc. = =3D=3D=3D Complex Keys =3D=3D=3D + = - '''Keys are not limited to simple values. You can use arbitrary JSON valu= es to influence sorting. See ViewCollation for the rules. ''' + Keys are not limited to simple values. You can use arbitrary JSON values = to influence sorting. See ViewCollation for the rules. = - '''When the key is an array, view results can be grouped by a sub-section= of the key. For example, if keys have the form [''year'', ''month'', ''day= ''] then results can be reduced to a single value or by year, month, or day= . See HttpViewApi for more information. ''' + When the key is an array, view results can be grouped by a sub-section of= the key. For example, if keys have the form [''year'', ''month'', ''day'']= then results can be reduced to a single value or by year, month, or day. S= ee HttpViewApi for more information. = =3D=3D Views in Practice =3D=3D + = - '''See HttpViewApi to learn how to work with views. [[View_Snippets]] con= tain a few examples. ''' + See HttpViewApi to learn how to work with views. [[View_Snippets]] contai= n a few examples. = =3D=3D Grouping =3D=3D + = - '''The basic reduce operation with group=3Dfalse (the default over HTTP) = is to reduce to a single value. But by using startkey and endkey, you can g= et the summary value for any key interval. ''' + The basic reduce operation with group=3Dfalse (the default over HTTP) is = to reduce to a single value. But by using startkey and endkey, you can get = the summary value for any key interval. = - '''Using group=3Dtrue (which is Futon's default), you get a separate redu= ce value for each unique key in the map - that is, all values which share t= he same key are grouped together and reduced to a single value. ''' + Using group=3Dtrue (which is Futon's default), you get a separate reduce = value for each unique key in the map - that is, all values which share the = same key are grouped together and reduced to a single value. = - '''group_level=3DN queries are essentially a macro, which run one normal = (group=3Dfalse) reduce query automatically for each interval on a set of in= tervals as defined by the level. ''' + group_level=3DN queries are essentially a macro, which run one normal (gr= oup=3Dfalse) reduce query automatically for each interval on a set of inter= vals as defined by the level. = - '''So with group_level=3D1, and keys like ''' + So with group_level=3D1, and keys like - = {{{ + ["a",1,1] + ["a",3,4] + ["a",3,8] + ["b",2,6] + ["b",2,6] + ["c",1,5] + ["c",4,2] }}} - '''CouchDB will internally run 3 reduce queries for you. One that reduces= all rows where the first element of the key =3D "a", one for "b", and one = for "c". ''' + CouchDB will internally run 3 reduce queries for you. One that reduces + all rows where the first element of the key =3D "a", one for "b", and + one for "c". = - '''If you were to query with group_level=3D2, you'd get a reduce query ru= n for each unique set of keys (according to their first two elements), eg '= '' - = + If you were to query with group_level=3D2, you'd get a reduce query run + for each unique set of keys (according to their first two elements), + eg {{{ + ["a",1], ["a",3], ["b",2"], ["c",1], ["c",4] }}} - '''group=3Dtrue is the conceptual equivalent of group_level=3Dexact, so C= ouchDB runs a reduce per unique key in the map row set. ''' = - '''Note: map and reduce results are precomputed and stored in a btree. Ho= wever, the intermediate reduction values are cached according to the btree = structure, instead of according to the query params. So unless your range h= appens to match exactly the keys underneath a given inner node, you'll end = up running at least one javascript reduction per reduce query. A group=3Dtr= ue query effectively runs multiple reduce queries, so you may find it to be= slower than you expect. ''' + group=3Dtrue is the conceptual equivalent of group_level=3Dexact, so + CouchDB runs a reduce per unique key in the map row set. = + Note: map and reduce results are precomputed and stored in a btree. + However, the intermediate reduction values are cached + according to the btree structure, instead of according to the query + params. So unless your range happens to match exactly the keys + underneath a given inner node, you'll end up running at least one + javascript reduction per reduce query. A group=3Dtrue query effectively + runs multiple reduce queries, so you may find it to be slower than + you expect. + = - '''There is more detail in [[http://horicky.blogspot.com/2008/10/couchdb-= implementation.html|this blog posting]] under the heading "Query Processing= " ''' + There is more detail in [[http://horicky.blogspot.com/2008/10/couchdb-imp= lementation.html|this blog posting]] + under the heading "Query Processing" = =3D=3D Restrictions on map and reduce functions =3D=3D + = - '''The restriction on map functions is that they must be referentially tr= ansparent. That is, given the same input document, they will always emit th= e same key/value pairs. This allows CouchDB views to be updated incremental= ly, only reindexing the documents that have changed since the last index up= date. ''' + The restriction on map functions is that they must be referentially trans= parent. That is, given the same input document, they will always emit the s= ame key/value pairs. This allows CouchDB views to be updated incrementally,= only reindexing the documents that have changed since the last index updat= e. = - '''To make incremental Map/Reduce possible, the Reduce function has the r= equirement that not only must it be referentially transparent, but it must = also be commutative and associative for the array value input, to be able r= educe on its own output and get the same answer, like this: ''' + To make incremental Map/Reduce possible, the Reduce function has the requ= irement that not only must it be referentially transparent, but it must als= o be commutative and associative for the array value input, to be able redu= ce on its own output and get the same answer, like this: = {{{ + f(Key, Values) =3D=3D f(Key, [ f(Key, Values) ] ) }}} + = - '''This requirement of reduce functions allows CouchDB to store off inter= mediated reductions directly into inner nodes of btree indexes, and the vie= w index updates and retrievals will have logarithmic cost. It also allows t= he indexes to be spread across machines and reduced at query time with loga= rithmic cost. ''' + This requirement of reduce functions allows CouchDB to store off intermed= iated reductions directly into inner nodes of btree indexes, and the view i= ndex updates and retrievals will have logarithmic cost. It also allows the = indexes to be spread across machines and reduced at query time with logarit= hmic cost. = - '''For more details see [[http://damienkatz.net/2008/02/incremental_map.h= tml|this blog post]] ''' + For more details see [[http://damienkatz.net/2008/02/incremental_map.html= |this blog post]] = - '''Furthermore: you have no control over how documents are partitioned in= reduce and re-reduce phases. You cannot rely on "adjacent" documents all b= eing presented at the same time to the reduce function; it's quite possible= that one subset will be reduced to R1, another subset will be reduced to R= 2, and then R1 and R2 will be re-reduced together to make the final reduce = value. In the limiting case, consider that each of your documents might be = reduced individually and that the reduce outputs will be re-reduced togethe= r (which also may happen as a single stage or in multiple stages) ''' + Furthermore: you have no control over how documents are partitioned in re= duce and re-reduce phases. You cannot rely on "adjacent" documents all bein= g presented at the same time to the reduce function; it's quite possible th= at one subset will be reduced to R1, another subset will be reduced to R2, = and then R1 and R2 will be re-reduced together to make the final reduce val= ue. In the limiting case, consider that each of your documents might be red= uced individually and that the reduce outputs will be re-reduced together (= which also may happen as a single stage or in multiple stages) = - '''So you should also design your reduce/re-reduce function so that ''' + So you should also design your reduce/re-reduce function so that = {{{ + f(Key, Values) =3D=3D f(Key, [ f(Key, Value0), f(Key, Value1), f(Key, Val= ue2), ... ] ) }}} + = - '''<>'''''' ''' + <> - = =3D=3D=3D Reduced Value Sizes =3D=3D=3D + = - '''As CouchDB computes view indexes it also calculates the corresponding = reduce values and caches this value inside each of the btree node pointers.= This scheme allows CouchDB to reuse reduced values when updating the btree= . This scheme requires some care to be taken with the amount of data return= ed from reduce functions. ''' + As CouchDB computes view indexes it also calculates the corresponding red= uce values and caches this value inside each of the btree node pointers. Th= is scheme allows CouchDB to reuse reduced values when updating the btree. T= his scheme requires some care to be taken with the amount of data returned = from reduce functions. = - '''As a rule of thumb, the data returned by reduce functions should remai= n "smallish" and not grow faster than log(num_rows_processed). Although vio= lating this requirement will not cause an error, btree performance will deg= rade drastically. If you have a view that appears to work well on small dat= a sets but grinds to a halt as more data is added you're probably violating= the growth rate characteristics. ''' + As a rule of thumb, the data returned by reduce functions should remain "= smallish" and not grow faster than log(num_rows_processed). Although violat= ing this requirement will not cause an error, btree performance will degrad= e drastically. If you have a view that appears to work well on small data s= ets but grinds to a halt as more data is added you're probably violating th= e growth rate characteristics. = =3D=3D Interactive CouchDB Tutorial =3D=3D + = - '''See [[http://labs.mudynamics.com/2009/04/03/interactive-couchdb/|this = blog posting]] for an interactive tutorial (emulator in JavaScript) that ex= plains map/reduce, view collation and how to query CouchDB RESTfully. ''' + See [[http://labs.mudynamics.com/2009/04/03/interactive-couchdb/|this blo= g posting]] for an interactive tutorial (emulator in JavaScript) that expla= ins map/reduce, view collation and how to query CouchDB RESTfully. = =3D=3D Implementation =3D=3D + = - '''These blog posts include information about how map/reduce works, inclu= ding how reduce values are kept in btree nodes. ''' + These blog posts include information about how map/reduce works, includin= g how reduce values are kept in btree nodes. - = - * '''http://damienkatz.net/2008/02/incremental_map.html ''' + * [[http://damienkatz.net/2008/02/incremental_map.html]] - * '''http://damienkatz.net/2008/02/incremental_map_1.html ''' + * [[http://damienkatz.net/2008/02/incremental_map_1.html]] - * '''http://horicky.blogspot.com/2008/10/couchdb-implementation.html ''' + * [[http://horicky.blogspot.com/2008/10/couchdb-implementation.html]] =20