From commits-return-3746-apmail-couchdb-commits-archive=couchdb.apache.org@couchdb.apache.org Wed Jan 20 16:38:41 2010 Return-Path: Delivered-To: apmail-couchdb-commits-archive@www.apache.org Received: (qmail 58646 invoked from network); 20 Jan 2010 16:38:41 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 20 Jan 2010 16:38:41 -0000 Received: (qmail 97521 invoked by uid 500); 20 Jan 2010 16:38:41 -0000 Delivered-To: apmail-couchdb-commits-archive@couchdb.apache.org Received: (qmail 97443 invoked by uid 500); 20 Jan 2010 16:38:41 -0000 Mailing-List: contact commits-help@couchdb.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@couchdb.apache.org Delivered-To: mailing list commits@couchdb.apache.org Received: (qmail 97434 invoked by uid 500); 20 Jan 2010 16:38:40 -0000 Delivered-To: apmail-incubator-couchdb-commits@incubator.apache.org Received: (qmail 97431 invoked by uid 99); 20 Jan 2010 16:38:40 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 20 Jan 2010 16:38:40 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.130] (HELO eos.apache.org) (140.211.11.130) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 20 Jan 2010 16:38:29 +0000 Received: from eos.apache.org (localhost [127.0.0.1]) by eos.apache.org (Postfix) with ESMTP id CAEA616E0A; Wed, 20 Jan 2010 16:38:07 +0000 (GMT) MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable From: Apache Wiki To: Apache Wiki Date: Wed, 20 Jan 2010 16:38:07 -0000 Message-ID: <20100120163807.5160.80035@eos.apache.org> Subject: =?utf-8?q?=5BCouchdb_Wiki=5D_Update_of_=22Introduction=5Fto=5FCouchDB=5Fv?= =?utf-8?q?iews=22_by_BrianChamberlain?= X-Virus-Checked: Checked by ClamAV on apache.org Dear Wiki user, You have subscribed to a wiki page or wiki category on "Couchdb Wiki" for c= hange notification. The "Introduction_to_CouchDB_views" page has been changed by BrianChamberla= in. http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views?action=3Ddiff&= rev1=3D27&rev2=3D28 -------------------------------------------------- A simple introduction to CouchDB views. = =3D=3D Concept =3D=3D - = Views are the primary tool used for querying and reporting on CouchDB doc= uments. There are two different kinds of views: permanent and temporary vie= ws. = '''Permanent views''' are stored inside special documents called design d= ocuments, and can be accessed via an HTTP ''GET'' request to the URI ''/{db= name}/{docid}/{viewname}'', where ''{docid}'' has the prefix ''_design/'' s= o that CouchDB recognizes the document as a design document, and ''{viewnam= e}'' has the prefix ''_view/'' so that CouchDB recognizes it as a view. @@ -16, +15 @@ = Note that by default views are not created and updated when a document is= saved, but rather, when it is accessed. As a result, the first access migh= t take some time depending on the size of your data while CouchDB creates t= he view. If preferable the views can also be updated when a document is sav= ed using an external script that calls the views when updates have been mad= e. An example can be found here: RegeneratingViewsOnUpdate = - Note that all views in a single design document get updated when one of t= he views in that design document gets queried. = + Note that all views in a single design document get updated when one of t= he views in that design document gets queried. = Note on !JavaScript API change: Prior to Tue, 20 May 2008 (Subversion rev= ision r658405) the function to emit a row to the map index, was named "map"= . It has now been changed to "emit". = - = =3D=3D Basics =3D=3D - = =3D=3D=3D Map Functions =3D=3D=3D - = Here is the simplest example of a map function: = {{{ @@ -32, +28 @@ emit(null, doc); } }}} - = This function defines a table that contains all the documents in a CouchD= B database, with no particular key. = A view function should accept a single argument: the document object. To = produce results, it should call the implicitly available ''emit(key, value)= '' function. For every invocation of that function, a result row is added t= o the view (if neither the ''key'' nor the ''value'' are undefined). As doc= uments are added, edited and deleted, the rows in the computed table are up= dated automatically. @@ -46, +41 @@ } } }}} - = For each document in the database that has a Type field with the value ''= customer'', a row is created in the view. The ''value'' column of the view = contains the ''!LastName'', ''!FirstName'', and ''Address'' fields for each= document. The key for all the documents is null in this case. = To be able to filter or sort the view by some document property, you woul= d use that property for the key. For example, the following view would allo= w you to lookup customer documents by the ''!LastName'' or ''!FirstName'' f= ields: @@ -59, +53 @@ } } }}} - = Here is an example of the results of such a view: = {{{ @@ -91, +84 @@ ] } }}} - = ''This example output was reformatted for readability.'' = Keep in mind that emit works by just storing the key/value pairs in an ar= ray and then, when all views in the same _design document have been calcula= ted, returns all results at once. So if you use an object to make calculati= ons and do multiple emits on the same document, you must create a copy and = not emit the same object multiple times. For example: @@ -108, +100 @@ } } }}} - = + ''Note: For those unfamiliar with the convention 'eval(uneval(_obj_))', t= his simply clones _obj_. It is cleaner than traversing each element of _obj= _ and it will always be true that uneval(eval(uneval(x))) =3D=3D uneval(x)'= and 'eval(uneval(x)) =3D=3D deep_copy_of_x' . The actual method 'uneval(_o= bj_)' is a Spidermonkey specific (as of 1.7) extension that is not part of = ECMAScript.''''' '''[[http://www.thespanner.co.uk/2008/04/10/javascript-clo= ning-objects/|[1]]]'' [[https://developer.mozilla.org/en/SpiderMonkey/JSAPI= _Reference/JS_InitStandardClasses|[2]]] [[http://www.mozilla.org/rhino/rhin= o15R5.html|[3]]]''''''' ''''' = <> + = =3D=3D=3D Reduce Functions =3D=3D=3D - = Reduce is a powerful feature of CouchDB but is often misused which leads = to performance problems. From 0.10 onwards, CouchDB uses a heuristic to det= ect reduce functions that won't scale to give the developer an early warnin= g. A reduce function must reduce the input values to a smaller output value= . If you are building a composite return structure in your reduce, or only = transforming the values field, rather than summarizing it, you might be mis= using this feature. See [[#reduced_value_sizes|Reduced Value Sizes]] for mo= re details. = If a view has a reduce function, it is used to produce aggregate results = for that view. A reduce function is passed a set of intermediate values and= combines them to a single value. Reduce functions must accept, as input, r= esults emitted by its corresponding map function '''as well as results retu= rned by the reduce function itself'''. The latter case is referred to as a = ''rereduce''. @@ -124, +116 @@ return sum(values); } }}} - = Reduce functions are passed three arguments in the order ''key'', ''value= s'' and ''rereduce'' = Reduce functions must handle two cases: = 1. When ''rereduce'' is ''false'': + = - * ''key'' will be an array whose elements are arrays of the form ''[key,= id]'', where ''key'' is a key emitted by the map function and ''id'' is tha= t of the document from which the key was generated. = + * ''key'' will be an array whose elements are arrays of the form ''[key,= id]'', where ''key'' is a key emitted by the map function and ''id'' is tha= t of the document from which the key was generated. * ''values'' will be an array of the values emitted for the respective e= lements in ''keys'' * i.e. {{{reduce([ [key1,id1], [key2,id2], [key3,id3] ], [value1,value2,= value3], false)}}} = 2. When ''rereduce'' is ''true'': + = * ''key'' will be ''null'' * ''values'' will be an array of values returned by previous calls to th= e reduce function * i.e. {{{reduce(null, [intermediate1,intermediate2,intermediate3], true= )}}} @@ -144, +137 @@ Often, reduce functions can be written to handle rereduce calls without a= ny extra code, like the summation function above. In that case, the ''rered= uce'' argument can be ignored and in JavaScript, it can be omitted from the= function definition entirely. = =3D=3D=3D Reduce vs rereduce =3D=3D=3D - = On a large database objects to be reduced will be sent to your reduce fun= ction in batches. These batches will be broken up on B-tree boundaries, whi= ch may occur in arbitrary places. = [[http://mail-archives.apache.org/mod_mbox/couchdb-user/200903.mbox/<2009= 0330084727.GA7913@uk.tiscali.com>|For example]], suppose you have a view wh= ich emits key->value pairs like this: @@ -156, +148 @@ [X, Y, 1] -> Object_B1 [Z, Q, 0] .... }}} - = Your reduce function may receive = {{{ [Object_A, Object_B1] }}} - = and then in a separate invocation = {{{ [Object_B1, Object_B1] }}} - = The outputs of these two reduce functions will then be passed to your red= uce function again with rereduce=3Dtrue to make the final answer. You canno= t rely on all four rows being passed to the initial reduce function. = Furthermore: due to reduce optimisations, you may only receive some of th= e blocks to be reduced. Example: take these three Btree nodes: @@ -177, +166 @@ [a b c d e f g] [h i j k l m n] [o p q r s t u] R1 R2 R3 }}} - = The reduce value of all the items in each Btree node is stored within eac= h node, e.g. {{{[a b c d e f g]}}} reduces to {{{R1}}}. Now suppose someone= asks for a reduce value across a key range: = {{{ @@ -185, +173 @@ <-----------------------------> [a b c d e f g] [h i j k l m n] [o p q r s t u] }}} - = CouchDB will call your reduce function to calculate a value for {{{[e f g= ]}}} and for {{{[o p q r]}}}, but will use the existing stored/calculated v= alue of R2 across the middle block. = Therefore, it is wrong to attempt to maintain any sort of state in your r= educe function between invocations. And because the Btree node boundaries c= an appear in any place, it is wrong to attempt to cross-reference adjacent = documents too. Any cross-referencing needs to take place in the client, not= in a reduce function. = =3D=3D=3D Access Strategy =3D=3D=3D - = For queries which are not meant to actually condense the amount of inform= ation you often can live without a reduce function. A common strategy is to= get the data you are interested to select by in into the ''key'' part and = then use ''startkey'' and ''endkey'' on the result. = =3D=3D Keys and values =3D=3D - = =3D=3D=3D Equal keys =3D=3D=3D - = CouchDB actually stores the [key,docid] pair as the key in the btree. Thi= s means that: + = * you always know which document the key and value came from (it's expos= ed as the 'id' field in the view result) * view rows with equal keys sort by increasing docid. = @@ -208, +193 @@ [uuids] algorithm =3D utc_random }}} - = =3D=3D=3D Lookup Views =3D=3D=3D - = The second parameter of the ''emit()'' function can be ''null''. CouchDB = then only stores the [key,docid] in the view. You can use the view as a com= pact lookup mechanism and fetch the document's details, if needed, in subse= quent requests or by adding parameter ''include_docs=3Dtrue'' = =3D=3D=3D Linked documents =3D=3D=3D - = ''This is a new feature in couchdb trunk / 0.11'' = If you emit an object value which has '''{'_id': XXX}''' then include_doc= s will fetch the document with id XXX rather than the document which was pr= ocessed to emit the key/value pair. @@ -230, +212 @@ { "_id": "33333", "ancestors": ["22222","11111"], "value": "world" } ] }}} - = you can emit the values with the ancestor documents adjacent to them in t= he view like this: = {{{ @@ -245, +226 @@ } } }}} - = The result you get is: = {{{ @@ -262, +242 @@ "doc":{"_id":"11111","_rev":"1-967a00dff5e02add41819138abb3284d"}} ]} }}} - = which makes it very cheap to fetch a document plus all its ancestors in o= ne query. = Note that the "id" in the row is still that of the originating document. = The only difference is that include_docs fetches a different doc. = =3D=3D=3D Complex Keys =3D=3D=3D - = Keys are not limited to simple values. You can use arbitrary JSON values = to influence sorting. See ViewCollation for the rules. = When the key is an array, view results can be grouped by a sub-section of= the key. For example, if keys have the form [''year'', ''month'', ''day'']= then results can be reduced to a single value or by year, month, or day. S= ee HttpViewApi for more information. = =3D=3D Views in Practice =3D=3D - = See HttpViewApi to learn how to work with views. [[View_Snippets]] contai= n a few examples. = =3D=3D Grouping =3D=3D - = The basic reduce operation with group=3Dfalse (the default over HTTP) is = to reduce to a single value. But by using startkey and endkey, you can get = the summary value for any key interval. = Using group=3Dtrue (which is Futon's default), you get a separate reduce = value for each unique key in the map - that is, all values which share the = same key are grouped together and reduced to a single value. @@ -286, +262 @@ group_level=3DN queries are essentially a macro, which run one normal (gr= oup=3Dfalse) reduce query automatically for each interval on a set of inter= vals as defined by the level. = So with group_level=3D1, and keys like + = {{{ ["a",1,1] ["a",3,4] @@ -295, +272 @@ ["c",1,5] ["c",4,2] }}} + CouchDB will internally run 3 reduce queries for you. One that reduces al= l rows where the first element of the key =3D "a", one for "b", and one for= "c". - CouchDB will internally run 3 reduce queries for you. One that reduces - all rows where the first element of the key =3D "a", one for "b", and - one for "c". = + If you were to query with group_level=3D2, you'd get a reduce query run f= or each unique set of keys (according to their first two elements), eg + = - If you were to query with group_level=3D2, you'd get a reduce query run - for each unique set of keys (according to their first two elements), - eg {{{ ["a",1], ["a",3], ["b",2"], ["c",1], ["c",4] }}} + group=3Dtrue is the conceptual equivalent of group_level=3Dexact, so Couc= hDB runs a reduce per unique key in the map row set. = + Note: map and reduce results are precomputed and stored in a btree. Howev= er, the intermediate reduction values are cached according to the btree str= ucture, instead of according to the query params. So unless your range happ= ens to match exactly the keys underneath a given inner node, you'll end up = running at least one javascript reduction per reduce query. A group=3Dtrue = query effectively runs multiple reduce queries, so you may find it to be sl= ower than you expect. - group=3Dtrue is the conceptual equivalent of group_level=3Dexact, so - CouchDB runs a reduce per unique key in the map row set. = - Note: map and reduce results are precomputed and stored in a btree. - However, the intermediate reduction values are cached - according to the btree structure, instead of according to the query - params. So unless your range happens to match exactly the keys - underneath a given inner node, you'll end up running at least one - javascript reduction per reduce query. A group=3Dtrue query effectively - runs multiple reduce queries, so you may find it to be slower than - you expect. - = - There is more detail in [[http://horicky.blogspot.com/2008/10/couchdb-imp= lementation.html|this blog posting]] + There is more detail in [[http://horicky.blogspot.com/2008/10/couchdb-imp= lementation.html|this blog posting]] under the heading "Query Processing" - under the heading "Query Processing" = =3D=3D Restrictions on map and reduce functions =3D=3D - = The restriction on map functions is that they must be referentially trans= parent. That is, given the same input document, they will always emit the s= ame key/value pairs. This allows CouchDB views to be updated incrementally,= only reindexing the documents that have changed since the last index updat= e. = To make incremental Map/Reduce possible, the Reduce function has the requ= irement that not only must it be referentially transparent, but it must als= o be commutative and associative for the array value input, to be able redu= ce on its own output and get the same answer, like this: @@ -330, +293 @@ {{{ f(Key, Values) =3D=3D f(Key, [ f(Key, Values) ] ) }}} - = This requirement of reduce functions allows CouchDB to store off intermed= iated reductions directly into inner nodes of btree indexes, and the view i= ndex updates and retrievals will have logarithmic cost. It also allows the = indexes to be spread across machines and reduced at query time with logarit= hmic cost. = For more details see [[http://damienkatz.net/2008/02/incremental_map.html= |this blog post]] @@ -342, +304 @@ {{{ f(Key, Values) =3D=3D f(Key, [ f(Key, Value0), f(Key, Value1), f(Key, Val= ue2), ... ] ) }}} - = <> + = =3D=3D=3D Reduced Value Sizes =3D=3D=3D - = As CouchDB computes view indexes it also calculates the corresponding red= uce values and caches this value inside each of the btree node pointers. Th= is scheme allows CouchDB to reuse reduced values when updating the btree. T= his scheme requires some care to be taken with the amount of data returned = from reduce functions. = As a rule of thumb, the data returned by reduce functions should remain "= smallish" and not grow faster than log(num_rows_processed). Although violat= ing this requirement will not cause an error, btree performance will degrad= e drastically. If you have a view that appears to work well on small data s= ets but grinds to a halt as more data is added you're probably violating th= e growth rate characteristics. = =3D=3D Interactive CouchDB Tutorial =3D=3D - = See [[http://labs.mudynamics.com/2009/04/03/interactive-couchdb/|this blo= g posting]] for an interactive tutorial (emulator in JavaScript) that expla= ins map/reduce, view collation and how to query CouchDB RESTfully. = =3D=3D Implementation =3D=3D - = These blog posts include information about how map/reduce works, includin= g how reduce values are kept in btree nodes. + = - * [[http://damienkatz.net/2008/02/incremental_map.html]] + * http://damienkatz.net/2008/02/incremental_map.html - * [[http://damienkatz.net/2008/02/incremental_map_1.html]] + * http://damienkatz.net/2008/02/incremental_map_1.html - * [[http://horicky.blogspot.com/2008/10/couchdb-implementation.html]] + * http://horicky.blogspot.com/2008/10/couchdb-implementation.html =20