couchdb-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Couchdb Wiki] Update of "Partial_Updates" by MarkHahn
Date Wed, 27 Mar 2013 01:03:35 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Couchdb Wiki" for change notification.

The "Partial_Updates" page has been changed by MarkHahn:
http://wiki.apache.org/couchdb/Partial_Updates?action=diff&rev1=6&rev2=7

- This is a work in progress.  I accidentally released it but I will now switch to editing
it offline.
- 
  <<Include(EditTheWiki)>>
  
  <<TableOfContents(3)>>
  
- Future versions of Couch DB are expected to have a built-in partial update feature. However,
partial updates can be accomplished now with current versions of Couch using the existing
update handler feature. While one may write their own update handler for this purpose, an
example is given here that anyone can use.
+ Future versions of CouchDB are expected to have a built-in partial update feature. However,
partial updates can be accomplished now with current versions of CouchDB using the existing
update handler feature. While one may write their own update handler for this purpose, an
example is given here that anyone can use.
  
  == What is a partial update? ==
  
- A partial update is a single HTTP request to Couch that is similar to a normal update (PUT).
 However the partial update request contains only information for updating (or deleting) one
or more fields (or sub-fields) of a doc. 
+ A partial update is a single HTTP request to CouchDB that is similar to a normal update
(PUT).  However, the partial update request contains only the information needed to update
(or delete) one or more fields (or sub-fields) of a doc. 
  
- === Why is a partial update useful? ===
+ == Why is a partial update useful? ==
  
- A partial update is more efficient than a normal full update.  Only the change information
needs to be sent over HTTP, not the entire doc. In general, changing of a single field in
a doc requires reading the doc, changing it, and then putting the doc back to the DB.  A partial
update only needs the ID of the doc in order to make the field change.
+  * A partial update is more efficient than a normal full update.  Only the change information
needs to be sent over HTTP, not the entire doc. In the general case of a full update, changing
a single field in a doc requires reading the doc, changing it, and then putting the doc back
to the DB.  A partial update only needs the ID of the doc in order to make the field change.
  
- Also, partial updates allow the code in an app, or multiple apps, to be partitioned into
multiple pieces where each piece of code only knows about one part of the DOC.  In many cases
this allows for a better separation of concerns.
+  * Partial updates allow the code in an app, or multiple apps, to be partitioned into multiple
pieces where each piece of code only knows about one part of the doc.  In many cases this
allows for a better separation of concerns. As an example, a routine may be called with only
the doc ID and then the routine may at any time update part of a doc without ever having a
full copy of the doc.  This is especially important when accessing a single DB from multiple
apps (or workers) where a single copy of a doc can't be shared.
  
- As an example, a routine may called with only the doc ID and then the routine may at any
time update part of a doc without ever having a full copy of the doc.  This is especially
important when accessing a single DB from multiple apps (or workers) where a single copy of
a DOC can't be shared.
+  * A partial update, like any other update using the update handler, has less chance of
a 409 collision than the sequence of getting a doc, modifying it, and then putting it back.
 This is because the internal update is faster than getting and putting over a TCP link. 
Also, when different non-overlapping parts of a doc are being updated at once, collisions
can usually be ignored.
+ 
+ 
+ == Example update handler ==
+ 
+ While this is just an example, it can be used by anyone for accomplishing any partial update.
 The author (I, Mark Hahn) am making this code available to everyone under the standard Apache
2 license.
+ 
+ 
+ === CoffeeScript version ===
+ 
+ {{{
+   partialUpdate: (doc, req) ->
+     if not doc then return [null, JSON.stringify status: 'nodoc']
+     for k, v of JSON.parse req.body
+       if k[0] is '/'
+         nestedDoc = doc
+         nestedKeys = k.split '/'
+         for nestedKey in nestedKeys[1..-2]
+           nestedDoc = (nestedDoc[nestedKey] ?= {})
+         k = nestedKeys[-1..-1][0]
+         if v is '__delete__' then delete nestedDoc[k]
+         else nestedDoc[k] = v
+         continue
+       if v is '__delete__' then delete doc[k]
+       else doc[k] = v
+     [doc, JSON.stringify {doc, status: 'updated'}]
+ }}}
+ 
+ === JavaScript version ===
+ 
+ {{{
+ partialUpdate: function(doc, req) {
+   if (!doc) {
+     return [
+       null, JSON.stringify({
+         status: 'nodoc'
+       })
+     ];
+   }
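+   // Declare the locals so they do not leak as implicit globals.
+   var k, v, nestedDoc, nestedKeys, nestedKey, _i, _len, _ref, _ref1, _ref2;
+   _ref = JSON.parse(req.body);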
+   for (k in _ref) {
+     v = _ref[k];
+     if (k[0] === '/') {
+       nestedDoc = doc;
+       nestedKeys = k.split('/');
+       _ref1 = nestedKeys.slice(1, -1);
+       for (_i = 0, _len = _ref1.length; _i < _len; _i++) {
+         nestedKey = _ref1[_i];
+         nestedDoc = ((_ref2 = nestedDoc[nestedKey]) != null ? _ref2 : nestedDoc[nestedKey] = {});
+       }
+       k = nestedKeys.slice(-1)[0];
+       if (v === '__delete__') {
+         delete nestedDoc[k];
+       } else {
+         nestedDoc[k] = v;
+       }
+       continue;
+     }
+     if (v === '__delete__') {
+       delete doc[k];
+     } else {
+       doc[k] = v;
+     }
+   }
+   return [
+     doc, JSON.stringify({
+       doc: doc,
+       status: 'updated'
+     })
+   ];
+ }
+ }}}
+ 
+ 
+ == Installing the update handler ==
+ 
+ To install the update handler, add the JavaScript version of the code above to the ''updates''
property of a design doc.
+ 
+ See [[Document_Update_Handlers]] for general information about update handlers.  
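
As a rough sketch of what installation might look like, the snippet below builds a design doc whose ''updates'' property holds the handler source as a string. The design doc name `_design/app` and the `handlerSource` placeholder are assumptions made for illustration. The placeholder stands in for the full JavaScript handler, written as an anonymous `function(doc, req) { ... }` expression.

```javascript
// Hypothetical design doc name; use whatever name fits your app.
var DESIGN_DOC_ID = '_design/app';

// Stand-in for the full handler source. CouchDB stores design doc
// functions as strings, so the handler is kept as text here.
var handlerSource = 'function(doc, req) { /* handler body goes here */ }';

function buildDesignDoc(id, source) {
  return {
    _id: id,
    updates: {
      // The key becomes the <function> part of the _update URL.
      partialUpdate: source
    }
  };
}

var designDoc = buildDesignDoc(DESIGN_DOC_ID, handlerSource);
console.log(JSON.stringify(designDoc, null, 2));
```

The resulting object is what would be saved to the database as the design doc. How it gets uploaded (curl, a client library, a build tool) is left open here.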
  
  
  
+ == Usage instructions ==
  
+ To quote the instructions at [[Document_Update_Handlers]] ...
- {{{
- <!DOCTYPE html>
  
+   To invoke a handler, use a PUT request against the handler function with a document id:
`/<database>/_design/<design>/_update/<function>/<docid>`
- <html lang="en">
- <head>
-   <meta charset="utf-8">
-   <title>Minimal Form</title>
- </head>
  
+ The update document should be contained in the HTTP request body in JSON format.  When using
the partial update handler listed above, the update document must use a special format. The
JSON doc should consist of one hash object where each property of the object is one ''update
command''.
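
To make the request shape concrete, here is a minimal sketch that assembles the method, path, and JSON body for one partial update. The database name `mydb`, design doc name `app`, and the helper `buildPartialUpdateRequest` are hypothetical. Only the URL pattern and the "one hash object of update commands" body come from the text above.

```javascript
// Builds the pieces of a partial update request. All names passed in
// below are hypothetical examples.
function buildPartialUpdateRequest(db, design, fn, docId, commands) {
  var path = '/' + [db, '_design', design, '_update', fn, docId]
    .map(encodeURIComponent)
    .join('/');
  return {
    method: 'PUT',
    path: path,
    headers: { 'Content-Type': 'application/json' },
    // One hash object; each property is one update command.
    body: JSON.stringify(commands)
  };
}

var request = buildPartialUpdateRequest(
  'mydb', 'app', 'partialUpdate', 'xxx',
  { field_one: 'AAA' }
);
console.log(request.method, request.path);
console.log(request.body);
```

The returned object can be handed to whichever HTTP client the calling code already uses.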
- <body>
-   <div id="contact-form">
-     <form id="contact" method="post" action="/db/_design/ddoc/_update/simpleform">
-       <fieldset>
-         <label for="name">name</label>    <input type="text" name="name"
placeholder="Full Name" title="Enter your name" class="required">
-         <label for="phone">phone</label>  <input type="tel" name="phone"
placeholder="+1 (555) 555-5555" required="" pattern="\+?[0-9 )(-]+">
-         <label for="email">e-mail</label> <input type="email" name="email"
placeholder="you@example.org" title="e-mail address" class="required email">
-         <label for="blog">blog</label>    <input type="url" name="url" placeholder="http://">
-         <label for="message">message</label>
-         <textarea name="message"></textarea>
-         <input type="submit" name="submit" class="button" id="submit" value="submit">
-       </fieldset>
-     </form>
-   </div>
- </body>
- </html>
- }}}
  
- The most important part of the above form is the {{{action="/simpleform/_design/simpleform/_update/simpleform"}}}
which specifies the update handler that will receive the POSTed data.
+ The property key of an update command specifies which field is to be updated.  It can be
a simple, top-level, property key or it can be a ''path'' into an object with nested objects
or arrays.  A ''path'' key is indicated by a leading slash `/` and multiple parts separated
by slashes.  
  
+ Consider this original doc ...
- It's broken down into 5 key sections:
- 
-  * the Database {{{db}}}
-  * the id of the design doc {{{_design/simpleform}}} itself
-  * {{{_update}}} informs CouchDB that this is an update handler and specifies the key within
the ddoc that has our handler function
-  * the final {{{simpleform}}} specifies the update handler name within that ddoc, that will
receive the POSTed data
- 
- === Submitting the form from the terminal ===
- 
- Likely you'll be fiddling with your form quite a bit while working on the update handler.
In this case it makes a lot of sense simply to drive the form directly from the command line.
There is more information at [[Commandline_CouchDB]], including Windows tips.
- 
- {{{
- curl -vX POST http://localhost:5984/simpleform/_design/simpleform/_update/simpleform \
-     --header Content-Type:application/x-www-form-urlencoded \
-     --data-urlencode name="John Doe" \
-     --data-urlencode email="john@example.org" \
-     --data-urlencode phone="+1 (234) 567-890" \
-     --data-urlencode url="http://example.org/blog" \
-     --data-urlencode message="Y U NO HAZ CHEESBURGER" \
-     --data-urlencode submit="submit"
- }}}
- 
- If you are on a unix-like system, you may enjoy the colour output afforded by [[http://httpie.org/|httpie]],
a python-based curl replacement:
- 
- {{{
- http --pretty --verbose --style fruity --form \
-     post http://localhost:5984/simpleform/_design/simpleform/_update/simpleform  \
-     name="John Doe" \
-     email="john@example.org" \
-     phone="+1 (234) 567-890" \
-     url="http://example.org/blog" \
-     message="Y U NO HAZ CHEESBURGER" \
-     submit="submit"
- }}}
- 
- === A basic update handler ===
- 
- Here's a simple update handler that will receive the POSTed data as second parameter, and
the previous document version if any as the first parameter . In our case, using POST, there
will be no existing document so this will always be {{{null}}}. Finally this function, to
help us debug the handler, conveniently returns the output of the new document, along with
the request and previous doc if any. Obviously this could be HTML or a redirect to another
page using custom headers, you will need to customise this to fit.
- 
- {{{
- function(previous, request) {
- 
-     /* during development and testing you can write data to couch.log
-      log({"previous": previous})
-      log({"request": request})
-     */
- 
-     var doc = {}
- 
-     if (!previous) {
-         // there's no existing document _id as we are not using PUT
-         // let's use the email address as the _id instead
-         if (request.form && request.form.email) {
-             // Extract the JSON-parsed form from the request
-             // and add in the user's email as docid
-            doc     = request.form
-            doc._id = request.form.email
-         }
-     }
-     return [doc, toJSON({"request": request, "previous": previous, "doc": doc})]
- }
- }}}
- 
- === Tips and Tricks ===
- 
- There are a few points to cover here:
- 
-  * you can use {{{log(…)}}} to write data to your couch.log file
-  * Note that there's only ever going to be additional data in the previous document if we
use a PUT request and provide a URL that includes the document {{{_id}}}. The POST approach
doesn't pass a new {{{_id}}} in so in our example this will be blank. However the same update
handler can be used to service multiple forms and HTTP verbs.
-  * You must guard all tests {{{if (request.form && …}}} otherwise an exception
will occur if a field is missing, and your document will not be written.
-  * The returned {{{request}}} object also conveniently includes a valid CouchDB {{{UUID}}}
if you do not generate one of your own.
-  * When the function returns, if {{{doc}}} is empty then no data is written to CouchDB.
-  * The update handler can return almost anything, including custom headers and body. See
[[Document_Update_Handlers]] for more information.
- 
- 
- === Results from the form ===
- 
- After filling out the form and POSTing it back, you'll receive the results from {{{toJSON}}}
in your browser. You can use firebug or chrome developer tools to view the resulting text
in a pretty JSON format, or copy and paste it into a terminal and use any of the JSON prettifiers
out there, such as [[http://lloyd.github.com/yajl/|yajl]], which also has a {{{json_reformat}}}
command distributed with it.
- 
- Let's take a look in more detail over the three sections returned. The first section {{{request.info}}}
is simply the current DB information, identical to {{{GET $COUCH/db_name}}}.
  
  {{{
  {
+   _id: "xxx",
+   _rev: "1_something",
+   field_one: "zzz",
+   field_two: "Don't bother me.",
+   topLevelObject: {
+     nestedField_one: "I'm doomed",
+     nestedField_two: 42
-   request: {
-     info: {
-       db_name: "simpleform",
-       doc_count: 3,
-       doc_del_count: 0,
-       update_seq: 32,
-       purge_seq: 0,
-       compact_running: false,
-       disk_size: 340069,
-       data_size: 158491,
-       instance_start_time: "1340837365780629",
-       disk_format_version: 6,
-       committed_update_seq: 32
-     },
- …
- }}}
- 
-  * Next comes the {{{_id}}} of the previous document version, for example if we were doing
a PUT request, this would be filled, along with a {{{_rev}}} revision as well.
-  * The requested path is provided in several forms, to make it easier to match update handlers
with document rewrite rules.
-  * If any {{{query}}} parameters were passsed to the URL, they would also be accessible.
 * The full headers are available as usual.
- 
- {{{
- …
-     id: null,
-     uuid: "8363428f19b4bc21217044e2b30133ad",
-     method: "POST",
-     requested_path: ["simpleform", "_design", "simpleform", "_update", "simpleform"],
-     path: ["simpleform", "_design", "simpleform", "_update", "simpleform"],
-     raw_path: "/simpleform/_design/simpleform/_update/simpleform",
-     query: {},
-     headers: {
-       Accept: "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
-       Accept - Charset: "ISO-8859-1,utf-8;q=0.7,*;q=0.3",
-       Accept - Encoding: "gzip,deflate,sdch",
-       Accept - Language: "en-US,en;q=0.8",
-       Cache - Control: "max-age=0",
-       Connection: "keep-alive",
-       Content - Length: "150",
-       Content - Type: "application/x-www-form-urlencoded",
-       Host: "localhost:5984",
-       Origin: "http://localhost:5984",
-       Referer: "http://localhost:5984/simpleform/_design/simpleform/minimalform.html",
-       User - Agent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.1 (KHTML,
like Gecko) Chrome/22.0.1188.0 Safari/537.1"
-     },
- …
- }}}
- 
-   * The original HTTP {{{body}}} is provided, as well as the originating {{{peer}}} IP address.
-   * The url-encoded form parameters have conveniently been extracted and unencoded into
a JS object, which was stringified in our final {{{return}}}. This {{{form}}} is typically
used or transformed in some way to build up the resulting document object inside the update
handler function.
- 
- {{{
- …
-     body: "name=John+Doe&phone=%2B1+%28234%29+987-654&email=john%40example.org&url=http%3A%2F%2Fjohn.blogger.com%2F&message=STILL+NO+CHEEZBURGER%3F&submit=submit",
-     peer: "127.0.0.1",
-     form: {
-       name: "John Doe",
-       phone: "+1 (234) 987-654",
-       email: "john@example.org",
-       url: "http://john.blogger.com/",
-       message: "STILL NO CHEEZBURGER?",
-       submit: "submit",
-       _id: "john@example.org"
-     },
-  …
- }}}
-   * {{{cookie}}} and {{{user context}}} are also available here, enabling such things are
inserting usernames, or checking roles before further processing. Ensure that you are not
duplicating functionality that should be in a [[Document_Update_Validation]] function.
-   * the previous document is empty as we are doing a POST request.
- {{{
- …
-     cookie: {},
-     userCtx: {
-       db: "simpleform",
-       name: null,
-       roles: []
-     },
-     secObj: {}
-   },
-   previous: null,
- …
- }}}
- 
- === Emitting the result during testing ===
- 
- Finally we emit the resulting document, after our server-side update handler has run. Note
that the revision {{{_rev}}} is not yet available, but as noted in [[Document_Update_Handlers]]
it is possible to retrieve this.
- 
- {{{
- …
-   doc: {
-     name: "John Doe",
-     phone: "+1 (234) 987-654",
-     email: "john@example.org",
-     url: "http://john.blogger.com/",
-     message: "STILL NO CHEEZBURGER?",
-     submit: "submit",
-     _id: "john@example.org"
    }
  }
  }}}
  
- == Retrieve the Document ==
+ A command key of `field_one` is a command to replace "zzz" with the value of the update
command property.  A command key of `/topLevelObject/nestedField_two` will replace 42 with
the associated property value.
  
- After the write was successful we can retrieve the new document:
+ An update command that has the magic property value of `__delete__` will cause the corresponding
original field to be deleted. The command property `/topLevelObject/nestedField_one: "__delete__"`
will delete the entire nestedField_one property.
+ 
+ The update handler will automatically create any objects missing from any part of a path.
For example, the following update command `/topLevelObject/nestedField_three/dblNest: 73`
will add the missing objects ''nestedField_three'' and ''dblNest'' to the doc.
+ 
+ This update document ...
  
  {{{
- $ curl --silent -HContent-Type:application/json  -vXGET http://localhost:5984/simpleform/john@example.org
| json_reformat
- 
- * About to connect() to localhost port 5984 (#0)
- *   Trying ::1... Connection refused
- *   Trying 127.0.0.1... connected
- * Connected to localhost (127.0.0.1) port 5984 (#0)
- > GET /simpleform/john@example.org HTTP/1.1
- > User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r
zlib/1.2.5
- > Host: localhost:5984
- > Accept: */*
- > Content-Type:application/json
- >
- < HTTP/1.1 200 OK
- < Server: CouchDB/1.3.0a- (Erlang OTP/R15B01)
- < ETag: "1-5c316da64caebbebcd0f87364df2a0e7"
- < Date: Thu, 28 Jun 2012 11:31:49 GMT
- < Content-Type: text/plain; charset=utf-8
- < Content-Length: 228
- < Cache-Control: must-revalidate
- <
- { [data not shown]
- * Connection #0 to host localhost left intact
- * Closing connection #0
  {
+   "field_one": "AAA",
+   "/topLevelObject/nestedField_one": "__delete__",
+   "/topLevelObject/nestedField_two": 99,
+   "/topLevelObject/nestedField_three/dblNest": 73
-     "_id": "john@example.org",
-     "_rev": "1-5c316da64caebbebcd0f87364df2a0e7",
-     "name": "John Doe",
-     "phone": "+1 (234) 987-654",
-     "email": "john@example.org",
-     "url": "http://john.blogger.com/",
-     "message": "STILL NO CHEEZBURGER?",
-     "submit": "submit"
  }
  }}}
  
- With the basic update handler above you should have no trouble modifying it further to perform
redirects, extract/merge or modify the data available to you - new and old document versions,
and user/security context as well. Don't forget that some code is better in a [[Document_Update_Validation]]
rather than in the update handler. The update handler will always act as another HTTP POST/PUT,
just run conveniently inside the server. They can still suffer from document conflicts, for
example.
+ will change the original document to ...
  
+ {{{
+ {
+   _id: "xxx",
+   _rev: "2_somethingElse",
+   field_one: "AAA",
+   field_two: "Don't bother me.",
+   topLevelObject: {
+     nestedField_two: 99,
+     nestedField_three: {
+       dblNest: 73
+     }
+   }
+ }
+ }}}
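
The walk-through above can be checked outside CouchDB with a small stand-alone script. This is a sketch, not the installed handler: `partialUpdate` below restates the handler's logic with declared locals, and the `{ body: ... }` object is a stand-in for the request CouchDB would pass. The nested path is written without a trailing slash, since with this logic a trailing slash would produce an empty-string final key.

```javascript
// Restatement of the handler logic with declared locals.
function partialUpdate(doc, req) {
  if (!doc) {
    return [null, JSON.stringify({ status: 'nodoc' })];
  }
  var commands = JSON.parse(req.body);
  for (var k in commands) {
    var v = commands[k];
    var target = doc;
    var key = k;
    if (k[0] === '/') {
      var parts = k.split('/');
      // Walk (and create) every object between the root and the last key.
      for (var i = 1; i < parts.length - 1; i++) {
        if (target[parts[i]] == null) {
          target[parts[i]] = {};
        }
        target = target[parts[i]];
      }
      key = parts[parts.length - 1];
    }
    if (v === '__delete__') {
      delete target[key];
    } else {
      target[key] = v;
    }
  }
  return [doc, JSON.stringify({ doc: doc, status: 'updated' })];
}

var original = {
  _id: 'xxx',
  _rev: '1_something',
  field_one: 'zzz',
  field_two: "Don't bother me.",
  topLevelObject: { nestedField_one: "I'm doomed", nestedField_two: 42 }
};

var body = JSON.stringify({
  'field_one': 'AAA',
  '/topLevelObject/nestedField_one': '__delete__',
  '/topLevelObject/nestedField_two': 99,
  '/topLevelObject/nestedField_three/dblNest': 73
});

// _rev is assigned by CouchDB when the doc is saved, so it is unchanged here.
var result = partialUpdate(original, { body: body });
console.log(JSON.stringify(result[0], null, 2));
```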
+ 
+ == Reserved syntax  ==
+ 
+ Note from the usage instructions above that if you use a doc key that starts with a slash
or a doc value that is `__delete__` you will have a problem using the update handler given
above.  This could be solved by adding some escape mechanism to the handler.  Fixing this
is left to the reader.
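
One possible escape mechanism, offered only as a sketch: treat a leading backslash as "take the rest literally". Under this hypothetical convention the key `\/name` means the top-level key `/name`, and the value `\__delete__` means the literal string `__delete__`. A literal that itself starts with a backslash would then need to be written with two. The helper names `decodeKey` and `decodeValue` are invented for illustration and are not part of the handler above.

```javascript
// Hypothetical convention: a leading backslash marks a literal.
function decodeKey(k) {
  if (k[0] === '/') {
    // Path key: drop the empty segment before the leading slash.
    return { isPath: true, parts: k.split('/').slice(1) };
  }
  // '\/name' is the literal top-level key '/name'.
  return { isPath: false, parts: [k[0] === '\\' ? k.slice(1) : k] };
}

function decodeValue(v) {
  if (v === '__delete__') {
    return { isDelete: true };
  }
  // '\__delete__' is the literal string '__delete__'.
  if (typeof v === 'string' && v[0] === '\\') {
    return { isDelete: false, value: v.slice(1) };
  }
  return { isDelete: false, value: v };
}

console.log(decodeKey('\\/odd'));
console.log(decodeValue('\\__delete__'));
```

A handler adapted to this scheme would call both helpers on each update command before deciding whether to walk a path, delete, or assign.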
+ 
+ 
+ == HTTP 409 error ==
+ 
+ Even though an update with an update handler has less chance of colliding, it is still possible
for the update request to return an HTTP error 409.  This is caused by some other update
incrementing the version of the doc while the update handler is executing.  You will need
to implement a retry mechanism and/or conflict resolution in the code making the HTTP request.
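
A retry mechanism could be sketched along these lines. The `send` argument is a hypothetical caller-supplied function that issues the PUT to the update handler and resolves to an object with a numeric `status`. The attempt limit is likewise an assumption, and real code may want a delay between attempts.

```javascript
// `send` is a hypothetical caller-supplied function that issues the PUT
// to the update handler and resolves to an object with a numeric `status`.
async function updateWithRetry(send, maxAttempts) {
  var response;
  for (var attempt = 1; attempt <= maxAttempts; attempt++) {
    response = await send();
    if (response.status !== 409) {
      return response;
    }
    // 409: another writer changed the doc first. Re-sending the same
    // commands lets the handler run against the newer revision.
  }
  return response; // still 409 after maxAttempts; the caller decides what next
}

// Usage with a fake sender that conflicts twice and then succeeds.
var calls = 0;
function fakeSend() {
  calls += 1;
  return Promise.resolve({ status: calls < 3 ? 409 : 201 });
}

var demo = updateWithRetry(fakeSend, 5).then(function (res) {
  console.log('status', res.status, 'after', calls, 'attempts');
  return res;
});
```

Blindly re-sending is reasonable when the update commands do not depend on the doc's previous contents. Otherwise the caller may need to re-read the doc and resolve the conflict before retrying.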
+ 
