incubator-couchdb-user mailing list archives

From Brian Candler <B.Cand...@pobox.com>
Subject Some CouchDB Qs & As
Date Sun, 25 Jan 2009 12:09:50 GMT
As a newbie, I had some questions which I still had after reading the wiki
and the book so far, so I thought I'd do some experiments to find the
answers.

I'm just posting them here in case it's useful to anyone else, or it could
provide some ideas when writing the book.

(1a) Does CouchDB store the raw JSON which it receives, character by
character, or does it convert to and from an internal representation?
(1b) Does CouchDB accept valid Javascript which is not valid as JSON, e.g.
{foo:"bar"} or {foo:/bar/} ?

Let's test 1b first:

$ cat test.dat
{"foo":/bar/}
$ curl -T test.dat http://127.0.0.1:5984/test_suite_db/test1
{"error":"case_clause","reason":"{\"foo\":/bar/}\n"}

So the answer is "no", only strict JSON is allowed. Now to try 1a:
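The same strictness can be checked on the client before uploading. Here is a
minimal sketch in Python, using the standard json module as a stand-in for
CouchDB's parser. That is an assumption: the two parsers may disagree on edge
cases (Python accepts NaN by default, for instance). is_strict_json is a
hypothetical helper name, not part of any CouchDB client.

```python
import json

def is_strict_json(text):
    """Return True if text parses under Python's json module."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

# The two Javascript-only forms from question 1b, then a valid object.
for candidate in ('{"foo":/bar/}', '{foo:"bar"}', '{"foo":"bar"}'):
    print(candidate, is_strict_json(candidate))
```

Only the last candidate passes; the regex literal and the unquoted key are
both rejected.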

$ cat test.dat
{"foo":
       "bar"}
$ curl -T test.dat http://127.0.0.1:5984/test_suite_db/test1
{"ok":true,"id":"test1","rev":"1878284436"}
$ curl http://127.0.0.1:5984/test_suite_db/test1
{"_id":"test1","_rev":"1878284436","foo":"bar"}

This suggests that the JSON is converted into some internal representation,
and then converted back to JSON.
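The effect can be imitated locally. This sketch assumes nothing about
CouchDB's actual internal representation; it only shows that any
parse-then-serialize round trip discards the original whitespace, which is
consistent with the output above.

```python
import json

# The body of test.dat, including the line break and indentation.
raw = '{"foo":\n       "bar"}'

parsed = json.loads(raw)                             # internal representation
compact = json.dumps(parsed, separators=(",", ":"))  # back to JSON text

print(compact)
```

The re-serialized text is {"foo":"bar"}, with the newline and padding gone.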

(2a) When PUTting an identical version of a document, does CouchDB still
allocate a new _rev?
(2b) What about when PUTting a document which is semantically identical
JSON, but differs in ordering of object members?

Let's try 2a first:

$ cat test2.dat
{
  "foo": "value1",
  "bar": "value2"
}
$ curl -T test2.dat http://127.0.0.1:5984/test_suite_db/test2
{"ok":true,"id":"test2","rev":"4264834066"}
$ curl http://127.0.0.1:5984/test_suite_db/test2 >test2a
$ cat test2a
{"_id":"test2","_rev":"4264834066","foo":"value1","bar":"value2"}
$ curl -T test2a http://127.0.0.1:5984/test_suite_db/test2
{"ok":true,"id":"test2","rev":"396680012"}
$ curl http://127.0.0.1:5984/test_suite_db/test2 >test2b
$ cat test2b
{"_id":"test2","_rev":"396680012","foo":"value1","bar":"value2"}

So it seems the answer is: a new _rev is allocated for every PUT, even if the
uploaded data is identical JSON. That makes 2b moot: a reordered document
will be given a new _rev too. It is up to the client alone to determine
whether the data has changed before invoking a PUT operation.
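One way a client might make that check is to compare the parsed values
rather than the raw text. A minimal sketch, assuming the client holds both
the local text and the text last fetched from the server; body_changed is a
hypothetical helper, and ignoring _id and _rev in the comparison is my own
choice.

```python
import json

def body_changed(local_text, server_text):
    """Compare two JSON documents by value, ignoring whitespace,
    member order, and the _id/_rev bookkeeping fields."""
    def strip(doc):
        return {k: v for k, v in doc.items() if k not in ("_id", "_rev")}
    return strip(json.loads(local_text)) != strip(json.loads(server_text))

server = '{"_id":"test2","_rev":"4264834066","foo":"value1","bar":"value2"}'
reordered = '{\n  "bar": "value2",\n  "foo": "value1"\n}'
edited = '{"foo": "value1", "bar": "changed"}'

print(body_changed(reordered, server))  # same members, different order
print(body_changed(edited, server))     # one value differs
```

Python dicts compare equal regardless of key order, so the reordered
document counts as unchanged and the PUT can be skipped.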

(3) It is documented (but not stressed) that a document is a JSON object, as
opposed to any JSON value, but I thought I'd check that too:

$ cat test3.dat
["wibble","bibble"]
$ curl -T test3.dat http://127.0.0.1:5984/test_suite_db/test3
{"error":"error","reason":"function_clause"}
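A client can catch this before the server does. The sketch below assumes
only what the test shows, namely that the top-level value must be a JSON
object; is_document_body is a hypothetical name.

```python
import json

def is_document_body(text):
    """Return True if text is JSON whose top-level value is an object."""
    return isinstance(json.loads(text), dict)

print(is_document_body('["wibble","bibble"]'))
print(is_document_body('{"foo":"bar"}'))
```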

(4) If a document has attachments, what happens if you upload a new version
without the _attachments member or not listing a particular attachment? Is
it treated as an error, or ignored, or are the missing attachment(s) removed?

$ cat test4.dat
{
  "_attachments":
  {
    "foo.txt":
    {
      "content_type":"text\/plain",
      "data":"VGhpcyBpcyBmb28="
    },
    "bar.txt":
    {
      "content_type":"text\/plain",
      "data":"YW5kIHRoaXMgaXMgYmFy"
    }
  }
}
$ curl -T test4.dat http://127.0.0.1:5984/test_suite_db/test4
{"ok":true,"id":"test4","rev":"2042916403"}
$ curl http://127.0.0.1:5984/test_suite_db/test4/foo.txt
This is foo
$ curl http://127.0.0.1:5984/test_suite_db/test4/bar.txt
and this is bar
$ curl http://127.0.0.1:5984/test_suite_db/test4 >test4a
$ cat test4a
{"_id":"test4","_rev":"2042916403","_attachments":{"foo.txt":{"stub":true,"content_type":"text/plain","length":11},"bar.txt":{"stub":true,"content_type":"text/plain","length":15}}}
$ cp test4a test4b
$ vi test4b
... remove bar.txt attachment ...
$ cat test4b
{"_id":"test4","_rev":"2042916403","_attachments":{"foo.txt":{"stub":true,"content_type":"text/plain","length":11}}}
$ curl -T test4b http://127.0.0.1:5984/test_suite_db/test4
{"ok":true,"id":"test4","rev":"652385907"}
$ curl http://127.0.0.1:5984/test_suite_db/test4
{"_id":"test4","_rev":"652385907","_attachments":{"foo.txt":{"stub":true,"content_type":"text/plain","length":11}}}
$ curl http://127.0.0.1:5984/test_suite_db/test4/foo.txt
This is foo
$ curl http://127.0.0.1:5984/test_suite_db/test4/bar.txt
{"error":"not_found","reason":"Document is missing attachment"}

So this suggests that omitting an attachment's entry from _attachments when
PUTting will delete that attachment.
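The two document shapes used in this test can be built programmatically. A
sketch in Python, with hypothetical helper names: inline_attachment produces
the base64 "data" form used in test4.dat, and without_attachment drops one
entry from a fetched document so that PUTting the result would remove that
attachment, going by the behaviour observed above.

```python
import base64
import copy

def inline_attachment(content, content_type):
    """Build an inline attachment entry from raw bytes."""
    return {
        "content_type": content_type,
        "data": base64.b64encode(content).decode("ascii"),
    }

def without_attachment(doc, name):
    """Return a copy of doc with one attachment entry removed."""
    trimmed = copy.deepcopy(doc)
    trimmed.get("_attachments", {}).pop(name, None)
    return trimmed

new_doc = {"_attachments": {
    "foo.txt": inline_attachment(b"This is foo", "text/plain"),
    "bar.txt": inline_attachment(b"and this is bar", "text/plain"),
}}
print(new_doc["_attachments"]["foo.txt"]["data"])

fetched = {"_id": "test4", "_rev": "2042916403", "_attachments": {
    "foo.txt": {"stub": True, "content_type": "text/plain", "length": 11},
    "bar.txt": {"stub": True, "content_type": "text/plain", "length": 15},
}}
print(sorted(without_attachment(fetched, "bar.txt")["_attachments"]))
```

The deep copy keeps the fetched document intact in case the PUT fails and
has to be retried.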

OK, what about if you submit with a new content_type and/or length?

$ cat test4d
{"_id":"test4","_rev":"652385907","_attachments":{"foo.txt":{"stub":true,"content_type":"text/html","length":4}}}
$ curl -T test4d http://127.0.0.1:5984/test_suite_db/test4
{"ok":true,"id":"test4","rev":"3267807076"} 
$ curl http://127.0.0.1:5984/test_suite_db/test4
{"_id":"test4","_rev":"3267807076","_attachments":{"foo.txt":{"stub":true,"content_type":"text/plain","length":11}}}
$ curl http://127.0.0.1:5984/test_suite_db/test4/foo.txt
This is foo

So it looks like the attributes supplied in an attachment stub are ignored.
(This raises the question: is it possible to change the content_type of an
attachment without re-uploading it? But that's probably not very useful
anyway.)

(5) How do the document's _id attribute and the id given in the URL
interact? Specifically:
(5a) If I PUT a new document to /db/id1 but the document contains
"_id":"id2", which wins?

$ cat test5a.dat
{"_id":"abc123","foo":"bar"}
$ curl -T test5a.dat http://127.0.0.1:5984/test_suite_db/test5
{"ok":true,"id":"test5","rev":"1818553963"}

Answer: The _id attribute is ignored, and the URL wins.

(5b) What about for an existing document?

$ cat test5b
{"_id":"test1","_rev":"1818553963","foo":"bar"}
$ curl -T test5b http://127.0.0.1:5984/test_suite_db/test5
{"ok":true,"id":"test5","rev":"4125462791"}

Answer: The _id attribute is ignored, and the URL wins. (But clearly the
_rev is taken from the document itself)

(5c) If I POST to /db but the document contains "_id":"id3", is a random
document id still assigned?

$ curl -d '{"_id":"xxxyyy", "foo":"bar"}' http://127.0.0.1:5984/test_suite_db
{"ok":true,"id":"6aca52f5b234ff61bc32318cb0ea2f84","rev":"1457889014"}

Answer: Yes, the document's _id attribute is ignored.

(5d) What about _rev?

For PUT:
$ curl http://127.0.0.1:5984/test_suite_db/test5 >test5c
$ curl -T test5c http://127.0.0.1:5984/test_suite_db/test5copy
{"error":"conflict","reason":"Document update conflict."}

Answer: The presence of _rev indicates whether the document already exists
or not, so whilst _id is ignored, _rev must be removed if you are going to
make a copy of an existing document.

For POST:
$ curl http://127.0.0.1:5984/test_suite_db/test5 |
  curl -X POST -T - http://127.0.0.1:5984/test_suite_db
{"ok":true,"id":"c043ec883ee0926cc344b38e9cf00db9","rev":"1677811569"}

Answer: for POST, both _id and _rev are ignored.
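For the PUT case, the fetched document needs its _rev removed before it can
be uploaded under a new id. A minimal sketch, with prepare_copy as a
hypothetical helper; it drops _id as well, which the tests above suggest is
harmless since the URL wins anyway.

```python
def prepare_copy(doc):
    """Return a copy of doc without the _id and _rev bookkeeping fields."""
    return {k: v for k, v in doc.items() if k not in ("_id", "_rev")}

original = {"_id": "test5", "_rev": "4125462791", "foo": "bar"}
print(prepare_copy(original))
```

The result can then be PUT to the new URL (test5copy in the example above)
without triggering the conflict error.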

Aside: I see at http://wiki.apache.org/couchdb/HTTP_Document_API

  "A CouchDB document is simply a JSON object ... The document can be an
  arbitrary JSON object"

which technically answers (1b) and (3). However, I suggest it may be worth
discussing in the book exactly what is, and what is not, a valid CouchDB
document.

Regards,

Brian.
