couchdb-dev mailing list archives

From "Noah Slater (JIRA)" <>
Subject [jira] Created: (COUCHDB-18) Unicode document names
Date Sat, 08 Mar 2008 23:57:46 GMT
Unicode document names

                 Key: COUCHDB-18
             Project: CouchDB
          Issue Type: Bug
            Reporter: Noah Slater
            Priority: Minor

The documentation at
notes that valid document names (_id) are only [a-zA-Z0-9_] (for now). The
behaviour for non-ASCII names is not specified.

Currently, CouchDB decodes percent-encoded document names in a URL as
ISO-8859-1. For example, here is a dump of the HTTP traffic:

PUT /test/%D0%B0%D0%B0%D0%B0%D0%B0 HTTP/1.1
Host: localhost:5984
Accept-Encoding: identity
Content-Length: 40
content-type: application/json
accept: application/json
user-agent: couchdb-python 0.2

{"message": "the medium is the message"}
HTTP/1.1 201 Created
Server: inets/develop
Date: Sat, 05 Jan 2008 17:58:58 GMT
Cache-Control: no-cache
Pragma: no-cache
Expires: Sat, 05 Jan 2008 17:58:58 GMT
Transfer-Encoding: chunked
Content-Type: application/json
Etag: 3412642223



The string %D0%B0%D0%B0%D0%B0%D0%B0 was converted to
"\u00d0\u00b0\u00d0\u00b0\u00d0\u00b0\u00d0\u00b0", with each byte decoded
as a separate ISO-8859-1 character, while the intended behaviour was to get
a Cyrillic document name decoded as UTF-8.
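
The difference between the two decodings can be reproduced outside CouchDB.
The following is only a sketch in Python 3 (not the Python 2 that
couchdb-python 0.2 runs on), and the variable names are made up for
illustration. It shows the same eight bytes read as ISO-8859-1 and as UTF-8,
not how CouchDB itself is implemented:

```python
from urllib.parse import unquote_to_bytes

# Path segment taken from the PUT request in this report.
path_segment = "%D0%B0%D0%B0%D0%B0%D0%B0"

# Percent-decoding yields raw bytes; no character encoding is applied yet.
raw = unquote_to_bytes(path_segment)

# Reading each byte as one ISO-8859-1 character gives eight characters,
# matching what CouchDB 0.7.2 was observed to store.
as_latin1 = raw.decode("iso-8859-1")

# Reading the bytes as UTF-8 gives four Cyrillic characters (U+0430).
as_utf8 = raw.decode("utf-8")

print(len(raw), len(as_latin1), len(as_utf8))
```

The ISO-8859-1 result is the "\u00d0\u00b0" pair repeated four times, and the
UTF-8 result is "\u0430" repeated four times.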

Also, couchdb-python and the JavaScript library that ships with CouchDB both
assume that Unicode document names are encoded as UTF-8.
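
As an illustration of that client-side assumption, here is a sketch in
Python 3 of how a Unicode document name becomes the path seen in the request
above. It is not couchdb-python's actual code; doc_id and path are
hypothetical names, and the database name "test" is taken from the dump:

```python
from urllib.parse import quote

# Hypothetical document name: four Cyrillic "a" characters (U+0430).
doc_id = "\u0430" * 4

# quote() encodes the text as UTF-8 by default and percent-escapes
# every non-ASCII byte.
path = "/test/" + quote(doc_id)

print(path)
```

A server that percent-decodes this path and then reads the bytes as UTF-8
recovers the original four-character name.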

Everything was tested with CouchDB 0.7.2.


Forgot to add: the expected output would be "id":"\u0430\u0430\u0430\u0430". For example:

$ python
>>> from urllib import unquote
>>> unquote('%D0%B0%D0%B0%D0%B0%D0%B0').decode('utf-8')
u'\u0430\u0430\u0430\u0430'
