couchdb-dev mailing list archives

From "Jens Alfke (JIRA)" <j...@apache.org>
Subject [jira] Commented: (COUCHDB-254) Non-Unicode characters in an attachment name render a document unreadable.
Date Sun, 15 Feb 2009 19:38:59 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12673705#action_12673705 ]

Jens Alfke commented on COUCHDB-254:
------------------------------------

A more accurate title would be "Invalid UTF-8 encoding in an attachment URI renders a document
unreadable." The issue, as I understand it, isn't with Unicode characters but with the UTF-8
byte sequences that encode them. There are many byte sequences that aren't syntactically-valid
UTF-8 and can't be decoded into a Unicode string.
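
As a small illustration of that distinction (a sketch in Python 3, not taken from CouchDB; the helper name is_valid_utf8 is made up for this example), a strict decoder accepts some byte strings as UTF-8 and rejects others:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")  # error handling is "strict" by default
    except UnicodeDecodeError:
        return False
    return True


# Sample byte strings: the first two are well-formed, the rest are not.
samples = {
    "plain ASCII": b"foo.txt",
    "U+00E9 encoded as two bytes": b"caf\xc3\xa9.txt",
    "lone continuation byte 0x80": b"foo\x80txt",
    "truncated two-byte sequence": b"foo\xc3",
    "overlong encoding of '/'": b"\xc0\xaf",
}

for label, raw in samples.items():
    print(f"{label}: valid={is_valid_utf8(raw)}")
```

The byte 0x80 used in the reported test case falls in the third group: it is a continuation byte with no lead byte before it, so it does not encode any Unicode character.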

Sounds like the issue here is that the server code handling the PUT request is extracting
the attachment name from the URI as a byte string, without validating that it's valid UTF-8.
The invalid name then goes into the database, but when it's written out into a JSON response,
it poisons the response text and causes it to fail UTF-8 decoding.

So, basically any place the HTTP server is receiving a request, it should validate the URI's
UTF-8 encoding, and send back a 400 error if it fails. This is really easy to do, but unfortunately
I know very little Erlang, and nothing about how it works with Unicode...
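
A minimal sketch of that check, written in Python 3 rather than Erlang and not CouchDB's actual code: the function name check_request_path and the bare status-code return value are assumptions made for illustration. It percent-decodes the request path to raw bytes and rejects the request if those bytes are not strict UTF-8.

```python
from urllib.parse import unquote_to_bytes


def check_request_path(raw_path: str) -> int:
    """Return 400 if the percent-decoded path is not valid UTF-8, else 200.

    Hypothetical helper: a real server would continue handling the request
    instead of returning 200 here.
    """
    # Turn %XX escapes back into the raw bytes the client sent.
    decoded = unquote_to_bytes(raw_path)
    try:
        decoded.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return 400
    return 200


if __name__ == "__main__":
    for path in (
        "/test_suite_db/bin_doc/attachment.txt",
        "/test_suite_db/bin_doc/caf%C3%A9.txt",
        "/test_suite_db/bin_doc/foo%80txt",
    ):
        print(path, check_request_path(path))
```

Running the check before the attachment name is stored would keep a name such as foo%80txt out of the database, so it could never reach a JSON response.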

> Non-Unicode characters in an attachment name render a document unreadable.
> --------------------------------------------------------------------------
>
>                 Key: COUCHDB-254
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-254
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 0.9
>         Environment: Linux, erlang, 12b-5, couchdb r791265
>            Reporter: Maximillian Dornseif
>            Priority: Critical
>
> Attachment names containing non-Unicode characters can be created easily because URIs
> are (nearly) 8-bit clean. But when they are read back, they are encoded into UTF-8, which
> doesn't work out. So you are left with unreadable database entries.
> I was not able to generate invalid UTF-8 in JavaScript, but a test case would look somewhat like this:
> --- couch_tests.js      2009-02-05 19:47:20.000000000 +0000
> +++ /usr/local/share/couchdb/www/script/couch_tests.js  2009-02-13 21:34:23.000000000 +0000
> @@ -1078,9 +1078,31 @@
>      var xhr = CouchDB.request("GET", "/test_suite_db/bin_doc4/attachment.txt");
>      T(xhr.status == 200);
>      T(xhr.responseText == "This is a string");
> -
>    },
>  
> +  attatchment_names : function(debug) {
> +    var db = new CouchDB("test_suite_db");
> +    db.deleteDb();
> +    db.createDb();
> +    if (debug) debugger;
> +
> +    var binAttDoc = {
> +      _id: "bin_doc",
> +      _attachments:{
> +        "foo\x80txt": {
> +          content_type:"text/plain",
> +          data: "VGhpcyBpcyBhIGJhc2U2NCBlbmNvZGVkIHRleHQ="
> +        }
> +      }
> +    }
> +
> +    var save_response = db.save(binAttDoc);
> +    T(save_response.ok);
> +
> +    var xhr = CouchDB.request("GET", "/test_suite_db/bin_doc/foo\x80txt");
> +    T(xhr.responseText == "This is a base64 encoded text");
> +},
> +
>    attachment_paths : function(debug) {
>      if (debug) debugger;
>      var dbNames = ["test_suite_db", "test_suite_db/with_slashes"];
> A Python script (fuzzer?) for triggering the bug looks like this:
> import sys
> import couchdb.client
> COUCHSERVER = "http://localhost:5984"
> COUCHDB_NAME = "md_test"
> def _setup_couchdb():
>     """Get a connection handler to the CouchDB Database, creating it when needed."""
>     server = couchdb.client.Server(COUCHSERVER)
>     print "using %s/%s" % (COUCHSERVER, COUCHDB_NAME)
>     if COUCHDB_NAME in server:
>         return server[COUCHDB_NAME]
>     else:
>         return server.create(COUCHDB_NAME)
>     
> def main():
>     db = _setup_couchdb()
>     doc_id = "doc_id"
>     
>     try:
>         doc = db[doc_id]
>     except couchdb.client.ResourceNotFound:
>         doc = {}
>     
>     db[doc_id] = doc
>     for i in range(256):
>         char = chr(i)
>         name = "___%s___" % (char)
>         print "checking %r (%d) " % (char, i),
>         sys.stdout.flush()
>         db.put_attachment(db[doc_id], "data", name)
>         db[doc_id]
>         print '\r',
>     print 
> main()

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

