couchdb-dev mailing list archives

From "Maximillian Dornseif (JIRA)" <j...@apache.org>
Subject [jira] Created: (COUCHDB-254) Non-Unicode characters in an attachment name render a document unreadable.
Date Fri, 13 Feb 2009 21:44:59 GMT
Non-Unicode characters in an attachment name render a document unreadable.
--------------------------------------------------------------------------

                 Key: COUCHDB-254
                 URL: https://issues.apache.org/jira/browse/COUCHDB-254
             Project: CouchDB
          Issue Type: Bug
          Components: Database Core
    Affects Versions: 0.9
         Environment: Linux, Erlang R12B-5, CouchDB r791265
            Reporter: Maximillian Dornseif
            Priority: Critical


Attachment names containing non-Unicode characters can be created easily because URIs are
(nearly) 8-bit clean. But when the document is read back, the names are encoded as UTF-8,
which fails. So you are left with unreadable database entries.
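
To illustrate the point about URIs, here is a minimal sketch (Python 3, standard library
only; it is not part of the original report, and the helper name is_valid_utf8 is made up
for this example). It shows that percent-escaping carries the raw byte 0x80 through a URI
unchanged, and that the resulting name cannot be decoded as UTF-8:

```python
from urllib.parse import quote, unquote_to_bytes


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# An attachment name containing a lone 0x80 byte (hypothetical example name).
name = b"foo\x80txt"

# Percent-escaping keeps the byte as-is, so the URI path is "8-bit clean".
escaped = quote(name)

# What a server sees after undoing the escaping: the same raw bytes.
received = unquote_to_bytes(escaped)

print(escaped)
print(is_valid_utf8(received))
```

The same name written as the Unicode character U+0080 and encoded as UTF-8 would be valid,
so the problem is limited to names that arrive as raw bytes.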

I was not able to generate invalid UTF-8 in JavaScript but a test case would look somewhat
like this:

--- couch_tests.js      2009-02-05 19:47:20.000000000 +0000
+++ /usr/local/share/couchdb/www/script/couch_tests.js  2009-02-13 21:34:23.000000000 +0000
@@ -1078,9 +1078,31 @@
     var xhr = CouchDB.request("GET", "/test_suite_db/bin_doc4/attachment.txt");
     T(xhr.status == 200);
     T(xhr.responseText == "This is a string");
-
   },
 
+  attatchment_names : function(debug) {
+    var db = new CouchDB("test_suite_db");
+    db.deleteDb();
+    db.createDb();
+    if (debug) debugger;
+
+    var binAttDoc = {
+      _id: "bin_doc",
+      _attachments:{
+        "foo\x80txt": {
+          content_type:"text/plain",
+          data: "VGhpcyBpcyBhIGJhc2U2NCBlbmNvZGVkIHRleHQ="
+        }
+      }
+    }
+
+    var save_response = db.save(binAttDoc);
+    T(save_response.ok);
+
+    var xhr = CouchDB.request("GET", "/test_suite_db/bin_doc/foo\x80txt");
+    T(xhr.responseText == "This is a base64 encoded text");
+  },
+
   attachment_paths : function(debug) {
     if (debug) debugger;
     var dbNames = ["test_suite_db", "test_suite_db/with_slashes"];
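
A likely reason the JavaScript test cannot produce invalid UTF-8, sketched in Python 3 (this
is an assumption about how the browser serializes strings, not something verified against
the test suite): the escape "\x80" in a JavaScript string denotes the character U+0080, and
a character is encoded as the two bytes C2 80 when written out as UTF-8. A raw 0x80 byte
only appears when the name is handled as bytes:

```python
from urllib.parse import quote

# "\x80" as a text string is the character U+0080, as in the JavaScript test.
as_text = "foo\x80txt"
utf8_bytes = as_text.encode("utf-8")   # two bytes for U+0080: C2 80
text_path = quote(as_text)             # escaped from the UTF-8 encoding

# The same escape in a byte string is a single raw byte.
as_bytes = b"foo\x80txt"
bytes_path = quote(as_bytes)           # escaped from the raw byte

print(text_path)
print(bytes_path)
```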



A Python script (fuzzer?) for triggering the bug looks like this:

import sys
import couchdb.client

COUCHSERVER = "http://localhost:5984"
COUCHDB_NAME = "md_test"

def _setup_couchdb():
    """Get a connection handler to the CouchDB Database, creating it when needed."""
    server = couchdb.client.Server(COUCHSERVER)
    print "using %s/%s" % (COUCHSERVER, COUCHDB_NAME)
    if COUCHDB_NAME in server:
        return server[COUCHDB_NAME]
    else:
        return server.create(COUCHDB_NAME)
    
def main():
    db = _setup_couchdb()
    doc_id = "doc_id"
    
    try:
        doc = db[doc_id]
    except couchdb.client.ResourceNotFound:
        doc = {}
    
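    db[doc_id] = doc
    # Try every possible byte value as part of an attachment name.
    for i in range(256):
        char = chr(i)  # Python 2: a one-byte str, not a Unicode character
        name = "___%s___" % (char)
        print "checking %r (%d) " % (char, i),
        sys.stdout.flush()
        db.put_attachment(db[doc_id], "data", name)
        # Read the document back; this is the step that breaks on bad names.
        db[doc_id]
        print '\r',
    print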

main()
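
As a rough guide to which iterations of the script above can produce a bad name, the sketch
below (Python 3, no CouchDB needed; fuzz_names and decodes_as_utf8 are hypothetical helper
names) builds the same 256 names as byte strings and checks which ones fail to decode as
UTF-8. It only checks encoding validity and says nothing about how CouchDB treats control
characters or other special bytes:

```python
def fuzz_names():
    """Yield (byte value, name) pairs matching the "___%s___" pattern."""
    for i in range(256):
        yield i, b"___" + bytes([i]) + b"___"


def decodes_as_utf8(raw):
    """Return True if the byte string is valid UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Byte values whose name is not valid UTF-8.
invalid = [i for i, name in fuzz_names() if not decodes_as_utf8(name)]

print("first invalid byte value: %d" % invalid[0])
print("number of invalid names: %d" % len(invalid))
```

Every byte from 0x80 upward is invalid here because it is followed by an ASCII underscore
rather than the continuation bytes a multi-byte UTF-8 sequence requires.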



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

