incubator-couchdb-user mailing list archives

From Jens Alfke <j...@couchbase.com>
Subject BigCouch doesn't provide attachment digests?
Date Thu, 05 Apr 2012 17:41:42 GMT
Documents stored in Cloudant databases don't include MD5 digests of attachment contents
in the _attachments metadata. Here's an example:

    "_attachments": {
        "photo-15357DCF-9566-4DFD-9120-8A9164EE5873": {
            "follows": true,
            "length": 79608,
            "content_type": "image/jpeg",
            "revpos": 2
        }
    },

Other servers do include a "digest" property here, so I assume this is a difference between
BigCouch and CouchDB. Is it intentional? It's causing problems replicating databases from
Cloudant to TouchDB, and the workarounds I can think of in TouchDB are either fairly ugly
(basically involving writing a custom JSON parser…) or involve performance regressions.

Here's more detail on my problem:
* For efficiency, the replicator in TouchDB (like CouchDB 1.2) fetches documents in MIME multipart
format, so that attachments are easily streamable to disk and aren't base64-encoded.
* This requires correlating the MIME bodies with the metadata objects in the _attachments
object.
* CouchDB (and BigCouch) unfortunately don't add any headers to the MIME bodies to identify
what they are. I've already filed a bug report against this.
* TouchDB's replicator works around this by computing an MD5 digest of each MIME body and
then correlating those with the "digest" properties of the attachment metadata objects.
* …which fails with Cloudant/BigCouch because that "digest" property is missing.
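
The digest-based correlation described in the list above can be sketched roughly as follows. This is an illustration in Python, not TouchDB's actual code: the function names are hypothetical, and it assumes the "digest" property has the form "md5-" followed by the base64-encoded MD5 hash, computed over exactly the bytes that appear in the MIME body.

```python
import base64
import hashlib


def couch_style_digest(body: bytes) -> str:
    """Format an MD5 hash as "md5-<base64>" (assumed digest format)."""
    raw = hashlib.md5(body).digest()
    return "md5-" + base64.b64encode(raw).decode("ascii")


def match_bodies_by_digest(attachments: dict, bodies: list) -> dict:
    """Map each followed attachment name to the MIME body with the same digest.

    `attachments` is the parsed _attachments object; `bodies` is the list of
    raw MIME part payloads, in whatever order they arrived.
    """
    by_digest = {couch_style_digest(body): body for body in bodies}
    matched = {}
    for name, meta in attachments.items():
        if not meta.get("follows"):
            continue  # attachment is not sent as a separate MIME body
        digest = meta.get("digest")
        if digest is None:
            # This is the failure case described above: no digest to match on.
            raise ValueError(f"attachment {name!r} has no digest property")
        if digest not in by_digest:
            raise ValueError(f"no MIME body matches digest of {name!r}")
        matched[name] = by_digest[digest]
    return matched
```

With metadata like the Cloudant example earlier in this message, which has "follows" but no "digest", this sketch raises ValueError instead of guessing which body belongs to which attachment.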

The reason CouchDB itself doesn't have trouble correlating the attachments is that it knows
the MIME bodies are written in the same order as the attachments appear in the _attachments
object. However, key order is not significant in JSON objects, and in most implementations
the parser stores the object contents in a hash table (like a Ruby Hash object or a Cocoa
NSDictionary), which means the ordering of the keys is lost. The only way for me to determine
the true order of the attachment keys would be to write my own specialized JSON parser that
could identify the keys and put the names into an ordered structure like an array.
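
For comparison, some JSON libraries expose a hook that keeps object keys in document order, which avoids a custom parser in those environments. A minimal sketch using Python's standard json module follows. The function name is hypothetical, and it assumes the MIME bodies arrive in the same order as the "follows" entries of _attachments, as described above. It does not help on platforms whose parser only produces unordered dictionaries.

```python
import json
from collections import OrderedDict


def followed_attachment_names(doc_json: str) -> list:
    """Return the names of attachments marked "follows", in document order."""
    # object_pairs_hook receives each object's key/value pairs in the order
    # they appear in the JSON text, so an OrderedDict preserves that order.
    doc = json.loads(doc_json, object_pairs_hook=OrderedDict)
    attachments = doc.get("_attachments", {})
    return [name for name, meta in attachments.items() if meta.get("follows")]
```

The resulting list can then be paired positionally with the MIME bodies, for example with zip(names, bodies), without relying on digests.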

—Jens