couchdb-user mailing list archives

From Robert Newson <rnew...@apache.org>
Subject Re: BigCouch doesn't provide attachment digests?
Date Thu, 05 Apr 2012 19:17:12 GMT
Hi Jens,

It cherry-picked cleanly so it should turn up in next week's code push.

B.

On 5 April 2012 13:49, Robert Newson <rnewson@apache.org> wrote:
> Thanks Jens,
>
> I can backport that.
>
> B.
>
> On 5 April 2012 13:41, Jens Alfke <jens@couchbase.com> wrote:
>> Documents stored in Cloudant databases aren't including MD5 digests of attachment contents in the _attachments metadata. Here's an example:
>>
>>    "_attachments": {
>>        "photo-15357DCF-9566-4DFD-9120-8A9164EE5873": {
>>            "follows": true,
>>            "length": 79608,
>>            "content_type": "image/jpeg",
>>            "revpos": 2
>>        }
>>    },
>>
>> Other servers don't do this; I assume this is a difference between BigCouch and CouchDB. Is this intentional? It's causing problems replicating databases from Cloudant to TouchDB, and the workarounds I can think of for this in TouchDB are either fairly ugly (basically involving writing a custom JSON parser…) or involve performance regressions.
>>
>> Here's more detail on my problem:
>> * For efficiency, the replicator in TouchDB (like CouchDB 1.2) fetches documents in MIME multipart format, so that attachments are easily streamable to disk and aren't base64-encoded.
>> * This requires correlating the MIME bodies with the metadata objects in the _attachments object.
>> * CouchDB (and BigCouch) unfortunately don't add any headers to the MIME bodies to identify what they are. I've already filed a bug report against this.
>> * TouchDB's replicator works around this by computing an MD5 digest of each MIME body and then correlating those with the "digest" properties of the attachment metadata objects.
>> * …which fails with Cloudant/BigCouch because that "digest" property is missing.
>>
>> The reason CouchDB itself doesn't have trouble correlating the attachments is that it knows the MIME bodies are written in the same order as the attachments appear in the _attachments object. However, key order is not significant in JSON objects, and in most implementations the parser stores the object contents in a hash table (like a Ruby Hash object or a Cocoa NSDictionary), which means the ordering of the keys is lost. The only way for me to determine the true order of the attachment keys would be to write my own specialized JSON parser that could identify the keys and put the names into an ordered structure like an array.
>>
>> —Jens
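
The digest-based correlation Jens describes can be sketched roughly as follows. This is an illustration in Python, not TouchDB's actual code: the function names are hypothetical, and it assumes the "digest" property uses the "md5-" prefix followed by the base64-encoded MD5 hash, as CouchDB emits it.

```python
import base64
import hashlib


def couch_md5_digest(body: bytes) -> str:
    """Return an MD5 digest string in the assumed "md5-<base64>" form."""
    raw = hashlib.md5(body).digest()
    return "md5-" + base64.b64encode(raw).decode("ascii")


def match_bodies_to_attachments(attachments: dict, bodies: list) -> dict:
    """Map attachment names to MIME bodies by comparing MD5 digests.

    `attachments` is the parsed _attachments object; `bodies` is the list of
    raw MIME part bodies in whatever order they arrived.
    """
    # Index metadata by digest. Entries without a "digest" property (the
    # BigCouch case from the message) simply cannot be matched.
    # Note: attachments with identical contents share a digest, so this
    # simple index keeps only one name per digest.
    by_digest = {
        meta["digest"]: name
        for name, meta in attachments.items()
        if "digest" in meta
    }
    matched = {}
    for body in bodies:
        name = by_digest.get(couch_md5_digest(body))
        if name is None:
            raise LookupError("no attachment metadata matches this MIME body")
        matched[name] = body
    return matched
```

With metadata like the example in the message, where "digest" is absent, every lookup fails, which is the replication problem being reported.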

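The other workaround mentioned above, recovering the order of the _attachments keys, depends on what the JSON parser exposes. As a sketch only: Python's standard json module accepts an object_pairs_hook that receives each object's key/value pairs in document order, so no custom parser is needed there. The function name is hypothetical, and the document is assumed to be a single JSON object.

```python
import json


def attachment_names_in_order(doc_json: str) -> list:
    """Return the _attachments keys in the order they appear in the JSON text."""
    # With object_pairs_hook=list, every JSON object is decoded as a list of
    # (key, value) tuples in source order instead of a dict.
    top_level = json.loads(doc_json, object_pairs_hook=list)
    for key, value in top_level:
        if key == "_attachments":
            return [name for name, _meta in value]
    return []
```

The returned names could then be paired positionally with the MIME bodies, relying on the ordering behaviour described in the message rather than on digests.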