incubator-couchdb-user mailing list archives

From Jens Alfke
Subject BigCouch returns compressed attachments without indicating they're compressed
Date Thu, 17 May 2012 22:47:13 GMT
I’m having (or rather, TouchDB is having*) problems receiving documents with attachments
from Cloudant; I assume this is a difference between BigCouch and CouchDB. I believe it's
a bug in the server.

The issue is that the server is returning compressed attachment bodies without indicating
that they’re compressed. TouchDB barfs because the length of the received data doesn’t
match the “length” property in the _attachments entry, and there is no "encoded_length"
property giving the encoded length, let alone an "encoding" property that indicates that the
data's been compressed (and by what algorithm).
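
The length check described above can be sketched roughly as follows. This is an illustrative Python sketch, not TouchDB's actual code: the function names are hypothetical, and it assumes only the "length", "encoding" and "encoded_length" properties named in this message.

```python
def expected_body_length(meta: dict) -> int:
    """Return the number of body bytes a client should expect to receive.

    `meta` is one entry of a document's _attachments object.
    (Hypothetical helper for illustration only.)
    """
    if "encoding" in meta:
        # The attachment is declared as encoded, so the size on the wire
        # is encoded_length rather than the raw length.
        return meta["encoded_length"]
    # No encoding declared: the body should be the raw data.
    return meta["length"]


def body_matches_metadata(meta: dict, body: bytes) -> bool:
    """True if the received body has the size the metadata promises."""
    return len(body) == expected_body_length(meta)
```

With the metadata shown below ("length":5313 and no "encoding"), a 2136-byte body fails this check. It would pass only if the entry also carried "encoding" and "encoded_length":2136.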

For example, take this document, which
has a 5313-byte HTML attachment.

A plain GET returns:

> {"_id":"readme","_rev":"2-4eb511f5ad0707c6e9fb1160b3f0bedd","_attachments":{"README.html":{"content_type":"text\/html","revpos":2,"digest":"md5-DRLenhWRAAAW9Q0RHyrG+w==","length":5313,"stub":true}}}

If I ask for the attachment inline I get:

> {"_id":"readme","_rev":"2-4eb511f5ad0707c6e9fb1160b3f0bedd","_attachments":{"README.html":{"content_type":"text\/html","revpos":2,"digest":"md5-DRLenhWRAAAW9Q0RHyrG+w==","data":"PGgxIGlkPSJ0b3…{{{lots of Base64 data}}}..."}}}

where the base64 data decodes to 2136 bytes, and is not HTML but GZIPped HTML.
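
A client could work around this with a heuristic along these lines. This is only a sketch under stated assumptions: the function name is hypothetical, `declared_length` is assumed to be the "length" value from the stub metadata of a plain GET, and sniffing the GZIP magic bytes (0x1f 0x8b) is a guess about the content, not something the server tells us.

```python
import base64
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any GZIP stream


def decode_inline_attachment(data_b64: str, declared_length: int) -> bytes:
    """Decode an inline attachment, un-GZIPping it if that looks necessary.

    Hypothetical workaround: the server gives no "encoding" property, so
    the only clues are the declared length and the GZIP magic bytes.
    """
    raw = base64.b64decode(data_b64)
    if len(raw) == declared_length:
        # Lengths match, so assume the data is unencoded. This is exactly
        # the ambiguous case: a GZIP stream of the same length would be
        # misread as raw data here.
        return raw
    if raw[:2] == GZIP_MAGIC:
        unpacked = gzip.decompress(raw)
        if len(unpacked) == declared_length:
            return unpacked
    raise ValueError(
        f"attachment is {len(raw)} bytes, expected {declared_length}"
    )
```

This only papers over the problem on the client side; it cannot resolve the case where the encoded and raw lengths happen to be equal.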

Asking for the document with attachments in MIME multipart format results in:

> --fbd433e586402848d98875903ea97f67
> content-type: application/json
> {"_id":"readme","_rev":"2-4eb511f5ad0707c6e9fb1160b3f0bedd","_attachments":{"README.html":{"content_type":"text\/html","revpos":2,"digest":"md5-DRLenhWRAAAW9Q0RHyrG+w==","length":5313,"follows":true}}}
> --fbd433e586402848d98875903ea97f67
> {{{2136 bytes of GZIP data}}}
> --fbd433e586402848d98875903ea97f67--

Same thing — the data is GZIPped but there is no metadata to indicate the fact.

I believe this is a bug in BigCouch. It results in an ambiguity as to whether the content
is encoded or not (and if so, what encoding is being used). In the worst case you could have
an attachment whose GZIPped encoding is exactly the same length as the raw data, in which
case there would be no way to tell whether it was encoded or not since the lengths would match
either way.
