Date: Thu, 5 Apr 2012 15:17:12 -0400
Subject: Re: BigCouch doesn't provide attachment digests?
From: Robert Newson
To: user@couchdb.apache.org

Hi Jens,

It cherry-picked cleanly so it should turn up in next week's code push.

B.

On 5 April 2012 13:49, Robert Newson wrote:
> Thanks Jens,
>
> I can backport that.
>
> B.
>
> On 5 April 2012 13:41, Jens Alfke wrote:
>> Documents stored in Cloudant databases aren't including MD5 digests of attachment contents in the _attachments metadata. Here's an example:
>>
>>     "_attachments": {
>>         "photo-15357DCF-9566-4DFD-9120-8A9164EE5873": {
>>             "follows": true,
>>             "length": 79608,
>>             "content_type": "image/jpeg",
>>             "revpos": 2
>>         }
>>     },
>>
>> Other servers don't do this; I assume this is a difference between BigCouch and CouchDB. Is this intentional? It's causing problems replicating databases from Cloudant to TouchDB, and the workarounds I can think of for this in TouchDB are either fairly ugly (basically involving writing a custom JSON parser…) or involve performance regressions.
>>
>> Here's more detail on my problem:
>> * For efficiency, the replicator in TouchDB (like CouchDB 1.2) fetches documents in MIME multipart format, so that attachments are easily streamable to disk and aren't base64-encoded.
>> * This requires correlating the MIME bodies with the metadata objects in the _attachments object.
>> * CouchDB (and BigCouch) unfortunately don't add any headers to the MIME bodies to identify what they are. I've already filed a bug report against this.
>> * TouchDB's replicator works around this by computing an MD5 digest of each MIME body and then correlating those with the "digest" properties of the attachment metadata objects.
>> * …which fails with Cloudant/BigCouch because that "digest" property is missing.
>>
>> The reason CouchDB itself doesn't have trouble correlating the attachments is that it knows the MIME bodies are written in the same order as the attachments appear in the _attachments object.
>> However, key order is not significant in JSON objects, and in most implementations the parser stores the object contents in a hash table (like a Ruby Hash object or a Cocoa NSDictionary), which means the ordering of the keys is lost. The only way for me to determine the true order of the attachment keys would be to write my own specialized JSON parser that could identify the keys and put the names into an ordered structure like an array.
>>
>> --Jens
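The digest-matching workaround Jens describes (hash each MIME body, then look up the attachment whose "digest" matches) can be sketched in Python. This is a hypothetical illustration, not TouchDB's or CouchDB's actual code. It assumes each MIME body has already been read into memory as bytes, and that the "digest" property uses the "md5-<base64 of the raw MD5>" form CouchDB emits. The helper names couch_digest and match_bodies_to_attachments are made up for this example.

```python
import base64
import hashlib


def couch_digest(body: bytes) -> str:
    """Return the MD5 of `body` in the assumed CouchDB form 'md5-<base64>'."""
    return "md5-" + base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def match_bodies_to_attachments(attachments: dict, bodies: list) -> dict:
    """Map attachment names to MIME body bytes by comparing MD5 digests.

    `attachments` is the parsed _attachments object (key order irrelevant);
    `bodies` holds the raw MIME part bodies in whatever order they arrived.
    Raises ValueError if a "follows" attachment lacks a digest (the
    BigCouch case) or if no received body matches its digest.
    """
    # Index the received bodies by their computed digest.
    by_digest = {couch_digest(body): body for body in bodies}

    matched = {}
    for name, meta in attachments.items():
        # Only attachments flagged "follows" are sent as MIME bodies;
        # stubs and inline (base64) attachments are skipped.
        if not meta.get("follows"):
            continue
        digest = meta.get("digest")
        if digest is None:
            raise ValueError(f"attachment {name!r} has no digest; cannot correlate")
        if digest not in by_digest:
            raise ValueError(f"no MIME body matches the digest of {name!r}")
        matched[name] = by_digest[digest]
    return matched


if __name__ == "__main__":
    photo = b"\xff\xd8 fake jpeg bytes"
    atts = {
        "photo.jpg": {
            "follows": True,
            "length": len(photo),
            "content_type": "image/jpeg",
            "digest": couch_digest(photo),
            "revpos": 2,
        }
    }
    # The bodies arrive in an arbitrary order; matching is by content.
    print(match_bodies_to_attachments(atts, [b"other part", photo]) == {"photo.jpg": photo})  # True
```

Because bodies are matched by content rather than by position, the key order of _attachments stops mattering. The costs are hashing every body and, as in the BigCouch case above, failing outright when the server omits the digest.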