couchdb-dev mailing list archives

From "Randall Leeds (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COUCHDB-687) Add md5 hash to _attachments properties for documents
Date Sat, 21 May 2011 06:22:47 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037277#comment-13037277 ]

Randall Leeds commented on COUCHDB-687:
---------------------------------------

+1 as well for having the digest_type in the metadata.

I want to suggest that we also include the encoding/compression in the metadata about an attachment.
This information has implications for deterministic revision generation since the md5 of attachments
on disk is used in the generation of revisions.

For purity of API and transparency of revision generation, I think it's unacceptable for CouchDB
to compute a revision ID based on the digest of an attachment which was compressed server-side.
A client should be able to expect that uploading the same attachment, with the same encoding,
to identical documents in two different couches results in the same revision. If a client wants
to pre-compress an attachment before upload and inform couch of the compression using headers,
it should be fine for couch to calculate a digest using the compressed version (and use that in
the revision generation), as long as the compression/encoding format on which the digest is based
is exposed as well.
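
To make the concern concrete, here is a minimal Python sketch (not CouchDB code). The payload
and the two "couches" are hypothetical, and the differing gzip timestamp is only one assumed
way that two servers could produce different compressed bytes from the same input.

```python
import gzip
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


# Hypothetical attachment body, exactly as a client would send it.
payload = b"function hello() { return 'world'; }\n" * 200

# Digest over the bytes as provided: the same on every server.
digest_couch_a = md5_hex(payload)
digest_couch_b = md5_hex(payload)

# Assumption for illustration: each server gzips the payload itself.
# The gzip header carries a timestamp (mtime), so the stored bytes
# differ even though the decompressed content is identical.
stored_on_a = gzip.compress(payload, mtime=1)
stored_on_b = gzip.compress(payload, mtime=2)

print(digest_couch_a == digest_couch_b)                      # raw digests agree
print(md5_hex(stored_on_a) == md5_hex(stored_on_b))          # compressed digests do not
print(gzip.decompress(stored_on_a) == gzip.decompress(stored_on_b))
```

A revision derived from the first pair of digests would match across both servers; one derived
from the second pair would not, even though the client uploaded identical data.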

tl;dr -- CouchDB needs to be transparent in how it's creating revision identifiers. It should
NEVER use a digest generated *after* server-side compression to calculate a revision hash.
It MUST calculate the revision from the data _as provided_ by the client. These are the considerations
I have approaching this patch.

> Add md5 hash to _attachments properties for documents
> -----------------------------------------------------
>
>                 Key: COUCHDB-687
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-687
>             Project: CouchDB
>          Issue Type: Improvement
>         Environment: CouchDB
>            Reporter: mikeal
>            Assignee: Filipe Manana
>         Attachments: couchdb-md5-in-attachment-COUCHDB-687-v2.patch, couchdb-md5-in-attachment-COUCHDB-687-v3.patch,
>                      couchdb-md5-in-attachment-COUCHDB-687.patch, md5.patch
>
>
> The current attachment information looks like this:
> GET /dbname/docid
> "_attachments": {
>       "jquery-1.4.1.min.js": {
>           "content_type": "text/javascript",
>           "revpos": 138,
>           "length": 70844,
>           "stub": true
>       }
> }
> If a client wanted to sync local files as attachments with a document, it would not currently
> be able to do so without keeping a local store of the revpos. If this information included
> an md5 hash of the attachment, clients could compare it against a hash of the local file to
> see if they match.
> -Mikeal
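
The client-side check described in the issue could look roughly like the Python sketch below.
The "digest" field name and its "md5-<base64>" format are assumptions made for illustration;
the issue text does not specify how the hash would be exposed, and both helper functions are
hypothetical.

```python
import base64
import hashlib


def local_md5_digest(path: str, chunk_size: int = 65536) -> str:
    """MD5 of a local file, formatted as 'md5-<base64>' (assumed format)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large attachments are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return "md5-" + base64.b64encode(h.digest()).decode("ascii")


def attachment_is_current(stub: dict, path: str) -> bool:
    """True if the attachment stub's (assumed) 'digest' matches the local file."""
    return stub.get("digest") == local_md5_digest(path)
```

With something like this, a sync client would only need to re-upload files for which
attachment_is_current returns False, without tracking revpos locally.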

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
