couchdb-dev mailing list archives

From: Damien Katz
Subject: Re: Unicode normalization (was Re: The 1.0 Thread)
Date: Tue, 30 Jun 2009 15:46:34 GMT

On Jun 30, 2009, at 11:22 AM, Noah Slater wrote:

> On Tue, Jun 30, 2009 at 07:12:07AM -0400, Damien Katz wrote:
>> I'm not sure I understand why we can't just calculate and send the MD5
>> header for the content range.
> We could, but are you not proposing that we use this value for the
> document revision? If that is the case, when you do range requests, the
> hash sent back doesn't actually correspond to anything. If I used the
> hash from the final range request of a document to post an update, it
> would presumably fail.

To clarify, the point of deterministic rev ids is only to avoid
unnecessary conflicts when identical edits are made on two different
replicas. If both edits were made to the same revision and the resulting
content is identical, it should not be a conflict. If we had a canonical
representation of the document, we could also use the deterministic rev
ids for integrity checking, but we don't have a canonical representation,
and creating one is very difficult to get right.

What I'm proposing is that we use Content-MD5 only for payload
integrity checking. It will not be used for security, and it cannot
be validated against the rev id because the two will always be different:
the rev id will be generated from the Erlang term format of the
document, not the UTF-8 JSON string that gets sent to the client.
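
To illustrate why the two hashes cannot be compared, here is a minimal
Python sketch. It is not CouchDB code: `sketch_rev_hash` is a hypothetical
stand-in for hashing the server's internal term, approximated here by
hashing a key-sorted re-serialization of the parsed document. The two wire
payloads are made-up examples of the same document serialized differently.

```python
import base64
import hashlib
import json


def content_md5(payload: bytes) -> str:
    """Base64 of the MD5 digest of the exact bytes sent on the wire."""
    return base64.b64encode(hashlib.md5(payload).digest()).decode("ascii")


def sketch_rev_hash(doc: dict) -> str:
    # Hypothetical stand-in for a hash over the server's internal term.
    # It hashes the parsed data, not the bytes the client happened to see.
    normalized = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# The same document, serialized two different ways (assumed examples):
# different key order, different whitespace, escaped vs. raw UTF-8.
wire_a = b'{"name": "caf\\u00e9", "n": 1}'
wire_b = '{"n":1,"name":"café"}'.encode("utf-8")

doc_a = json.loads(wire_a)
doc_b = json.loads(wire_b)

same_document = doc_a == doc_b
same_wire_hash = content_md5(wire_a) == content_md5(wire_b)
same_rev_hash = sketch_rev_hash(doc_a) == sketch_rev_hash(doc_b)
```

Here `same_document` and `same_rev_hash` come out true while
`same_wire_hash` comes out false, which is why the payload hash says
nothing about the rev id.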

So the server will send its responses (perhaps optionally) with an MD5
hash to detect packet corruption. Clients, when they send docs and
attachments, can send the payload with a Content-MD5 header, and the
server will check it to make sure the payload is uncorrupted. As it
writes the data to disk, the server will compute the MD5 hash for its
own integrity checking later.

So, for example, the replicator will check the MD5 sig from the server
and send its own MD5 sig when writing data. This prevents network
problems from introducing corruption into data as it replicates.
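
The checks described above could look roughly like this in Python. This is
a sketch under assumptions, not the actual implementation: the function
names (`check_payload`, `write_with_digest`, `replicate_hop`) are
hypothetical, and the header value is assumed to be the base64-encoded MD5
digest of the body.

```python
import base64
import hashlib
import io


def content_md5(payload: bytes) -> str:
    """Value for a Content-MD5 header: base64 of the raw MD5 digest."""
    return base64.b64encode(hashlib.md5(payload).digest()).decode("ascii")


def check_payload(payload: bytes, header_value: str) -> None:
    """Raise if the payload does not match the hash it arrived with."""
    if content_md5(payload) != header_value.strip():
        raise ValueError("Content-MD5 mismatch: payload corrupted in transit")


def write_with_digest(chunks, out) -> str:
    """Write chunks to `out`, hashing as we go; return the hex digest
    so it can be stored for integrity checks later."""
    digest = hashlib.md5()
    for chunk in chunks:
        out.write(chunk)
        digest.update(chunk)
    return digest.hexdigest()


def replicate_hop(body: bytes, source_header: str) -> dict:
    """Hypothetical replicator step: verify what the source sent, then
    attach a freshly computed header to the outgoing write."""
    check_payload(body, source_header)
    return {"Content-MD5": content_md5(body)}


# Usage with a made-up document body and an in-memory "disk".
body = b'{"_id": "doc1", "value": 42}'
headers = replicate_hop(body, content_md5(body))
disk = io.BytesIO()
stored = write_with_digest([body[:10], body[10:]], disk)
```

A corrupted body makes `check_payload` raise instead of letting the bad
bytes reach the target. None of this is a security measure; it only
catches accidental corruption.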


> Best,
> -- 
> Noah Slater,
