couchdb-dev mailing list archives

From Damien Katz <dam...@apache.org>
Subject Re: Unicode normalization (was Re: The 1.0 Thread)
Date Fri, 26 Jun 2009 11:21:14 GMT

On Jun 25, 2009, at 6:53 PM, Noah Slater wrote:

> On Thu, Jun 25, 2009 at 05:37:21PM -0400, Damien Katz wrote:
>> Integrity will be preserved by use of Content-MD5
>
> Bike shed: what about the stronger SHA family of hashes?

Content-MD5 is a standard header; I can find no other headers for
integrity hashing.
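
For reference, a minimal sketch of how a Content-MD5 value is computed and checked: per RFC 1864 it is the base64 encoding of the raw 128-bit MD5 digest of the body. The function names `content_md5` and `body_matches` are made up for illustration and are not CouchDB code.

```python
import base64
import hashlib
import hmac


def content_md5(body: bytes) -> str:
    """Return the Content-MD5 header value for a body (RFC 1864)."""
    # The header carries the base64 of the raw digest, not the hex string.
    digest = hashlib.md5(body).digest()
    return base64.b64encode(digest).decode("ascii")


def body_matches(body: bytes, header_value: str) -> bool:
    """Check a received body against the Content-MD5 value sent with it."""
    # Hypothetical helper: recompute the value and compare the two strings.
    return hmac.compare_digest(content_md5(body), header_value.strip())


if __name__ == "__main__":
    payload = b'{"_id":"example","value":1}'
    header = content_md5(payload)
    print("Content-MD5:", header)
    print("intact:", body_matches(payload, header))
    print("tampered:", body_matches(payload + b" ", header))
```

This only detects accidental corruption in transit; it says nothing about who produced the body.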
>
>> But it is still specific to the version of CouchDB and its dependencies
>> (version of Erlang, version of ICU, etc). It will usually be the same across
>> versions, but that is not guaranteed.
>
> If we're doing content hashing, why would this matter?

Because we don't have a formal canonical format, so we aren't even  
trying. We'll be hashing whatever representation we have in memory,
and that could change from version to version.
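
To make that concrete, here is a small sketch (using Python's `json` module as a stand-in, not CouchDB's Erlang serializer) showing that the same logical document hashes differently once its byte representation changes. The document contents and the `md5_hex` helper are invented for the example.

```python
import hashlib
import json


def md5_hex(data: bytes) -> str:
    """Hex MD5 of a byte string."""
    return hashlib.md5(data).hexdigest()


# One logical document, three byte-level representations.
doc = {"name": "couch", "count": 1}

default_form = json.dumps(doc).encode("utf-8")
compact_form = json.dumps(doc, separators=(",", ":")).encode("utf-8")
reordered_form = json.dumps({"count": 1, "name": "couch"}).encode("utf-8")

if __name__ == "__main__":
    for label, form in [
        ("default", default_form),
        ("compact", compact_form),
        ("reordered", reordered_form),
    ]:
        # Each form parses back to the same document but hashes differently.
        print(label, form, md5_hex(form))
```

Without an agreed canonical form, any change in whitespace, key order, or number formatting between versions changes the hash even though the document is logically the same.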

>
>> Optionally, we will allow that if 2 clients make byte-identical saves for a
>> document, they will get the same revision, and you don't need to return a
>> conflict error to the second client to save.
>
> Are there any security issues around possible hash collisions?

No, we aren't checking them later.
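
A rough sketch of the deterministic-rev idea, under the assumption that a revision is derived purely from the parent revision and the saved bytes. The `Store` class, `next_rev`, and the `N-hash` format here are hypothetical illustrations, not the actual CouchDB implementation.

```python
import hashlib


class ConflictError(Exception):
    """Raised when a save is based on a stale revision."""


def next_rev(parent_rev: str, body: bytes) -> str:
    """Derive a revision ID from the parent revision and the body bytes."""
    # Assumed scheme: generation counter plus MD5 over parent rev and body.
    generation = int(parent_rev.split("-")[0]) + 1 if parent_rev else 1
    digest = hashlib.md5(parent_rev.encode("utf-8") + b"\x00" + body).hexdigest()
    return f"{generation}-{digest}"


class Store:
    """Toy single-document-per-id store keeping only the current revision."""

    def __init__(self):
        self.current = {}  # doc_id -> (rev, body)

    def save(self, doc_id: str, parent_rev: str, body: bytes) -> str:
        new_rev = next_rev(parent_rev, body)
        current_rev = self.current.get(doc_id, ("", b""))[0]
        if current_rev == parent_rev:
            # Normal case: the client edited the latest revision.
            self.current[doc_id] = (new_rev, body)
            return new_rev
        if current_rev == new_rev:
            # Another client already made the byte-identical save.
            return new_rev
        raise ConflictError(doc_id)
```

With this scheme, two clients saving the same bytes on top of the same parent both get the same revision back, while a different body on the stale parent still raises a conflict.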

>
>> I think this is the most pragmatic way to do deterministic revs and integrity
>> checking. That is, do as little as possible and let others deal with the
>> problems and implications of canonicalization if they want to do end-to-end
>> integrity checking.
>
> Seems like a reasonable approach to me.
>
> Best,
>
> -- 
> Noah Slater, http://tumbolia.org/nslater

