couchdb-dev mailing list archives

From Robert Newson <rnew...@apache.org>
Subject Re: Why MD5 is used for hashes, also about non-deterministic IDs.
Date Tue, 15 Nov 2011 09:43:46 GMT
_rev values used to be UUIDs and became deterministic to improve
replication performance. I can see that there's a theoretical issue
where replication could be inhibited, though I question how practical
it is given the internal details of _rev calculation.

Remember that the _rev value is derived from the contents of the
documents, all the bytes of all attachments and values from previous
revisions. Stock MD5 preimage attacks are of a much simpler form
(finding a Y such that MD5(Y)=X for some desired X). You would also
have to arrange for the same number of updates, since the number at
the front is incremented on each successful update.
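
As a rough illustration of the above (a sketch of the idea only, not
the actual CouchDB code): the new _rev chains the previous revision's
hash, the document body and the attachment bytes through MD5, and
prefixes the incremented update count. The function name `next_rev`,
its arguments, and the use of canonical JSON as the hashed encoding
are all assumptions made for this example.

```python
import hashlib
import json


def next_rev(prev_rev, body, attachments=()):
    """Hypothetical helper: derive a deterministic _rev string.

    prev_rev    -- previous revision such as "3-abc...", or None for a new doc
    body        -- the document body as a JSON-serializable dict
    attachments -- iterable of (name, bytes) pairs
    """
    if prev_rev is None:
        generation, prev_hash = 0, ""
    else:
        gen_str, prev_hash = prev_rev.split("-", 1)
        generation = int(gen_str)

    h = hashlib.md5()
    # The previous revision's hash feeds into the new one, so the whole
    # edit history influences the result.
    h.update(prev_hash.encode("ascii"))
    # Canonical JSON is an assumption here; it only serves to make the
    # hash independent of key order.
    h.update(json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8"))
    # Every byte of every attachment is hashed as well.
    for name, data in sorted(attachments):
        h.update(name.encode("utf-8"))
        h.update(data)

    # The number at the front is incremented on each successful update.
    return f"{generation + 1}-{h.hexdigest()}"


rev1 = next_rev(None, {"name": "Doc"})
rev2 = next_rev(rev1, {"name": "Doc", "changed": True})
```

Under this sketch, two machines that apply the same edit to the same
revision arrive at the same _rev. To collide with a given _rev, an
attacker would have to match the hash while also starting from the
same update count and a chosen prior hash, which is more constrained
than a bare preimage.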

For switching from MD5 to SHA-1, I say no. If we switch, let's use
something contemporary like SHA-256. Better yet, let's wait for the
winner of the SHA-3 competition.

B.

On 15 November 2011 07:57, Jason Smith <jhs@iriscouch.com> wrote:
> On Tue, Nov 15, 2011 at 7:34 AM, Alex Besogonov
> <alex.besogonov@gmail.com> wrote:
>>>> Now I make a change to 'Doc' at machine A. This creates a new revid
>>>> with new md5 hash.
>>>> A malicious software somehow learns about this update and creates
>>>> another document
>>>> on machine B, contriving it so to make the resulting hash to be the
>>>> same as on machine A.
>>> Before going any further, you must show why we care about the contents
>>> of machine B.
>>> Why would I log in to machine B if I do not trust B's owner? Why would
>>> I clone your Git repository if I do not know you?
>> The problem is, MD5 hash depends on _untrusted_ data that external
>> processes might put into the database.
>>
>> For example, imagine that machines A and B use CouchDB to store
>> certificates.
>
> I ask again.
>
> --
> Iris Couch
>
