couchdb-dev mailing list archives

From Randall Leeds <>
Subject Re: Why MD5 is used for hashes, also about non-deterministic IDs.
Date Tue, 15 Nov 2011 21:23:33 GMT
On Tue, Nov 15, 2011 at 01:43, Robert Newson <> wrote:
> _rev values used to be UUIDs and became deterministic to improve
> replication performance. I can see that there's a theoretical issue
> where replication could be inhibited, though I question how practical
> it is given the internal details of _rev calculation.
> Remember that the _rev value is derived from the contents of the
> document, all the bytes of all attachments and values from previous
> revisions. Stock MD5 preimage attacks are of a much simpler form
> (finding a Y such that MD5(Y)=X for some desired X). Remember also
> that you would have to arrange for the same number of updates,
> since the number at the front is incremented on each successful update.
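To illustrate the derivation Robert describes, here is a simplified sketch of a deterministic "N-hexdigest" revision ID. The function name `new_rev` and the exact serialization (canonical JSON plus raw attachment bytes) are assumptions for illustration only; CouchDB hashes its own internal encoding of these values, so this will not reproduce real _rev strings. It does show the properties under discussion: the counter increments on each update, and the digest chains the previous revision, the body and every attachment byte.

```python
import hashlib
import json


def new_rev(prev_rev, body, attachments, deleted=False):
    """Return a hypothetical CouchDB-style "N-hexdigest" revision string.

    Assumed serialization, not CouchDB's real one: previous rev info,
    canonical JSON body, then attachment names and bytes in name order.
    """
    if prev_rev is None:
        start, old_hash = 0, ""
    else:
        start_str, old_hash = prev_rev.split("-", 1)
        start = int(start_str)
    h = hashlib.md5()
    # Previous revision info chains each new rev to the document's history.
    h.update(json.dumps([deleted, start, old_hash]).encode("utf-8"))
    # Body serialized canonically, so equal bodies give equal digests.
    h.update(json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8"))
    # Every byte of every attachment contributes to the digest.
    for name in sorted(attachments):
        h.update(name.encode("utf-8"))
        h.update(attachments[name])
    # The leading number is incremented on each successful update.
    return f"{start + 1}-{h.hexdigest()}"


r1 = new_rev(None, {"type": "cert"}, {})
r2 = new_rev(r1, {"type": "cert", "v": 2}, {"cert.pem": b"-----BEGIN..."})
print(r1)  # starts with "1-"
print(r2)  # starts with "2-"
```

Because the same inputs always yield the same rev on any machine, two nodes that apply the same edit agree on its ID, which is what lets replication skip revisions the target already has.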

Also remember that the contents would have to parse as JSON, which
restricts the search space even further. Then, if I understand Jason
correctly, we're also talking about a situation where Couch B is
insecure: it allows a malicious user to change documents. If
these documents are anything more important than something affecting
the user herself, then what you have is a malicious administrator or an
insecure deployment. I don't think MD5 is to blame here.
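To make the preimage definition concrete, here is a toy brute-force search. It only tries to match a short prefix of the MD5 hex digest, and every candidate is generated as a valid JSON document, since a forged document would have to parse. The function name `find_json_preimage` and the candidate shape are hypothetical. Matching k hex digits takes about 16**k tries on average, which is why matching all 32 digits (2**128 possibilities) this way is out of reach.

```python
import hashlib
import json


def find_json_preimage(target_prefix, max_tries=1_000_000):
    """Search for a JSON document whose MD5 hex digest starts with target_prefix.

    Toy illustration of a generic preimage search against a truncated
    digest; returns (document, tries) or (None, max_tries) if not found.
    """
    for i in range(max_tries):
        # Every candidate is valid JSON by construction.
        candidate = json.dumps({"type": "cert", "nonce": i}, sort_keys=True)
        digest = hashlib.md5(candidate.encode("utf-8")).hexdigest()
        if digest.startswith(target_prefix):
            return candidate, i + 1
    return None, max_tries


# Target: the first 4 hex digits (16 bits) of some other document's digest.
target = hashlib.md5(b'{"owner": "alice"}').hexdigest()[:4]
doc, tries = find_json_preimage(target)
print(doc, tries)  # expect on the order of 65536 tries
```

Each extra hex digit multiplies the expected work by 16, so this approach says nothing encouraging about forging a full 128-bit digest.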

Does that sound like a reasonable assessment to you, Alex?

Also, I'd love to hear about your C++ replicator as it develops.


> For switching from MD5 to SHA-1, I say no. If we switch, let's use
> something contemporary like SHA-256. Better yet, let's wait for the
> winner of the SHA-3 competition.
> B.
> On 15 November 2011 07:57, Jason Smith <> wrote:
>> On Tue, Nov 15, 2011 at 7:34 AM, Alex Besogonov
>> <> wrote:
>>>>> Now I make a change to 'Doc' at machine A. This creates a new revid
>>>>> with a new MD5 hash.
>>>>> Malicious software somehow learns about this update and creates
>>>>> another document on machine B, contriving it so that the resulting
>>>>> hash is the same as on machine A.
>>>> Before going any further, you must show why we care about the contents
>>>> of machine B.
>>>> Why would I log in to machine B if I do not trust B's owner? Why would
>>>> I clone your Git repository if I do not know you?
>>> The problem is that the MD5 hash depends on _untrusted_ data that
>>> external processes might put into the database.
>>> For example, imagine that machines A and B use CouchDB to store
>>> certificates.
>> I ask again.
>> --
>> Iris Couch

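The scenario Alex describes, where machine B holds a different document under the same revision ID as machine A, matters because replication decides what to transfer by comparing revision IDs, not contents. Below is a toy, in-memory model of the revs-diff step; the function name `revs_diff` and the dict-of-sets layout are assumptions, not the real `_revs_diff` HTTP API. It shows why a colliding rev on the target would make A's update look already present, so it would never be sent.

```python
def revs_diff(source_revs, target_revs):
    """Toy model of the replicator's revs-diff step.

    source_revs / target_revs map doc id -> set of revision strings.
    Returns, per doc id, the sorted source revisions the target lacks;
    only those would be transferred.
    """
    missing = {}
    for doc_id, revs in source_revs.items():
        have = target_revs.get(doc_id, set())
        todo = sorted(revs - have)
        if todo:
            missing[doc_id] = todo
    return missing


# Normal case: the target lacks A's new revision, so it gets replicated.
print(revs_diff({"Doc": {"1-aaa", "2-bbb"}}, {"Doc": {"1-aaa"}}))
# Collision case: B holds a different body under the same "2-bbb" ID,
# so nothing is reported missing and A's update is silently skipped.
print(revs_diff({"Doc": {"1-aaa", "2-bbb"}}, {"Doc": {"1-aaa", "2-bbb"}}))
```

Per Robert's and Randall's points, producing that colliding ID requires matching the full digest, the update count and a parseable JSON body, on a node that already lets the attacker write documents.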