couchdb-dev mailing list archives

From Randall Leeds <randall.le...@gmail.com>
Subject Re: Why MD5 is used for hashes, also about non-deterministic IDs.
Date Tue, 15 Nov 2011 21:23:33 GMT
On Tue, Nov 15, 2011 at 01:43, Robert Newson <rnewson@apache.org> wrote:
> _rev values used to be UUIDs and became deterministic to improve
> replication performance. I can see that there's a theoretical issue
> where replication could be inhibited, though I question how practical
> it is given the internal details of _rev calculation.
>
> Remember that the _rev value is derived from the contents of the
> documents, all the bytes of all attachments and values from previous
> revisions. Stock MD5 preimage attacks are of a much simpler form
> (finding a Y such that MD5(Y)=X for some desired X). Also, you
> would have to arrange for the same number of updates, since
> the number at the front is incremented on each successful update.
>

Also remember that the contents would have to parse as JSON, so that
restricts this search space even further. Then, if I understand Jason
correctly, we're also talking about a situation where Couch B is
insecure... it's allowing a malicious user to change documents. If
these documents are anything more important than something affecting
the user herself then what you have is a malicious administrator or an
insecure deployment. I don't think MD5 is to blame here.
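
To make the constraints above concrete, here is a rough Python sketch of a
deterministic revision id. It is only an illustration: `next_rev`, its
arguments, and the JSON-based encoding are hypothetical and are not CouchDB's
actual Erlang code or hash input. It just shows the properties Robert
describes: the hash covers the previous revision, the body and the attachment
bytes, and the number in front goes up by one per update.

```python
import hashlib
import json


def next_rev(prev_rev, body, attachments=()):
    """Return a toy deterministic revision id of the form "N-<md5 hex>".

    Assumption: this encoding is invented for illustration and does not
    match the bytes CouchDB really feeds to MD5.
    """
    if prev_rev is None:
        generation, prev_hash = 0, ""
    else:
        gen_text, prev_hash = prev_rev.split("-", 1)
        generation = int(gen_text)

    digest = hashlib.md5()
    # The previous revision's hash is part of the input, so history matters.
    digest.update(prev_hash.encode("ascii"))
    # The body must be valid JSON; serialize it in a canonical form.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest.update(canonical.encode("utf-8"))
    # Every byte of every attachment is hashed as well.
    for data in attachments:
        digest.update(data)

    return f"{generation + 1}-{digest.hexdigest()}"


# The same edit made independently on two machines yields the same id.
rev_a = next_rev(None, {"name": "Doc"})
rev_b = next_rev(None, {"name": "Doc"})

# A later update depends on the earlier revision and bumps the prefix.
rev_next = next_rev(rev_a, {"name": "Doc", "v": 2})
```

In this toy model an attacker would need a colliding input that is valid
JSON, chains from the same previous revision, and sits at the same update
count, which is a much narrower target than a plain MD5(Y)=X search.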

Does that sound like a reasonable assessment to you, Alex?

Also, I'd love to hear about your C++ replicator as it develops.

-Randall

> For switching from MD5 to SHA-1, I say no. If we switch, let's use
> something contemporary like SHA-256. Better yet, let's wait for the
> winner of the SHA-3 competition.
>
> B.
>
> On 15 November 2011 07:57, Jason Smith <jhs@iriscouch.com> wrote:
>> On Tue, Nov 15, 2011 at 7:34 AM, Alex Besogonov
>> <alex.besogonov@gmail.com> wrote:
>>>>> Now I make a change to 'Doc' at machine A. This creates a new revid
>>>>> with new md5 hash.
>>>>> Malicious software somehow learns about this update and creates
>>>>> another document
>>>>> on machine B, contriving it so that the resulting hash is the
>>>>> same as on machine A.
>>>> Before going any further, you must show why we care about the contents
>>>> of machine B.
>>>> Why would I log in to machine B if I do not trust B's owner? Why would
>>>> I clone your Git repository if I do not know you?
>>> The problem is, the MD5 hash depends on _untrusted_ data that external
>>> processes might put into the database.
>>>
>>> For example, imagine that machines A and B use CouchDB to store
>>> certificates.
>>
>> I ask again.
>>
>> --
>> Iris Couch
>>
>
