couchdb-dev mailing list archives

From Alex Besogonov <>
Subject Re: Why MD5 is used for hashes, also about non-deterministic IDs.
Date Wed, 16 Nov 2011 17:46:38 GMT
On Tue, Nov 15, 2011 at 4:23 PM, Randall Leeds <> wrote:
>> Remember that the _rev value is derived from the contents of the
>> documents, all the bytes of all attachments and values from previous
>> revisions. Stock MD5 preimage attacks are of a much simpler form
>> (finding a Y such that MD5(Y)=X for some desired X). Also, you
>> would have to arrange for the same number of updates as well, since
>> the number at the front is incremented on each successful update.
> Also remember that the contents would have to parse as JSON, so that
> restricts this search space even further.
Not really. A binary representation of the parsed JSON is used to calculate the hash.
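
For illustration, here is a minimal Python sketch of how such a revision ID could be formed: a generation counter, then an MD5 digest over the deleted flag, the previous revision, the body and the attachment bytes. The function name `sketch_new_rev` is hypothetical, and canonical JSON bytes stand in for the internal binary encoding CouchDB actually hashes, so the digests will not match real `_rev` values.

```python
import hashlib
import json


def sketch_new_rev(prev_rev, body, attachments=(), deleted=False):
    """Return a hypothetical '<generation>-<md5 hex>' revision string.

    ASSUMPTION: canonical JSON stands in for CouchDB's real binary
    encoding of the revision inputs; only the overall shape is shown.
    """
    if prev_rev is None:
        prev_gen, prev_hash = 0, ""
    else:
        gen_str, prev_hash = prev_rev.split("-", 1)
        prev_gen = int(gen_str)
    h = hashlib.md5()
    # The previous generation and hash are part of the input, so the
    # digest depends on the document's history, not just its contents.
    header = json.dumps([deleted, prev_gen, prev_hash, body],
                        sort_keys=True, separators=(",", ":"))
    h.update(header.encode("utf-8"))
    for att in attachments:
        h.update(att)  # every attachment byte feeds the digest
    # The generation number is incremented on each successful update.
    return f"{prev_gen + 1}-{h.hexdigest()}"


r1 = sketch_new_rev(None, {"aa": "x"})
r2 = sketch_new_rev(r1, {"aa": "y"}, attachments=[b"\x00\x01"])
print(r1)
print(r2)
```

To collide with a given revision, a forged update would need both the same generation prefix and the same 32-hex-digit digest.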

So I can make a document like this:
  { "aa" : "xxxxxxxxxxxxxx.....[several thousand x's]" }

And use the large 'xxx...x' string as a scratch area for my attack. I don't
even need to bother with quoting issues because CouchDB is going to
unquote everything during JSON parsing. And there are no other hash
codes to work around (working around even two MD5s at the same time
is much harder).

That's about the best possible case for an attacker.
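
A toy Python sketch of the "scratch area" idea, under the same kind of stand-in assumption (MD5 over canonical JSON bytes rather than CouchDB's real encoding; `doc_digest` and `search_scratch` are hypothetical names). It varies the tail of one large string until the digest starts with a short target prefix, and the JSON round trip in the usage lines shows that escaping is undone by parsing. Brute force like this only works for a few hex digits. It is hopeless against the full 128-bit digest and is not an MD5 preimage attack.

```python
import hashlib
import json


def doc_digest(doc):
    # ASSUMPTION: MD5 over canonical JSON bytes stands in for the
    # binary representation CouchDB actually hashes.
    data = json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.md5(data).hexdigest()


def search_scratch(target_prefix, filler_len=4000, max_tries=2_000_000):
    """Toy search: vary the tail of a large scratch string until the
    digest starts with target_prefix (feasible only for short prefixes)."""
    base = "x" * filler_len
    for i in range(max_tries):
        doc = {"aa": base + format(i, "x")}
        digest = doc_digest(doc)
        if digest.startswith(target_prefix):
            return doc, digest
    return None


# Quotes and backslashes are escaped on the wire and restored by the
# parser, so the attacker's chosen characters reach the hashed data intact.
payload = 'quote " and backslash \\ inside the scratch area'
assert json.loads(json.dumps({"aa": payload}))["aa"] == payload

found = search_scratch("abc")  # 12-bit prefix: about 4096 tries expected
print(found[1] if found else "not found")
```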

> Then, if I understand Jason
> correctly, we're also talking about a situation where Couch B is
> insecure... it's allowing a malicious user to change documents. If
> these documents are anything more important than something affecting
> the user herself then what you have is a malicious administrator or an
> insecure deployment. I don't think MD5 is to blame here.
No, the issue here is the possibility of breaking synchronization.

> Does that sound like a reasonable assessment to you, Alex?

> Also, I'd love to hear about your C++ replicator as it develops.
Sure, I'm developing a very small and fast embedded storage for mobile
devices and desktop apps. It'll be open source once I finish its core.

> -Randall
>> For switching from MD5 to SHA-1, I say no. If we switch, let's use
>> something contemporary like SHA-256. Better yet, let's wait for the
>> winner of the SHA-3 competition.
>> B.
>> On 15 November 2011 07:57, Jason Smith <> wrote:
>>> On Tue, Nov 15, 2011 at 7:34 AM, Alex Besogonov
>>> <> wrote:
>>>>>> Now I make a change to 'Doc' at machine A. This creates a new revid
>>>>>> with a new MD5 hash.
>>>>>> Malicious software somehow learns about this update and creates
>>>>>> another document
>>>>>> on machine B, contriving it so as to make the resulting hash the
>>>>>> same as on machine A.
>>>>> Before going any further, you must show why we care about the contents
>>>>> of machine B.
>>>>> Why would I log in to machine B if I do not trust B's owner? Why would
>>>>> I clone your Git repository if I do not know you?
>>>> The problem is that the MD5 hash depends on _untrusted_ data that external
>>>> processes might put into the database.
>>>> For example, imagine that machines A and B use CouchDB to store
>>>> certificates.
>>> I ask again.
>>> --
>>> Iris Couch
