couchdb-dev mailing list archives

From Alex Besogonov <>
Subject Re: Why MD5 is used for hashes, also about non-deterministic IDs.
Date Tue, 15 Nov 2011 07:34:46 GMT
>> Now I make a change to 'Doc' at machine A. This creates a new revid
>> with new md5 hash.
>> A malicious software somehow learns about this update and creates
>> another document
>> on machine B, contriving it so to make the resulting hash to be the
>> same as on machine A.
> Before going any further, you must show why we care about the contents
> of machine B.
> Why would I log in to machine B if I do not trust B's owner? Why would
> I clone your Git repository if I do not know you?
The problem is that the MD5 hash depends on _untrusted_ data that
external processes might put into the database.

For example, imagine that machines A and B use CouchDB to store
certificates. On machine A, an administrator issues a certificate
revocation record for the certificate stored in 'Doc2'. On machine B,
malware issues a no-op update of 'Doc2' (using normal document
management functionality) that is contrived to have the same revision
ID as the revocation record issued on machine A.

Such tampering would normally be noticed through the presence of
conflicts during replication, but in this case it would go unnoticed!
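
A minimal sketch of that scenario in Python. It assumes a simplified
revision token of the form "<generation>-<digest of parent revision and
body>"; this is not CouchDB's actual algorithm, and rev_id, has_conflict
and find_colliding_body are hypothetical names. The digest is truncated
to one byte on purpose so that a collision can be found by brute force
in a demo; a real attack would need a genuine collision on the full
hash.

```python
import hashlib


def rev_id(generation, parent_rev, body, digest_bytes=16):
    """Hypothetical revision token: '<generation>-<hex digest>'."""
    h = hashlib.md5()
    h.update(parent_rev.encode("utf-8"))
    h.update(b"\x00")  # separator between parent revision and body
    h.update(body)
    return f"{generation}-{h.digest()[:digest_bytes].hex()}"


def has_conflict(rev_a, rev_b):
    """Sibling revisions are seen as conflicting only if tokens differ."""
    return rev_a != rev_b


def find_colliding_body(target_rev, generation, parent_rev, digest_bytes,
                        limit=100_000):
    """Brute-force a body whose token equals target_rev (demo only)."""
    for n in range(limit):
        candidate = b'{"status":"valid","pad":%d}' % n
        if rev_id(generation, parent_rev, candidate,
                  digest_bytes) == target_rev:
            return candidate
    return None


parent = "1-abc"  # made-up parent revision shared by A and B
revocation = b'{"status":"revoked"}'

# Machine A: the administrator's revocation (weak 1-byte digest).
rev_a = rev_id(2, parent, revocation, digest_bytes=1)

# Machine B: the malware searches for a different body with the same token.
forged = find_colliding_body(rev_a, 2, parent, digest_bytes=1)
rev_b = rev_id(2, parent, forged, digest_bytes=1)

print("A:", rev_a, revocation)
print("B:", rev_b, forged)
print("conflict detected:", has_conflict(rev_a, rev_b))
```

With matching tokens the two nodes appear to hold the same revision, so
the differing bodies never surface as a conflict.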

This is a somewhat contrived example, but for me the most crucial fact
is that external, potentially untrusted data can force CouchDB to behave
incorrectly and violate its invariants.

It can be ignored as a minor issue, of course, but in this case the fix
is simple: just switch from MD5 to something more secure, such as SHA-1.
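
As a rough illustration of how small such a switch could be, here is a
Python sketch in which the digest algorithm is a parameter. The token
layout and the name rev_token are assumptions made for this example, not
CouchDB's real code.

```python
import hashlib


def rev_token(generation, parent_rev, body, algorithm="md5"):
    """Hypothetical token '<generation>-<hex digest>' with a pluggable hash."""
    h = hashlib.new(algorithm)  # e.g. "md5" or "sha1"
    h.update(parent_rev.encode("utf-8"))
    h.update(b"\x00")  # separator between parent revision and body
    h.update(body)
    return f"{generation}-{h.hexdigest()}"


body = b'{"status":"revoked"}'
md5_token = rev_token(2, "1-abc", body, algorithm="md5")
sha1_token = rev_token(2, "1-abc", body, algorithm="sha1")

print(md5_token)
print(sha1_token)
```

All replicating nodes would have to use the same algorithm, because
identical edits only produce identical tokens when every node hashes
them the same way.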

> Finally, revision tokens might look like MD5, but they are not. They
> especially look like MD5 if you read the source code. But they are not
> MD5. They are opaque tokens. They do not serve a security function.
> Between trusted nodes, they indicate document changes.
I'm actually writing a connector between CouchDB and an external system,
so I'm reimplementing all the functionality required for the
synchronization protocol from scratch (in C++). Quite an interesting task.
