couchdb-dev mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: Why MD5 is used for hashes, also about non-deterministic IDs.
Date Thu, 17 Nov 2011 23:22:47 GMT
Thanks Randall :)

On Nov 17, 2011, at 23:57 , Randall Leeds wrote:

> On Wed, Nov 16, 2011 at 09:46, Alex Besogonov <alex.besogonov@gmail.com> wrote:
>> On Tue, Nov 15, 2011 at 4:23 PM, Randall Leeds <randall.leeds@gmail.com> wrote:
>>>> Remember that the _rev value is derived from the contents of the
>>>> documents, all the bytes of all attachments and values from previous
>>>> revisions. Stock MD5 preimage attacks are of a much simpler form
>>>> (finding a Y such that MD5(Y)=X for some desired X). Also that you
>>>> would have to arrange for the same number of updates as well, since
>>>> the number at the front is incremented on each successful update.
>>> Also remember that the contents would have to parse as JSON, so that
>>> restricts this search space even further.
>> Not really. Binary representation of JSON is used to calculate the hash.
>> 
>> So I can make a document like this:
>> ===
>> {
>>  "aa" : "xxxxxxxxxxxxxx.....[several thousands x's]"
>> }
>> ===
>> 
>> And use the large 'xxx...x' string as a scratch area for my attack. I don't
>> even need to bother with quoting issues because CouchDB is going to
>> unquote everything during JSON parsing. And there are no other hash
>> codes to work around (working around even two MD5s at the same time
>> is much harder).
>> 
>> That's about the best possible case for an attacker.
> 
> This "attack", though, is still pretty hard, and, I think, not an
> attack. The document _does_ have to take a trip through a JSON parser,
> pass as valid JSON, but create an MD5 sum, along with the metadata,
> that matches the revision id of the original document. All this needs
> to be done on a Couch that is trusted to perform unfiltered,
> bi-directional replication and allows the attacker to change documents
> that matter to other people.
> 
> The proper way to stop the "attack" is to not let users modify
> documents that will screw up things for other people. It's kind of
> like how a UNIX user is _welcome_ to trash their .bashrc and just
> because their home directory is mounted over NFS and now their .bashrc
> is trashed _everywhere_ doesn't mean they've really done any damage
> from anyone else's point of view. They didn't attack anything but
> themselves.
> 
> ----
> 
> However. It's worth noting that an attacker can just make up whatever
> revision identifiers they want to, without dealing with the MD5 stuff
> anyway!!! Passing ?new_edits=false allows an "attacker" to specify
> that a document has any revision they want, with whatever history of
> revisions they want.
> 
> curl -XPUT -H"Content-Type: application/json" \
>   "http://some.couch/somedb/document?new_edits=false" \
>   -d'{"_id":"document", "_rev":"5-anything",
>   "_revisions":{"start":5,"ids":["anything",
>   "everything","bogus","revids"]}}'
> 
> (Side note to devs: we may want to deterministically prune the leaves
> for duplicates after merging rev trees, or not, because, well, this is
> a crazy hand-crafted fake-out and caveat power-user.)
> 
> In fact, I just discovered yesterday that you can create unreachable
> conflicts this way, by giving them revision ids and histories that
> create two branches with identical leaves but different stems. If
> CouchDB did decide to enforce some crypto-verifiable constraints on
> revision ids, they could be checked to prevent this kind of
> mis-history. However, other implementations would be forced to follow
> the same scheme. I think the intention of making the revision ID
> opaque was to make it an implementation detail and specifically _not_
> a security or validation feature.
> 
> That said, I'm starting to come around to this idea. I'd be happy to
> see patches that enable a "strict revisions mode" for CouchDB. I don't
> feel like CouchDB has made any promises that are broken by using MD5,
> but additional promises could possibly be made if we took a git-like
> approach to revision crypto.
> 
> I hope that settles the "why", reassures any
> "oh-my-god-my-couch-is-vulnerable", and motivates the
> "hey-lets-make-a-patch" if you still want the feature, with the
> understanding that it's unlikely the project will specify this as a
> necessary condition for general-purpose replication. If you have more
> bullet-proof needs, dev that armor up and I'll review it, but I'd
> advise making it a config option.
> 
> -Randall
> 
>> 
>>> Then, if I understand Jason
>>> correctly, we're also talking about a situation where Couch B is
>>> insecure... it's allowing a malicious user to change documents. If
>>> these documents are anything more important than something affecting
>>> the user herself then what you have is a malicious administrator or an
>>> insecure deployment. I don't think MD5 is to blame here.
>> No, the issue here is a possibility to break the synchronization.
>> 
>>> Does that sound like a reasonable assessment to you, Alex?
>> Almost.
>> 
>>> Also, I'd love to hear about your C++ replicator as it develops.
>> Sure, I'm developing a very small and fast embedded storage for mobile
>> devices and desktop apps. It'll be open source once I finish its core.
>> 
>>> -Randall
>>> 
>>>> For switching from MD5 to SHA-1, I say no. If we switch, let's use
>>>> something contemporary like SHA-256. Better yet, let's wait for the
>>>> winner of the SHA-3 competition.
>>>> 
>>>> B.
>>>> 
>>>> On 15 November 2011 07:57, Jason Smith <jhs@iriscouch.com> wrote:
>>>>> On Tue, Nov 15, 2011 at 7:34 AM, Alex Besogonov
>>>>> <alex.besogonov@gmail.com> wrote:
>>>>>>>> Now I make a change to 'Doc' at machine A. This creates a new revid
>>>>>>>> with new md5 hash.
>>>>>>>> A malicious software somehow learns about this update and creates
>>>>>>>> another document
>>>>>>>> on machine B, contriving it so to make the resulting hash to be the
>>>>>>>> same as on machine A.
>>>>>>> Before going any further, you must show why we care about the contents
>>>>>>> of machine B.
>>>>>>> Why would I log in to machine B if I do not trust B's owner? Why would
>>>>>>> I clone your Git repository if I do not know you?
>>>>>> The problem is, MD5 hash depends on _untrusted_ data that external
>>>>>> processes might put into the database.
>>>>>> 
>>>>>> For example, imagine that machines A and B use CouchDB to store
>>>>>> certificates.
>>>>> 
>>>>> I ask again.
>>>>> 
>>>>> --
>>>>> Iris Couch
>>>>> 
>>>> 
>>> 
>> 
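
A minimal Python sketch of two points from the thread, under stated assumptions. `next_rev` is a hypothetical, simplified scheme (update counter plus MD5 over the parent hash and canonical JSON bytes); it is not CouchDB's actual encoding, which the thread does not spell out. `expand_revisions` shows how a `_revisions` object like the one in the curl example above maps to full revision ids.

```python
import hashlib
import json


def next_rev(prev_rev, body):
    """Hypothetical scheme: '<N+1>-<md5 of parent hash + canonical JSON body>'."""
    if prev_rev is None:
        generation, parent_hash = 0, ""
    else:
        gen_text, parent_hash = prev_rev.split("-", 1)
        generation = int(gen_text)
    # Assumption: canonical JSON bytes stand in for the server's internal encoding.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hashlib.md5(parent_hash.encode("utf-8") + payload).hexdigest()
    # The number at the front is incremented on each update.
    return f"{generation + 1}-{digest}"


def expand_revisions(revisions):
    """Expand {"start": N, "ids": [...]} into full revision ids, newest first."""
    start = revisions["start"]
    return [f"{start - i}-{rev_hash}" for i, rev_hash in enumerate(revisions["ids"])]


if __name__ == "__main__":
    rev1 = next_rev(None, {"aa": "x"})
    rev2 = next_rev(rev1, {"aa": "xx"})
    print(rev1)
    print(rev2)

    history = expand_revisions(
        {"start": 5, "ids": ["anything", "everything", "bogus", "revids"]}
    )
    print(history)
```

In this sketch the id depends only on its inputs, so the same parent and body always give the same id. Nothing in `expand_revisions` checks that the hashes match any content, which mirrors the point above that a writer using new_edits=false can claim whatever history they like.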

