incubator-couchdb-user mailing list archives

From Paul Davis <>
Subject Re: Use MD5 (or alternative digest hash) as _id
Date Wed, 28 Sep 2011 05:47:12 GMT

There are a couple of issues with implementing such a thing inside CouchDB itself.

The biggest roadblock is how to deal with updates. Revisions and
conflicts are a core part of the CouchDB model. A CAS (content
addressable storage) implementation would be quite complex internally
because we'd basically have to switch all of the internal logic based
on whether we're doing CAS or normal CouchDB revision handling.

The second is that generating hashes of JSON is much harder than it
appears at first glance. There have been efforts to create a
canonical JSON that could be used for generating hashes consistently
[1], but if you read the notes and caveats it's quite obviously not
general enough for arbitrary usage (i.e., no support for floats).
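
To make the hashing problem concrete, here is a small Python sketch (not
CouchDB code; the sample documents and the md5_of helper are made up for
illustration). It hashes three serializations that a JSON parser reads
as the same object. Note that Python happens to treat 1 and 1.0 as
equal, which is exactly the kind of float ambiguity a canonical form
has to settle.

```python
import hashlib
import json


def md5_of(text: str) -> str:
    """Hex MD5 digest of the UTF-8 bytes of a string (hypothetical helper)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Three byte-level encodings of what is arguably one document:
# different key order, different whitespace, and 1.0 written as 1.
a = '{"name": "x", "size": 1.0}'
b = '{"size": 1.0, "name": "x"}'
c = '{"name":"x","size":1}'

# The parsed values compare equal in Python...
same_document = json.loads(a) == json.loads(b) == json.loads(c)

# ...but hashing the raw text gives a different digest for each encoding.
distinct_hashes = len({md5_of(a), md5_of(b), md5_of(c)})
```

Any scheme that derives an id from the document body has to pick one
encoding and stick to it on every client.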

As I see it, the behavior you want is quite possible but is best left
to clients to implement. I would also point out that reducing the
cost of MD5 calculations probably shouldn't be a primary concern.
There's much, much more in the stack that will end up being a
bottleneck before your hashing algorithm.
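
A client-side version could look roughly like the Python sketch below.
Everything in it is an assumption for illustration: the content_id
name, the choice to skip underscore-prefixed fields such as _id and
_rev, and the use of sorted keys with compact separators as a stand-in
for a real canonical form. Floats are serialized however Python's json
module prints them, so clients in other languages may not produce the
same digest.

```python
import hashlib
import json


def content_id(doc: dict) -> str:
    """Derive a document id from the document body (hypothetical helper).

    Assumption: top-level fields starting with "_" (such as _id and _rev)
    are CouchDB metadata and are left out of the hash.
    """
    body = {k: v for k, v in doc.items() if not k.startswith("_")}
    # Sorted keys and fixed separators give one stable text form per body.
    # This is only a stand-in for a proper canonical JSON.
    canonical = json.dumps(
        body, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


doc = {"title": "example", "tags": ["a", "b"]}
doc["_id"] = content_id(doc)
```

The client would then PUT the document under that _id. If a document
with the same id already exists, CouchDB answers with a 409 conflict,
which the client can treat as "already stored".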

On Wed, Sep 28, 2011 at 12:08 AM, Dave Sann <> wrote:
> Hi all,
> I am just starting out with CouchDB and I was wondering whether it is
> possible or planned to have the database use a specific digest/hash
> algorithm, rather than a GUID when auto-generating identifiers.
> In my case, I don't actually care what the id is, but I do want to avoid
> duplicate documents.
> Effectively using couch as an indexed content addressable storage.
> Since couch calculates a hash/digest to manage revisions it would seem
> fairly sensible and efficient if this was used for the ID.
> If I generate an id as an md5 - I am wary that the couch calculated value
> will be different due to minor differences in the data before/after
> transmission.
> That would also duplicate processing.
> Thanks for any input
> Regards
> Dave
