couchdb-dev mailing list archives

From Paul Davis <>
Subject Re: Calculating Revision IDs outside erlang (proposal to add {minor_version, 1} to the calc)
Date Wed, 23 Mar 2016 02:45:14 GMT
+1 to adding the minor version option. Floats are hard. It's still not
perfect, but it should at least make most cases easier.

On Tue, Mar 22, 2016 at 7:30 PM, Michael Fair <> wrote:
> Greetings CouchDBers!
> I've been modifying a BERT library in Java to recreate the MD5 calculation
> of a revision ID.
> I haven't tackled attachments yet; however, with the awesome help of rnewson
> on the IRC channel, I've succeeded in recreating the MD5 for every document
> I've tried so far, including docs with string values, big and small
> integers, lists of big integers, lists of small integers, true, false,
> null, and objects. The glaring exception is floats.
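For readers attempting the same thing, here is a minimal Python sketch of the Erlang External Term Format encodings those value types map to. It assumes CouchDB's internal EJSON shape (objects as {[{Key, Value}, ...]} with keys in document order, strings as binaries, true/false/null as atoms) and assumes Latin-1 atoms are written as ATOM_EXT; the Python function name term_to_binary only mirrors the Erlang BIF and is hypothetical. Floats are deliberately rejected, since their encoding depends on minor_version. One detail worth noticing: a list of integers in 0..255 is packed as STRING_EXT, which is why lists of small and big integers have to be handled separately.

```python
import struct

# Tags from the Erlang External Term Format documentation.
VERSION = 131
SMALL_INTEGER_EXT = 97
INTEGER_EXT = 98
ATOM_EXT = 100
SMALL_TUPLE_EXT = 104
NIL_EXT = 106
STRING_EXT = 107
LIST_EXT = 108
BINARY_EXT = 109
SMALL_BIG_EXT = 110
LARGE_BIG_EXT = 111


def _atom(name: str) -> bytes:
    # Assumption: Latin-1 atoms written as ATOM_EXT (2-byte length).
    data = name.encode("latin-1")
    return struct.pack(">BH", ATOM_EXT, len(data)) + data


def _integer(n: int) -> bytes:
    if 0 <= n <= 255:
        return struct.pack(">BB", SMALL_INTEGER_EXT, n)
    if -(2**31) <= n < 2**31:
        return struct.pack(">Bi", INTEGER_EXT, n)
    # Bignums: digit count, sign byte, then the magnitude little-endian.
    magnitude = abs(n)
    digits = magnitude.to_bytes((magnitude.bit_length() + 7) // 8, "little")
    sign = 1 if n < 0 else 0
    if len(digits) <= 255:
        return struct.pack(">BBB", SMALL_BIG_EXT, len(digits), sign) + digits
    return struct.pack(">BIB", LARGE_BIG_EXT, len(digits), sign) + digits


def _is_byte(v) -> bool:
    return isinstance(v, int) and not isinstance(v, bool) and 0 <= v <= 255


def _list(encoded_items) -> bytes:
    # Proper list: LIST_EXT header, elements, then a NIL_EXT tail.
    if not encoded_items:
        return bytes([NIL_EXT])
    return (struct.pack(">BI", LIST_EXT, len(encoded_items))
            + b"".join(encoded_items) + bytes([NIL_EXT]))


def _encode(value) -> bytes:
    if value is None:
        return _atom("null")
    if isinstance(value, bool):  # check before int: bool is an int subclass
        return _atom("true" if value else "false")
    if isinstance(value, int):
        return _integer(value)
    if isinstance(value, float):
        raise NotImplementedError("float encoding depends on minor_version")
    if isinstance(value, str):
        data = value.encode("utf-8")
        return struct.pack(">BI", BINARY_EXT, len(data)) + data
    if isinstance(value, list):
        if value and len(value) < 65536 and all(_is_byte(v) for v in value):
            # term_to_binary packs lists of 0..255 integers as STRING_EXT.
            return struct.pack(">BH", STRING_EXT, len(value)) + bytes(value)
        return _list([_encode(v) for v in value])
    if isinstance(value, dict):
        # EJSON object: a 1-tuple wrapping a list of {Key, Value} 2-tuples.
        pairs = [struct.pack(">BB", SMALL_TUPLE_EXT, 2) + _encode(k) + _encode(v)
                 for k, v in value.items()]
        return struct.pack(">BB", SMALL_TUPLE_EXT, 1) + _list(pairs)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def term_to_binary(value) -> bytes:
    """Hypothetical Python stand-in for erlang:term_to_binary/1 on EJSON."""
    return bytes([VERSION]) + _encode(value)


print(term_to_binary({"a": [1, 2, 3], "b": None}).hex())
```

Checking a few values against what term_to_binary returns in an Erlang shell is the quickest way to validate the assumptions before hashing anything.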
> The {minor_version, 0} format used for floats (a 31-byte, string-based
> representation in %.20e format) depends on the host environment doing
> the encoding and can't be reliably duplicated on other machines or in
> other languages.
> For instance, here is 3.14159 encoded as a %.20e string on this
> laptop:
> erlang: 3.1415899999999999000e+00  (This is what term_to_binary is using)
> python: 3.14158999999999988262e+00
> java:   3.14159000000000000000e+00
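A minimal sketch of why this breaks hashing, assuming the FLOAT_EXT layout from the External Term Format documentation (tag 99 followed by a 31-byte text field) and assuming the unused tail of that field is NUL-padded. The Erlang and Java strings are the ones reported above; Python's string is computed. The resulting byte strings differ, so their MD5 digests differ, even though all three strings parse back to the same double.

```python
import hashlib

VERSION = 131
FLOAT_EXT = 99  # minor_version 0: float written as "%.20e" text in 31 bytes


def float_ext(text: str) -> bytes:
    """FLOAT_EXT term for an already-formatted float string."""
    raw = text.encode("ascii")
    if len(raw) > 31:
        raise ValueError("FLOAT_EXT holds at most 31 bytes of text")
    # Assumption: the unused tail of the 31-byte field is NUL-padded.
    return bytes([FLOAT_EXT]) + raw.ljust(31, b"\x00")


erlang_text = "3.1415899999999999000e+00"  # as reported above for OTP
python_text = "%.20e" % 3.14159            # Python's own formatting
java_text = "3.14159000000000000000e+00"   # as reported above for Java

for label, text in [("erlang", erlang_text), ("python", python_text), ("java", java_text)]:
    term = bytes([VERSION]) + float_ext(text)
    print(label, text, hashlib.md5(term).hexdigest())

# All three strings still denote the same IEEE 754 double.
print(float(erlang_text) == float(python_text) == float(java_text))  # True
```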
> These minor numerical differences unfortunately make matching the MD5
> computation untenable. Furthermore, it seems that different OTP versions
> and different hardware will encode the {minor_version, 0} format slightly
> differently on different Couch instances (a couple of people on IRC shared
> what their OTP produced).
> To make a long story short and spare folks the mind-numbing details:
> unless something changes, replicating the MD5 for the revision ID of
> documents with floats just can't be done sanely.
> As things stand, as I mentioned, even different installations of
> CouchDB can disagree on the MD5 revision ID for the document {"pi":3.14159}.
> So where does this create an issue?
> It shows up as a conflicted document during replication, when the two
> servers calculated different revision IDs for the same document update.
> That only happens with a multi-master update, i.e. one where both sides
> were updated before replicating (like separate laptops on separate planes
> each making the same edit).
> If only one side or the other was updated, it doesn't cause a problem.
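A toy model of that failure mode, not CouchDB's actual revision-tree code: each side holds a map from revision ID to parent revision ID, replication merges the maps, and any revision that is nobody's parent is a leaf. More than one leaf means a conflict. The revision IDs are placeholders, not real hashes.

```python
from typing import Dict, Optional, Set

History = Dict[str, Optional[str]]  # revision ID -> parent revision ID


def leaves(history: History) -> Set[str]:
    """Revisions that no other revision lists as its parent."""
    parents = {parent for parent in history.values() if parent is not None}
    return set(history) - parents


def replicate(a: History, b: History) -> History:
    """Toy merge: each side ends up with every revision either side had."""
    return {**a, **b}


base: History = {"1-aaa": None}

# Both laptops make the same edit, but hash it differently.
laptop_a: History = {**base, "2-bbb": "1-aaa"}
laptop_b: History = {**base, "2-ccc": "1-aaa"}
print(sorted(leaves(replicate(laptop_a, laptop_b))))  # ['2-bbb', '2-ccc'] -> conflict

# Both laptops compute the same revision ID: one leaf, no conflict.
print(sorted(leaves(replicate(laptop_a, dict(laptop_a)))))  # ['2-bbb']

# Only one side was edited: the other side just catches up.
print(sorted(leaves(replicate(laptop_a, base))))  # ['2-bbb']
```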
> My goal is to let people upload documents as JSON from multiple server
> applications, with Couch handling the replication bits.
> To give this heterogeneous environment the same multi-master intelligence
> that Couch has, those applications need to compute the same revision ID
> that Couch would compute; otherwise, documents modified directly in Couch
> could create these multi-master conflicts.
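A hedged sketch of the revision-ID step itself, for a document without attachments. The term layout [Deleted, OldStart, OldRev, Body, Atts] is my reading of couch_db:new_revid/1 in CouchDB of this era, with OldRev being 0 for a new document and otherwise the previous revision's 16-byte digest. new_rev is a hypothetical helper, and body_ext is assumed to be the already-encoded EJSON body (special underscore fields removed) without the leading 131 version byte. Compare its output against revisions your own CouchDB returns before relying on it.

```python
import hashlib
import struct
from typing import Optional

# External Term Format tags used by this sketch.
VERSION = 131
SMALL_INTEGER_EXT = 97
INTEGER_EXT = 98
ATOM_EXT = 100
NIL_EXT = 106
LIST_EXT = 108
BINARY_EXT = 109


def _atom(name: str) -> bytes:
    # Assumed ATOM_EXT encoding for the atoms true/false.
    data = name.encode("latin-1")
    return struct.pack(">BH", ATOM_EXT, len(data)) + data


def _int(n: int) -> bytes:
    # Revision positions are assumed non-negative and below 2**31.
    if 0 <= n <= 255:
        return struct.pack(">BB", SMALL_INTEGER_EXT, n)
    return struct.pack(">Bi", INTEGER_EXT, n)


def new_rev(body_ext: bytes, old_pos: int = 0,
            old_rev: Optional[bytes] = None, deleted: bool = False) -> str:
    """Hypothetical helper returning the next "N-<md5 hex>" revision string.

    body_ext: ETF encoding of the EJSON body, without the 131 version byte.
    old_pos / old_rev: previous revision number and 16-byte digest, if any.
    Attachments are not handled (Atts is always the empty list).
    """
    if old_rev is None:
        old_rev_ext = _int(0)  # a new document has OldRev = 0
    else:
        old_rev_ext = struct.pack(">BI", BINARY_EXT, len(old_rev)) + old_rev
    elements = [
        _atom("true" if deleted else "false"),  # Deleted
        _int(old_pos),                          # OldStart
        old_rev_ext,                            # OldRev
        body_ext,                               # Body
        bytes([NIL_EXT]),                       # Atts = []
    ]
    term = (bytes([VERSION])
            + struct.pack(">BI", LIST_EXT, len(elements))
            + b"".join(elements)
            + bytes([NIL_EXT]))                 # proper-list tail
    return f"{old_pos + 1}-{hashlib.md5(term).hexdigest()}"


# Usage: {} is {[]} in EJSON, i.e. SMALL_TUPLE_EXT(1) wrapping NIL_EXT.
empty_body = bytes([104, 1, 106])
rev1 = new_rev(empty_body)
rev2 = new_rev(empty_body, old_pos=1, old_rev=bytes.fromhex(rev1.split("-", 1)[1]))
print(rev1)
print(rev2)
```

The previous revision is fed back in as raw digest bytes rather than as its hex string, on the assumption that CouchDB stores revision hashes internally in that binary form.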
> ----
> What to do (aside from simply doing nothing)?
> At the least, I recommend passing the {minor_version, 1} option to
> term_to_binary in the rev_id calculation.
> This changes the float encoding to the 64-bit IEEE 754 format. That became
> the default way of encoding floats in OTP 17.0 and is available as an
> option all the way back to OTP 11. As long as the option is passed
> explicitly in the term_to_binary call, every OTP installation currently
> deployed under Couch can do it.
> Doing this normalizes the MD5 calculation for floats regardless of the OTP
> platform, and should make it feasible for third-party applications to
> replicate the encoding.
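With {minor_version, 1} a float becomes NEW_FLOAT_EXT: tag 70 followed by the IEEE 754 double in big-endian byte order, which any language can reproduce byte for byte. A minimal Python sketch; new_float_ext is a hypothetical name.

```python
import struct

VERSION = 131
NEW_FLOAT_EXT = 70  # minor_version 1: 8-byte IEEE 754 double, big-endian


def new_float_ext(x: float) -> bytes:
    # ">d" forces big-endian byte order regardless of the host CPU.
    return struct.pack(">Bd", NEW_FLOAT_EXT, x)


term = bytes([VERSION]) + new_float_ext(3.14159)
print(term.hex())

# The differing strings reported above all parse to one double, so they
# produce identical bytes under this encoding.
for text in ["3.1415899999999999000e+00", "3.14158999999999988262e+00",
             "3.14159000000000000000e+00"]:
    print(new_float_ext(float(text)) == new_float_ext(3.14159))  # True
```

Because the bytes come straight from the double rather than from a formatting routine, the Erlang/Python/Java discrepancy above disappears.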
> I have some other ideas beyond that, but they would require changes to the
> replication protocol.
> ----
> For anyone interested, I'd be happy to share the code I have. The document
> construction part is still a bit rough, but once a document is constructed,
> getting the binary encoding and the revision ID each takes a single call.
> Thanks,
> Mike
