couchdb-dev mailing list archives

From Tim Millwood <...@millwoodonline.co.uk>
Subject Re: Calculating Revision IDs outside erlang (proposal to add {minor_version, 1} to the calc)
Date Wed, 23 Mar 2016 16:51:15 GMT
I don't think this is very relevant, but thought some people might be
interested.
This is how we generate a revision ID in Drupal to use with CouchDB:
https://github.com/dickolsson/drupal-multiversion/blob/8.x-1.x/src/MultiversionManager.php#L436-L457

On 23 March 2016 at 16:41, Jan Lehnardt <jan@apache.org> wrote:

> Great sleuthing Michael!
>
> In addition to the recommendation to upgrade to {minor_version, 1}, which
> could be a good first step, how about going the extra mile to make _rev
> generation easier across platforms? This would benefit PouchDB and others.
>
> Best
> Jan
> --
>
> > On 23 Mar 2016, at 01:30, Michael Fair <michael@daclubhouse.net> wrote:
> >
> > Greetings CouchDBers!
> >
> > I've been modifying a BERT library to recreate the md5 calc of a
> > RevisionID in Java.
> >
> > I haven't tackled attachments yet. However, with the awesome help of
> > rnewson on the IRC channel, I've succeeded in recreating the md5 for all
> > the documents I've tried so far. These include docs with values of
> > strings, big and small integers, lists of big integers, lists of small
> > integers, true, false, null, and objects. The glaring exception is floats.
> >
> > The {minor_version, 0} format used for floats (a 31-byte string-based
> > representation in %.20e format) depends on the host environment doing
> > the encoding and can't be reliably duplicated on other machines and in
> > other languages.
> >
> > For instance, here are examples of encoding 3.14159 as a %.20e string on
> > this laptop:
> > erlang: 3.1415899999999999000e+00  (This is what term_to_binary is using)
> > python: 3.14158999999999988262e+00
> > java:   3.14159000000000000000e+00
> >
> > These minor numerical differences unfortunately make the md5 computation
> > untenable. Further, it seems that even different OTP versions and
> > different hardware will encode the {minor_version, 0} format slightly
> > differently on different Couch instances (a couple of people on IRC
> > shared with me what their OTP produced).
> >
> >
> > To make a long story short and spare folks reading the mind-numbing
> > details, without changing something, replicating the md5 for the revision
> > id of documents with floats just can't be done sanely.
> >
> > As things are now, like I mentioned, even different installations of
> > CouchDB can disagree on the MD5 revision id for the document
> > {"pi":3.14159}.
> >
> >
> > So where does this create an issue?
> >
> > It shows up as a conflict document created during replication when the
> > two servers calculated different revision ids for the same document
> > update. This only happens with a multi-master update, where both sides
> > were updated before replicating (like separate laptops on separate
> > planes each doing the same thing).
> >
> > If only one side or the other was updated, it doesn't cause a problem.
> >
> > My goal is enabling people to upload documents from multiple server
> > applications using JSON and Couch to handle the replication bits.
> >
> > To give this heterogeneous environment the same multi-master intelligence
> > that Couch has, they need to be able to compute the same revision id that
> > Couch would compute; otherwise documents modified directly in couch could
> > create these kinds of multi-master type conflicts.
> >
> >
> > ----
> >
> > What to do (aside from simply do nothing)?
> >
> > At the least I recommend changing the term_to_binary computation to use
> > the {minor_version, 1} option in the rev_id calculation.
> >
> > This changes how floats are encoded to the 64-bit IEEE format.  It became
> > the standard way of encoding floats in OTP 17.0+ and is available as an
> > option all the way back to OTP 11.  As long as it's explicitly provided
> > as a requested option in the term_to_binary call, all currently deployed
> > OTP installations for Couch can do it.
> >
> > Doing this normalizes the md5 calculation for floats regardless of the
> > OTP platform, and should make it feasible for third party applications to
> > replicate the encoding.
> >
> >
> >
> > I have some other ideas beyond that, but they would require changes to
> > the replication protocol to support.
> >
> >
> > ----
> >
> > For anyone interested I'd be happy to share the code I have.  It's still
> > a bit rough in the document construction part, but once constructed,
> > getting the binary encoding and revision id are each just a single call.
> >
> >
> > Thanks,
> > Mike
>
> --
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
>
>
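
To illustrate the float problem described in the quoted message, here is a
minimal Python sketch of the two float encodings in the Erlang external term
format. It is not code from the thread and it encodes only a bare float, not
the full term that CouchDB hashes for a revision id. The helper names
(encode_float_v0, encode_float_v1, md5_hex) are hypothetical. The tag bytes
(131 for the format version, 99 for FLOAT_EXT, 70 for NEW_FLOAT_EXT) come from
the Erlang external term format documentation, and the three strings are the
renderings quoted in the message, copied as-is.

```python
import hashlib
import struct

VERSION_MAGIC = 131  # first byte of every external-term-format binary
FLOAT_EXT = 99       # {minor_version, 0}: 31 bytes of "%.20e" text, NUL padded
NEW_FLOAT_EXT = 70   # {minor_version, 1}: 8-byte big-endian IEEE 754 double


def encode_float_v0(value, text=None):
    """Encode a float the {minor_version, 0} way.

    `text` lets the caller supply the host's own "%.20e" rendering, which is
    exactly the part that differs between platforms.
    """
    if text is None:
        text = "%.20e" % value
    payload = text.encode("ascii").ljust(31, b"\x00")
    return bytes([VERSION_MAGIC, FLOAT_EXT]) + payload


def encode_float_v1(value):
    """Encode a float the {minor_version, 1} way (raw IEEE 754 bits)."""
    return bytes([VERSION_MAGIC, NEW_FLOAT_EXT]) + struct.pack(">d", value)


def md5_hex(data):
    return hashlib.md5(data).hexdigest()


# The renderings of 3.14159 quoted in the message above, copied verbatim.
RENDERINGS = {
    "erlang": "3.1415899999999999000e+00",
    "python": "3.14158999999999988262e+00",
    "java": "3.14159000000000000000e+00",
}

# Text-based encoding: each platform's string yields different bytes.
v0_digests = {
    name: md5_hex(encode_float_v0(float(text), text))
    for name, text in RENDERINGS.items()
}

# Binary encoding: all three strings parse to the same double.
v1_digests = {
    name: md5_hex(encode_float_v1(float(text)))
    for name, text in RENDERINGS.items()
}

if __name__ == "__main__":
    for name in RENDERINGS:
        print(name, v0_digests[name], v1_digests[name])
```

Under these assumptions the three {minor_version, 0} digests all differ,
while the three {minor_version, 1} digests are identical, because every
rendering parses back to the same 64-bit double.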
