couchdb-user mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Fixed precision of floating point number not respected in views
Date Wed, 20 Feb 2013 03:00:39 GMT
Apologies for not being able to express myself earlier this morning.
I'd been without sleep for entirely too long.

Robert Newson hits the nail on the head. Succinctly stated, the issue is this:

Any numbers defined in JSON that contain a decimal point or exponent
will be passed through the Erlang VM's idea of the "double" data type.
Any numbers that are used in views will pass through the view's idea of
a number (in the common JavaScript case, even integers pass through
a double, because JavaScript defines every number as a double).
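To make the integer case concrete, here is a minimal Python sketch of what happens when an integer is pushed through a double. The helper name survives_double is made up for illustration; it is not anything in CouchDB.

```python
# Every integer in [-2**53, 2**53] has an exact double representation.
# Beyond that range, some integers get rounded to a neighbouring value.
LIMIT = 2 ** 53


def survives_double(n: int) -> bool:
    """Return True if integer n is unchanged by a round trip through a double."""
    return int(float(n)) == n


print(survives_double(LIMIT))      # True
print(survives_double(LIMIT + 1))  # False: the double rounds to 2**53
```

Any integer that fails this check would come back from a JavaScript view as a different number.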

(This is roughly a "no matter what" proposition until we decide to
massively overhaul a significant portion of CouchDB internals so that
JSON is not interpreted into an internal representation. That is not
impossible, but not likely for quite some time.)

What people are discussing in this particular thread is how we encode
those numbers after they have been passed through some internal
representation. While it can be a bit surprising, and a number of
people have said "but couchdb changes my data!", it's really not true
(with a caveat). What happens is that CouchDB decodes what it was given
into some numerical format and then changes the textual representation
of that result. In most cases that format is an IEEE-754 double
precision floating point number, which is exactly what almost all other
languages use as well.

What CouchDB does a bit differently than other languages is that it
does not attempt to pretty print the resulting output to use the
shortest number of characters. For instance, this is why we have this
relationship:

> ejson:encode(ejson:decode(<<"1.1">>)).
<<"1.1000000000000000888">>

What people are missing here is that internally those two formats
decode into the same IEEE-754 representation. And more importantly, it
will decode into a fairly close representation when passed through all
major parsers that I know about.
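A quick way to check this claim outside of Erlang is to parse both strings and compare the raw bytes of the resulting doubles. This is only a sketch in Python; double_bits is a hypothetical helper name.

```python
import struct


def double_bits(text: str) -> str:
    """Parse a decimal string as a double and return its 8 bytes as hex."""
    return struct.pack(">d", float(text)).hex()


short_form = "1.1"
long_form = "1.1000000000000000888"

print(double_bits(short_form))
print(double_bits(long_form))
print(double_bits(short_form) == double_bits(long_form))  # True
```

Both strings select the same bit pattern, so a parser that rounds correctly cannot tell them apart.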

While we've only been discussing cases where the textual
representation changes, another important case is when an input value
contains more precision than can actually be represented in a double.
(You could argue that this case is actually "losing" data if you don't
accept that numbers are stored in doubles.)

Here's a log for a couple of the more common JSON libraries I happen
to have on my machine:

Spidermonkey

$ js -h 2>&1 | head -n 1
JavaScript-C 1.8.5 2011-03-31
$ js
js> JSON.stringify(JSON.parse("1.01234567890123456789012345678901234567890"))
"1.0123456789012346"
js> var f = JSON.stringify(JSON.parse("1.01234567890123456789012345678901234567890"))
js> JSON.stringify(JSON.parse(f))
"1.0123456789012346"

Node

$ node -v
v0.6.15
$ node
> JSON.stringify(JSON.parse("1.01234567890123456789012345678901234567890"))
'1.0123456789012346'
> var f = JSON.stringify(JSON.parse("1.01234567890123456789012345678901234567890"))
undefined
> JSON.stringify(JSON.parse(f))
'1.0123456789012346'

Python

$ python
Python 2.7.2 (default, Jun 20 2012, 16:23:33)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> json.dumps(json.loads("1.01234567890123456789012345678901234567890"))
'1.0123456789012346'
>>> f = json.dumps(json.loads("1.01234567890123456789012345678901234567890"))
>>> json.dumps(json.loads(f))
'1.0123456789012346'

Ruby

A small aside on Ruby: it requires a top-level object or array, so I just
wrapped the value. It should be obvious that this doesn't affect the
result of parsing the number, though.

$ irb --version
irb 0.9.5(05/04/13)
>> require 'JSON'
=> true
>> JSON.dump(JSON.load("[1.01234567890123456789012345678901234567890]"))
=> "[1.01234567890123]"
>> f = JSON.dump(JSON.load("[1.01234567890123456789012345678901234567890]"))
=> "[1.01234567890123]"
>> JSON.dump(JSON.load(f))
=> "[1.01234567890123]"


ejson (CouchDB's current parser) at CouchDB sha 168a663b

$ ./utils/run -i
Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:2:2] [rq:2]
[async-threads:4] [hipe] [kernel-poll:true]

Eshell V5.8.5  (abort with ^G)
1> ejson:encode(ejson:decode(<<"1.01234567890123456789012345678901234567890">>)).
<<"1.0123456789012346135">>
2> F = ejson:encode(ejson:decode(<<"1.01234567890123456789012345678901234567890">>)).
<<"1.0123456789012346135">>
3> ejson:encode(ejson:decode(F)).
<<"1.0123456789012346135">>


As you can see, they all behave pretty much the same, except that Ruby
does appear to lose some precision compared with the other
libraries.

The astute observer will notice that ejson (the CouchDB JSON library)
reported three extra digits. While it's tempting to think that this
is due to some internal difference, it's just a more specific case of
the 1.1 input as described above.

The important point to realize here is that a double can only take one
of a finite set of values. What we're doing here is generating a string
that, when passed through the "standard" floating point parsing
algorithms (i.e., strtod), will result in the same bit pattern in memory
as we started with. Or, put slightly differently, the bytes in a JSON
serialized number are chosen such that they refer to a single specific
value that a double can represent.
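Here is a rough Python sketch of that property. The "%.20g" format is my stand-in for a verbose encoder that prints 20 significant digits; I'm assuming it only for illustration, not claiming it is what ejson does internally. The name verbose_encode is likewise made up.

```python
def verbose_encode(x: float) -> str:
    """Format a double with 20 significant digits (an assumed stand-in)."""
    return "%.20g" % x


x = float("1.01234567890123456789012345678901234567890")
text = verbose_encode(x)

# Parsing the verbose text selects the very same double again,
print(float(text) == x)                       # True
# so encoding a second time cannot change the text any further.
print(verbose_encode(float(text)) == text)    # True
print(verbose_encode(1.1))                    # 1.1000000000000000888
```

The extra digits carry no additional information; they just spell out more of the exact value of the double that was selected.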

The game that other JSON libraries are playing is merely:

"How few characters do I have to use to select this specific value for a double"

And that game has lots and lots of subtle details that are difficult
to duplicate in C without a significant amount of effort (it took
Python over a year to get it sorted with their fancy build systems
that automatically run on a number of different architectures).
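For illustration only, the game can be played by brute force: try more and more significant digits until the text parses back to the same double. This Python sketch is a naive version with a made-up name (shortest_repr), not the algorithm any of the libraries above actually use; the real ones are far more careful about speed and platform differences.

```python
def shortest_repr(x: float) -> str:
    """Return the first of 1..17 significant digits that round-trips x."""
    for digits in range(1, 18):
        candidate = "%.*g" % (digits, x)
        if float(candidate) == x:
            return candidate
    # Only reached for values such as NaN that never compare equal.
    return repr(x)


print(shortest_repr(1.1))  # 1.1
x = float("1.01234567890123456789012345678901234567890")
print(float(shortest_repr(x)) == x)  # True
```

Note that this leans entirely on the platform's float parsing and formatting being correctly rounded, which is exactly the kind of subtle detail that makes the job hard in portable C.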

Hopefully I've shown that CouchDB is not doing anything "funky" by
changing input. It's behaving the same as any other common JSON library
does; it's just not pretty printing its output.

On the other hand, if you actually are in a position where an IEEE-754
double is not a satisfactory datatype for your numbers, then the
answer, as has been stated, is to not pass your numbers through this
representation. In JSON this is accomplished by encoding them as a
string or by using integer types (although integer types can still
bite you if you use a platform that has a different integer
representation than normal, i.e., JavaScript).
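As a sketch of the string approach, here is how it might look on the client side in Python using the standard decimal module. The document shape and the "amount" field are hypothetical.

```python
import json
from decimal import Decimal

# Hypothetical document: the amount is stored as a JSON string, not a number.
doc_text = '{"amount": "1.01234567890123456789012345678901234567890"}'

doc = json.loads(doc_text)
amount = Decimal(doc["amount"])  # exact: every digit of the string is kept

# Re-encoding as a string returns exactly the text that went in.
round_tripped = json.dumps({"amount": str(amount)})
print(round_tripped == doc_text)  # True

# Decimal arithmetic on money-style values behaves as expected,
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # True
# whereas the same sum in doubles does not.
print(0.10 + 0.20 == 0.30)  # False
```

The cost is that anything doing math on these values, including view code, has to parse the string with a decimal or bignum library first.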

Also, if anyone is really interested in changing this behavior, I'm
all ears for contributions to jiffy (which is theoretically going to
replace ejson when I get around to updating the build system). The
places I've looked for inspiration are TCL and Python. If you know a
decent implementation of this float printing algorithm give me a
holler.

On Tue, Feb 19, 2013 at 3:58 PM, Tibor Gemes <tibber@gmail.com> wrote:
> It's against best practices to use floats for representing money. If you
> count pennies, then interpret the amount in pennies with int.
> If this inconsistency is unacceptable, then you must use int. You should
> use float only if this does not matter.
> T
> On 2013.02.19. 22:48, "Robert Newson" <rnewson@apache.org> wrote:
>
>> I agree entirely with your last statement (I filed
>> https://issues.apache.org/jira/browse/COUCHDB-1410 for exactly that
>> reason).
>>
>> However, I've been convinced that it cannot be done with
>> Javascript/JSON's meaning of number, hence the suggestion to protect
>> your values inside strings (which will not be altered or interpreted)
>> and use math functions that operate on them (the various bignum.js
>> libraries, for example). Another way to think of this is by
>> comparison; if you would be happy, in Java, to exclusively use
>> doubles, you'd be fine here. An important place where that is not
>> acceptable is money (and, related, currency). You can't invent
>> pennies.
>>
>> I'm +1 on including such a feature in a future release of CouchDB, but
>> I don't think I got consensus on the idea so far (since it can be done
>> today without such an extension).
>>
>> B.
>>
>>
>> On 19 February 2013 21:39, Luca Morandini <lmorandini@ieee.org> wrote:
>> > On 02/20/2013 08:23 AM, Robert Newson wrote:
>> >>
>> >>
>> >> The numbers are not being changed, you are simply being exposed to the
>> >> truth. :)
>> >
>> >
>> > Nicely and concisely put, though it must be noted that Node.js -for
>> > instance- keeps hiding the truth, hence there is a bit of inconsistency.
>> >
>> > But what if I rely on that low-fidelity representation ?
>> > This is a DBMS, people expects to get exactly what they put into it.
>> >
>> >
>> > Regards,
>> >
>> > Luca Morandini
>> > Data Architect - AURIN project
>> > Department of Computing and Information Systems
>> > University of Melbourne
>> > Tel. +61 03 903 58 380
>> > Skype: lmorandini
>> >
>>
