incubator-couchdb-user mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: Fixed precision of floating point number not respected in views
Date Wed, 20 Feb 2013 16:35:07 GMT
Thanks Paul, I wrapped this into the docs:

  http://git-wip-us.apache.org/repos/asf/couchdb/commit/bbd93f77

and on the way wrote a guide on how to contribute to the docs for the rest
of you:

  http://git-wip-us.apache.org/repos/asf/couchdb/commit/1f5695dd

Please make plenty of use of this! :)

Best
Jan
-- 


On Feb 20, 2013, at 04:00 , Paul Davis <paul.joseph.davis@gmail.com> wrote:

> Apologies for not being able to express myself earlier this morning.
> I'd been without sleep for entirely too long.
> 
> Robert Newson hits the nail on the head. The issue, succinctly stated, is this:
> 
> Any numbers defined in JSON that contain a decimal point or exponent
> will be passed through the Erlang VM's idea of the "double" data type.
> Any numbers that are used in views will pass through the view's idea of
> a number (the common JavaScript case means even integers pass through
> a double due to JavaScript's definition of a number).
> 
> (This is roughly a "no matter what" proposition until we decide to
> massively overhaul a significant portion of CouchDB internals so that
> JSON is not interpreted into an internal representation, which is not
> impossible but not likely for quite some time).
> 
> What people are discussing in this particular thread is how we encode
> those numbers after they have been passed through some internal
> representation. While it can be a bit surprising, and a number of
> people have said "but couchdb changes my data!", it's really not true
> (with a caveat). What happens is CouchDB is changing the textual
> representation of the result of decoding what it was given into some
> numerical format. In most cases this is an IEEE-754 double precision
> floating point number which is exactly what almost all other languages
> use as well.
> 
> What CouchDB does a bit differently than other languages is that it
> does not attempt to pretty print the resulting output to use the
> shortest number of characters. For instance, this is why we have this
> relationship:
> 
>> ejson:encode(ejson:decode(<<"1.1">>)).
> <<"1.1000000000000000888">>
> 
> What people are missing here is that internally those two formats
> decode into the same IEEE-754 representation. And more importantly, it
> will decode into a fairly close representation when passed through all
> major parsers that I know about.
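
A minimal sketch of the claim above, in Python rather than Erlang: both textual forms parse to the same IEEE-754 bit pattern. The helper name `double_bits` is made up for illustration and is not part of ejson or CouchDB.

```python
import struct

def double_bits(text):
    """Parse a decimal string as a double and return its 64-bit pattern as hex."""
    return struct.pack(">d", float(text)).hex()

short_form = "1.1"
long_form = "1.1000000000000000888"  # the string ejson printed above

# Both strings select the same double, so the bit patterns are identical.
same_bits = double_bits(short_form) == double_bits(long_form)
print(double_bits(short_form), double_bits(long_form), same_bits)
```

If the two hex strings match, no numeric information was changed, only the spelling of the number.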
> 
> While we've only been discussing cases where the textual
> representation changes, another important case is when an input value
> contains more precision than can actually be represented in a double.
> (You could argue that this case is actually "losing" data if you don't
> accept that numbers are stored in doubles).
> 
> Here's a log for a couple of the more common JSON libraries I happen
> to have on my machine:
> 
> Spidermonkey
> 
> $ js -h 2>&1 | head -n 1
> JavaScript-C 1.8.5 2011-03-31
> $ js
> js> JSON.stringify(JSON.parse("1.01234567890123456789012345678901234567890"))
> "1.0123456789012346"
> js> var f = JSON.stringify(JSON.parse("1.01234567890123456789012345678901234567890"))
> js> JSON.stringify(JSON.parse(f))
> "1.0123456789012346"
> 
> Node
> 
> $ node -v
> v0.6.15
> $ node
>> JSON.stringify(JSON.parse("1.01234567890123456789012345678901234567890"))
> '1.0123456789012346'
>> var f = JSON.stringify(JSON.parse("1.01234567890123456789012345678901234567890"))
> undefined
>> JSON.stringify(JSON.parse(f))
> '1.0123456789012346'
> 
> Python
> 
> $ python
> Python 2.7.2 (default, Jun 20 2012, 16:23:33)
> [GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
>>>> import json
>>>> json.dumps(json.loads("1.01234567890123456789012345678901234567890"))
> '1.0123456789012346'
>>>> f = json.dumps(json.loads("1.01234567890123456789012345678901234567890"))
>>>> json.dumps(json.loads(f))
> '1.0123456789012346'
> 
> Ruby
> 
> A small aside on Ruby: it requires a top level object or array, so I just
> wrapped the value. Should be obvious it doesn't affect the result of
> parsing the number though.
> 
> $ irb --version
> irb 0.9.5(05/04/13)
>>> require 'JSON'
> => true
>>> JSON.dump(JSON.load("[1.01234567890123456789012345678901234567890]"))
> => "[1.01234567890123]"
>>> f = JSON.dump(JSON.load("[1.01234567890123456789012345678901234567890]"))
> => "[1.01234567890123]"
>>> JSON.dump(JSON.load(f))
> => "[1.01234567890123]"
> 
> 
> # ejson (CouchDB's current parser) at CouchDB sha 168a663b
> 
> $ ./utils/run -i
> Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:2:2] [rq:2]
> [async-threads:4] [hipe] [kernel-poll:true]
> 
> Eshell V5.8.5  (abort with ^G)
> 1> ejson:encode(ejson:decode(<<"1.01234567890123456789012345678901234567890">>)).
> <<"1.0123456789012346135">>
> 2> F = ejson:encode(ejson:decode(<<"1.01234567890123456789012345678901234567890">>)).
> <<"1.0123456789012346135">>
> 3> ejson:encode(ejson:decode(F)).
> <<"1.0123456789012346135">>
> 
> 
> As you can see they all pretty much behave the same, except that Ruby
> does appear to be losing some precision compared with the other
> libraries.
> 
> The astute observer will notice that ejson (the CouchDB JSON library)
> reported an extra three digits. While it's tempting to think that this
> is due to some internal difference, it's just a more specific case of
> the 1.1 input as described above.
> 
> The important point to realize here is that a double can only hold a
> finite number of values. What we're doing here is generating a string
> that when passed through the "standard" floating point parsing
> algorithms (ie, strtod) will result in the same bit pattern in memory
> as we started with. Or, put slightly differently, the bytes in a JSON
> serialized number are chosen such that they refer to a single specific
> value that a double can represent.
> 
> The game that other JSON libraries are playing is merely:
> 
> "How few characters do I have to use to select this specific value for a double"
> 
> And that game has lots and lots of subtle details that are difficult
> to duplicate in C without a significant amount of effort (it took
> Python over a year to get it sorted with their fancy build systems
> that automatically run on a number of different architectures).
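
To make the "how few characters" game concrete, here is a rough Python sketch. Python's `repr` plays the shortest-string game. The `".20g"` format is only an assumption chosen because it reproduces the 20 significant digits seen in the ejson output above; it is not taken from the ejson source. The function names are hypothetical.

```python
def shortest(x):
    """Shortest decimal string that parses back to exactly x (Python's repr)."""
    return repr(x)

def twenty_digits(x):
    """Fixed 20 significant digits, similar in shape to the ejson output above."""
    return format(x, ".20g")

x = float("1.1")
short_text = shortest(x)
long_text = twenty_digits(x)

# Different spellings, same double once parsed again.
round_trips = float(short_text) == float(long_text) == x
print(short_text, long_text, round_trips)
```

Either string is a valid way to name the double; the short one just costs more work to compute.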
> 
> Hopefully I've shown that CouchDB is not doing anything "funky" by
> changing input. It's behaving the same as any other common JSON library
> does, it's just not pretty printing its output.
> 
> On the other hand, if you actually are in a position where an IEEE-754
> double is not a satisfactory datatype for your numbers, then the
> answer, as has been stated, is to not pass your numbers through this
> representation. In JSON this is accomplished by encoding them as a
> string or by using integer types (although integer types can still
> bite you if you use a platform that has a different integer
> representation than normal, ie, JavaScript).
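
A hedged sketch of the string-encoding workaround, using Python's standard `json` and `decimal` modules. The field name `amount` is invented for the example. The last lines illustrate the integer caveat: an integer above 2**53 does not survive a trip through a double, which is what a JavaScript number is.

```python
import json
from decimal import Decimal

digits = "1.01234567890123456789012345678901234567890"

# Stored as a JSON string, the digits are never interpreted as a double.
doc = json.dumps({"amount": digits})
amount = Decimal(json.loads(doc)["amount"])

# Stored as a JSON number, the value is rounded to the nearest double.
as_double = json.loads(digits)

# Integer caveat: 2**53 + 1 cannot be represented exactly as a double.
big = 2**53 + 1
survives_as_double = int(float(big)) == big

print(amount, as_double, survives_as_double)
```

The application then has to do its arithmetic on `Decimal` (or an equivalent bignum type) rather than on native floats.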
> 
> Also, if anyone is really interested in changing this behavior, I'm
> all ears for contributions to jiffy (which is theoretically going to
> replace ejson when I get around to updating the build system). The
> places I've looked for inspiration are TCL and Python. If you know a
> decent implementation of this float printing algorithm give me a
> holler.
> 
> On Tue, Feb 19, 2013 at 3:58 PM, Tibor Gemes <tibber@gmail.com> wrote:
>> It's against best practices to use floats for representing money. If you
>> count pennies, then interpret the amount in pennies with int.
>> If this inconsistency is unacceptable, then you must use int. You should
>> use float only if this does not matter.
>> T
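
A small illustration of the pennies advice, assuming non-negative amounts. The helper `format_cents` is hypothetical and only there to show the display step.

```python
# Summing prices as floats accumulates representation error.
float_total = sum([0.1, 0.2])
float_is_exact = float_total == 0.3

# Summing the same prices as integer cents is exact.
cents_total = sum([10, 20])

def format_cents(cents):
    """Render a non-negative integer number of cents as a decimal string."""
    return "%d.%02d" % divmod(cents, 100)

print(float_total, float_is_exact, format_cents(cents_total))
```

Integers in this range also pass through a double unchanged, so they are safe in JavaScript views.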
>> 2013.02.19. 22:48, "Robert Newson" <rnewson@apache.org> ezt írta:
>> 
>>> I agree entirely with your last statement (I filed
>>> https://issues.apache.org/jira/browse/COUCHDB-1410 for exactly that
>>> reason).
>>> 
>>> However, I've been convinced that it cannot be done with
>>> Javascript/JSON's meaning of number, hence the suggestion to protect
>>> your values inside strings (which will not be altered or interpreted)
>>> and use math functions that operate on them (the various bignum.js
>>> libraries, for example). Another way to think of this is by
>>> comparison; if you would be happy, in Java, to exclusively use
>>> doubles, you'd be fine here. An important place where that is not
>>> acceptable is money (and, related, currency). You can't invent
>>> pennies.
>>> 
>>> I'm +1 on including such a feature in a future release of CouchDB, but
>>> I don't think I got consensus on the idea so far (since it can be done
>>> today without such an extension).
>>> 
>>> B.
>>> 
>>> 
>>> On 19 February 2013 21:39, Luca Morandini <lmorandini@ieee.org> wrote:
>>>> On 02/20/2013 08:23 AM, Robert Newson wrote:
>>>>> 
>>>>> 
>>>>> The numbers are not being changed, you are simply being exposed to the
>>>>> truth. :)
>>>> 
>>>> 
>>>> Nicely and concisely put, though it must be noted that Node.js -for
>>>> instance- keeps hiding the truth, hence there is a bit of inconsistency.
>>>> 
>>>> But what if I rely on that low-fidelity representation ?
>>>> This is a DBMS, people expect to get exactly what they put into it.
>>>> 
>>>> 
>>>> Regards,
>>>> 
>>>> Luca Morandini
>>>> Data Architect - AURIN project
>>>> Department of Computing and Information Systems
>>>> University of Melbourne
>>>> Tel. +61 03 903 58 380
>>>> Skype: lmorandini
>>>> 
>>> 

