couchdb-dev mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Numbers in JavaScript, Lucene, and FoundationDB
Date Fri, 17 May 2019 03:45:41 GMT
It's late, so just a few quick notes here:

Jiffy decodes numbers based on their encoding. I.e., any number that
includes a decimal point or exponent is decoded as a double while any
integer is decoded as an integer or bignum depending on size. When
encoding, jiffy will also encode 1.0 as "1.0" and 1 as "1". Generally
speaking this seems to be the least surprising behavior for users.
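
A rough illustration of that decode/encode behavior, using Python's
standard `json` module as a stand-in (an assumption for illustration
only: this is not jiffy, it just happens to follow the same
"lexical form picks the type" rule):

```python
import json

# Decoding: the textual form of the number selects the type.
decoded_int = json.loads("1")      # no decimal point or exponent -> int
decoded_float = json.loads("1.0")  # decimal point -> float (double)
decoded_exp = json.loads("1e0")    # exponent -> float (double)
# Large integers stay exact (Python's analogue of a bignum).
decoded_big = json.loads("123456789012345678901234567890")

# Encoding: the type selects the textual form, so values round-trip.
encoded_float = json.dumps(1.0)
encoded_int = json.dumps(1)

print(type(decoded_int).__name__, type(decoded_float).__name__)
print(encoded_float, encoded_int)
```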

That said, one recurring concern with JSON and numbers has
always been money math. Things like "$1 / 3" follow a different
set of rules than arbitrary floating point arithmetic. CouchDB has a
long history of telling users that numbers mostly behave like doubles
given our JavaScript default. Given that, I would expect anyone that
needs a JSON oriented database that has fancy numerical needs to
already be paying special attention to their numeric data.
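
To make the "behaves like doubles" and money math points concrete,
here is a small Python sketch. The document shape (an "amount" stored
as a string) and the rounding rule are assumptions chosen for
illustration, not a CouchDB convention:

```python
import json
from decimal import Decimal, ROUND_HALF_EVEN

# Integers above 2**53 cannot all be represented as doubles.
big = 9007199254740993            # 2**53 + 1
as_double = int(float(big))       # what survives a trip through a double

# Binary floating point cannot represent most decimal fractions exactly.
float_sum = 0.1 + 0.2

# Money kept as a string in JSON and handled as a decimal by the client.
doc = json.loads('{"amount": "1.00", "currency": "USD"}')
amount = Decimal(doc["amount"])
share = (amount / 3).quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)
remainder = amount - share * 3    # the cent that "$1 / 3" leaves over

print(as_double, float_sum == 0.3, share, remainder)
```

What to do with the leftover cent is an application decision, which is
the sort of rule plain floating point arithmetic doesn't capture.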

The FoundationDB collation does definitely present new questions given
that we're forced to implement a strict byte ordering. On the face of
it I'm more than fine forcing everything to doubles and providing the
mentioned warning label. I do know that FoundationDB's tuple layer has
some ¯\_(ツ)_/¯ semantics for "invalid" doubles (-NaN, NaN, -0, other
oddities I'd never heard of). So there may be caveats to mention there
as well. However, for the most part I'd say our standard reply of "if
you care about your numbers to the actual bit representation level, use
a string representation" is, while maybe not officially official, still
the best advice given JSON.
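
For a sense of what a strict byte ordering over doubles looks like,
here is a sketch of the usual order-preserving transform: take the
big-endian IEEE 754 bits, flip every bit if the sign bit is set, and
flip only the sign bit otherwise. As far as I know this matches what
the tuple layer does for doubles, but treat that as an assumption and
check the tuple layer spec; `sortable_double` is a made-up name.

```python
import struct

def sortable_double(x: float) -> bytes:
    """Map a double to 8 bytes whose byte order follows numeric order."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    if bits & (1 << 63):
        bits ^= 0xFFFFFFFFFFFFFFFF  # negative: flip all bits
    else:
        bits ^= 1 << 63             # non-negative: flip the sign bit only
    return struct.pack(">Q", bits)

values = [3.5, 0.0, -2.0, float("inf"), float("-inf"), 1e-300]
ordered = sorted(values, key=sortable_double)

# -0.0 and 0.0 compare equal as numbers but get distinct keys,
# with -0.0 sorting first.
neg_zero_first = sortable_double(-0.0) < sortable_double(0.0)

print(ordered, neg_zero_first)
```

The -0 case is one of the oddities a warning label would need to
cover: two keys that are equal as numbers end up as separate entries
in a byte-ordered index. NaNs are similar, since their position depends
on their sign and payload bits.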

That of course ignores the fact that `emit(1, 2)` returns a view row
of `("1.0", "2.0")` which Adam noted as another whole big thing. On
that I don't have any amazing thoughts this late at night.

On Thu, May 16, 2019 at 9:39 PM Adam Kocoloski <kocolosk@apache.org> wrote:
>
> Hi all, CouchDB has always had a somewhat complicated relationship with numbers. I’d
> like to dig into that a little bit and see if any changes are warranted, or if we can at least
> be really clear about exactly how they’re handled going forward.
>
> Most of you are likely aware that JS represents *all* numbers as IEEE 754 double precision
> floats. This means that any number in a JSON document with more than 15 significant digits
> is at risk of being corrupted when it passes through the JS engine during a view build, for
> example. Our current behavior is to let that silent corruption occur and put whatever number
> comes out of the JS engine into the view, formatting as a double, int64, or bignum based on
> jiffy’s decoding of the JSON output from the JS code.
>
> On the other hand, FoundationDB’s tuple layer encoding is quite a bit more specific.
> It has a whole bunch of typecodes for integers of practically arbitrary size (up to 255 bytes),
> along with codes for 32 bit and 64 bit floating point numbers. The typecodes control the sorting;
> i.e., integers sort separately from floats.
>
> We also have the ever-popular Lucene indexes for folks who build CouchDB with the search
> extension. I don’t have all the details for the number handling in that one handy, but it
> is another one to keep in mind.
>
> One question that comes up fairly quickly: when a user emits a number as a key in
> a view, what do we store in FoundationDB? In order to respect CouchDB’s existing collation
> rules we need to use the same typecode for all numbers. Do we simply treat every number as
> a double, since they were all coerced into that representation anyway in JS?
>
> But now let’s consider Mango indexes, which don’t suffer from any of JavaScript’s
> sloppiness around number handling. If we’re to respect CouchDB’s current collation rules
> we still need a common typecode and sortable binary representation across integers and floats.
> Do we end up using the IEEE 754 float representation of each number as a “sort key” and
> storing the original number alongside it?
>
> I feel like this ends up being a rabbit hole, but one where we owe it to our users to
> thoroughly explore and produce a definitive guide :)
>
> Cheers, Adam