couchdb-dev mailing list archives

From Garren Smith <gar...@apache.org>
Subject Re: Numbers in JavaScript, Lucene, and FoundationDB
Date Fri, 17 May 2019 10:52:02 GMT
On Fri, May 17, 2019 at 6:04 AM Paul Davis <paul.joseph.davis@gmail.com>
wrote:

> It's late so just a few quick notes here:
>
> Jiffy decodes numbers based on their encoding. I.e., any number that
> includes a decimal point or exponent is decoded as a double while any
> integer is decoded as an integer or bignum depending on size. While
> encoding jiffy will also encode 1.0 as "1.0" and 1 as "1". Generally
> speaking this seems to be the least surprising behavior for users.
>
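Python's standard `json` module happens to apply the same rule described above for jiffy (a decimal point or exponent means a double, otherwise an exact integer of any size), so it makes a convenient illustration. This is a sketch in Python, not jiffy itself:

```python
import json

# A decimal point or exponent decodes to a float; bare digits decode to an int.
assert isinstance(json.loads("1.0"), float)
assert isinstance(json.loads("1e3"), float)
assert isinstance(json.loads("1"), int)

# Integers of any size stay exact (Python ints are arbitrary precision,
# roughly analogous to jiffy's bignums).
big = json.loads("123456789012345678901234567890")

# Encoding keeps the distinction: 1.0 -> "1.0" and 1 -> "1".
print(json.dumps(1.0), json.dumps(1))  # 1.0 1
```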
> That said, one particular aspect of JSON numbers has always been
> money math. Things like "$1 / 3" follow a different set of rules than
> arbitrary floating-point arithmetic. CouchDB has a
> long history of telling users that numbers mostly behave like doubles
> given our JavaScript default. Given that, I would expect anyone who
> needs a JSON-oriented database and has fancy numerical needs to
> already be paying special attention to their numeric data.
>
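To make the money-math point concrete: binary doubles cannot represent most decimal fractions exactly, so money code usually reaches for a decimal type or integer cents. The sketch below uses Python's `decimal` module purely as an illustration; in a JSON document such amounts would typically travel as strings or integer cents:

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Binary doubles cannot hold 0.1 or 0.2 exactly, so the sum drifts.
assert 0.1 + 0.2 != 0.3

# A decimal type keeps base-10 amounts exact.
assert Decimal("0.10") + Decimal("0.20") == Decimal("0.30")

# "$1 / 3" needs an explicit rounding rule; here, whole cents (banker's rounding).
CENT = Decimal("0.01")
share = (Decimal("1.00") / 3).quantize(CENT, rounding=ROUND_HALF_EVEN)
leftover = Decimal("1.00") - 3 * share  # the cent that has to go somewhere

print(share, leftover)  # 0.33 0.01
```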
> The FoundationDB collation does definitely present new questions given
> that we're forced to implement a strict byte ordering. On the face of
> it I'm more than fine forcing everything to doubles and providing the
> mentioned warning label. I do know that FoundationDB's tuple layer has
> some ¯\_(ツ)_/¯ semantics for "invalid" doubles (-NaN, NaN, -0, other
> oddities I'd never heard of). So there may be caveats to mention there
> as well. However, for the most part I'd say our standard reply of "if you
> care about your numbers to the actual bit representation level, use a
> string representation" is, while maybe not officially official, still
> the best advice given JSON.
>
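For reference, our reading of the FoundationDB tuple layer spec is that a double is stored as its big-endian IEEE 754 bytes, with only the sign bit flipped when the sign bit is clear and every bit flipped when it is set, so that plain byte comparison matches numeric order. The Python sketch below implements just that transformation (the typecode prefix byte and the real `fdb` bindings are left out; `from_bits` is a hypothetical helper) and shows where -0, NaN and -NaN end up:

```python
import math
import struct

def encode_double(x: float) -> bytes:
    """Order-preserving bytes for an IEEE 754 double (tuple-layer style sketch)."""
    raw = bytearray(struct.pack(">d", x))  # big-endian IEEE 754 bytes
    if raw[0] & 0x80:
        # Sign bit set (negatives, -0.0, -NaN): flip every bit.
        raw = bytearray(b ^ 0xFF for b in raw)
    else:
        # Sign bit clear: flip only the sign bit so these sort above negatives.
        raw[0] ^= 0x80
    return bytes(raw)

def from_bits(hex_bits: str) -> float:
    """Build a double from an explicit 64-bit pattern (used for the NaNs)."""
    return struct.unpack(">d", bytes.fromhex(hex_bits))[0]

pos_nan = from_bits("7ff8000000000000")
neg_nan = from_bits("fff8000000000000")

# Listed in the byte order this encoding produces: -NaN sorts below -inf,
# NaN sorts above +inf, and -0.0 gets its own slot just below 0.0.
ordered = [neg_nan, -math.inf, -1.5, -0.0, 0.0, 2.0, math.inf, pos_nan]
keys = [encode_double(v) for v in ordered]
assert keys == sorted(keys)
```

Note that `-0.0 == 0.0` numerically, yet the two encode to different keys, which is exactly the kind of caveat worth documenting.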
> That of course ignores the fact that `emit(1, 2)` returns a view row
> of `("1.0", "2.0")` which Adam noted as another whole big thing. On
> that I don't have any amazing thoughts this late at night.
>

To get around the ("1.0", "2.0") problem, we could encode the keys to get
the correct collation in FDB but also store the unencoded keys to return
to the user. We could possibly store the original keys in the value, but
that reduces how much map value data can be stored; alternatively, they
could go in a separate row in FDB. This would fix this problem and also
help with storing arbitrary strings as keys.
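A rough sketch of that idea, with FDB mocked as a sorted in-memory list of (key bytes, value bytes) pairs. The row layout, the `emit_row` and `read_view` helpers, and storing the original key as JSON text in the value are all hypothetical choices, just to show collation running on the encoded number while the view hands back what the user emitted:

```python
import bisect
import json
import struct

def encode_double(x: float) -> bytes:
    # Order-preserving double encoding in the style of the tuple layer (sketch).
    raw = bytearray(struct.pack(">d", x))
    if raw[0] & 0x80:
        raw = bytearray(b ^ 0xFF for b in raw)
    else:
        raw[0] ^= 0x80
    return bytes(raw)

# Hypothetical stand-in for an FDB subspace: (key, value) pairs kept sorted.
rows = []

def emit_row(doc_id: str, key, value) -> None:
    # Sort on the encoded double, tie-break on doc id; keep the original
    # JSON text of the emitted key and value in the row's value.
    sort_key = encode_double(float(key)) + b"\x00" + doc_id.encode()
    stored = json.dumps([key, value]).encode()
    bisect.insort(rows, (sort_key, stored))

def read_view() -> list:
    # Decode the stored originals, so an emitted 1 comes back as 1, not 1.0.
    return [json.loads(v) for _, v in rows]

emit_row("a", 2.5, "x")
emit_row("b", 1, 2)
print(read_view())  # [[1, 2], [2.5, 'x']]
```

The cost is the one mentioned above: every row's value now carries a copy of its key.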



> On Thu, May 16, 2019 at 9:39 PM Adam Kocoloski <kocolosk@apache.org> wrote:
> >
> > Hi all, CouchDB has always had a somewhat complicated relationship with
> > numbers. I’d like to dig into that a little bit and see if any changes are
> > warranted, or if we can at least be really clear about exactly how they’re
> > handled going forward.
> >
> > Most of you are likely aware that JS represents *all* numbers as IEEE
> > 754 double-precision floats. This means that any number in a JSON document
> > with more than 15 significant digits is at risk of being corrupted when it
> > passes through the JS engine during a view build, for example. Our current
> > behavior is to let that silent corruption occur and put whatever number
> > comes out of the JS engine into the view, formatted as a double, int64, or
> > bignum based on jiffy’s decoding of the JSON output from the JS code.
> >
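Python floats are IEEE 754 doubles on mainstream platforms, just like JavaScript numbers, so the corruption described above is easy to reproduce: every integer up to 2**53 survives a trip through a double, but the very next one does not. A small sketch (`through_double` is a hypothetical helper standing in for the JS engine):

```python
import json

MAX_SAFE = 2**53  # 9007199254740992: every integer up to here is exact as a double

def through_double(n: int) -> int:
    """Round-trip an integer through an IEEE 754 double, as a JS engine would."""
    return int(float(n))

assert through_double(MAX_SAFE) == MAX_SAFE
assert through_double(MAX_SAFE + 1) == MAX_SAFE  # silently off by one

# A 20-digit id parses exactly, but not after a trip through a double.
doc = json.loads('{"id": 12345678901234567890}')
print(through_double(doc["id"]))  # 12345678901234567168
```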
> > On the other hand, FoundationDB’s tuple layer encoding is quite a bit
> > more specific. It has a whole bunch of typecodes for integers of
> > practically arbitrary size (up to 255 bytes), along with codes for 32-bit
> > and 64-bit floating-point numbers. The typecodes control the sorting; i.e.,
> > integers sort separately from floats.
> >
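To see why separate typecodes clash with CouchDB's collation, here is a deliberately simplified Python sketch of tuple-style encoding: a typecode byte first, then an order-preserving body. It only handles integers that fit in 8 bytes plus doubles, and the typecode values (0x14 for zero, 0x14 plus or minus the byte length for other integers, 0x21 for doubles) follow our reading of the tuple layer spec and should be checked against it:

```python
import struct

def encode_int(n: int) -> bytes:
    # Typecode 0x14 is zero; 0x14 + len for positives, 0x14 - len for negatives.
    if not -(2**64) < n < 2**64:
        raise ValueError("sketch only handles integers up to 8 bytes")
    if n == 0:
        return b"\x14"
    length = (abs(n).bit_length() + 7) // 8
    if n > 0:
        return bytes([0x14 + length]) + n.to_bytes(length, "big")
    # Negatives: one's complement of |n| so byte order matches numeric order.
    return bytes([0x14 - length]) + (n + (1 << (8 * length)) - 1).to_bytes(length, "big")

def encode_double(x: float) -> bytes:
    raw = bytearray(struct.pack(">d", x))
    if raw[0] & 0x80:
        raw = bytearray(b ^ 0xFF for b in raw)
    else:
        raw[0] ^= 0x80
    return b"\x21" + bytes(raw)

def encode_number(x) -> bytes:
    return encode_int(x) if isinstance(x, int) else encode_double(x)

keys = [0.5, 3, 1.5, -2, 2]
# Every integer sorts before every double, regardless of value.
print(sorted(keys, key=encode_number))  # [-2, 2, 3, 0.5, 1.5]
```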
> > We also have the ever-popular Lucene indexes for folks who build CouchDB
> > with the search extension. I don’t have all the details for the number
> > handling in that one handy, but it is another one to keep in mind.
> >
> > One question that comes up fairly quickly: when a user emits a number
> > as a key in a view, what do we store in FoundationDB? In order to respect
> > CouchDB’s existing collation rules we need to use the same typecode for all
> > numbers. Do we simply treat every number as a double, since they were all
> > coerced into that representation anyway in JS?
> >
> > But now let’s consider Mango indexes, which don’t suffer from any of
> > JavaScript’s sloppiness around number handling. If we’re to respect
> > CouchDB’s current collation rules we still need a common typecode and
> > sortable binary representation across integers and floats. Do we end up
> > using the IEEE 754 float representation of each number as a “sort key” and
> > storing the original number alongside it?
> >
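One possible shape for that, sketched in Python: the key is the order-preserving double encoding followed by the exact distance between the original integer and its nearest double, so integers that collide as doubles still get distinct, correctly ordered keys, while 1 and 1.0 still collate equal. The `mango_key` helper and this tie-breaker scheme are hypothetical, not a settled design, and the simplified integer encoding limits it to integers below roughly 2**100:

```python
import struct

def encode_double(x: float) -> bytes:
    # Order-preserving double encoding in the style of the tuple layer (sketch).
    raw = bytearray(struct.pack(">d", x))
    if raw[0] & 0x80:
        raw = bytearray(b ^ 0xFF for b in raw)
    else:
        raw[0] ^= 0x80
    return bytes(raw)

def encode_int(n: int) -> bytes:
    # Simplified tuple-style integer encoding (up to 8 bytes).
    if not -(2**64) < n < 2**64:
        raise ValueError("sketch only handles integers up to 8 bytes")
    if n == 0:
        return b"\x14"
    length = (abs(n).bit_length() + 7) // 8
    if n > 0:
        return bytes([0x14 + length]) + n.to_bytes(length, "big")
    return bytes([0x14 - length]) + (n + (1 << (8 * length)) - 1).to_bytes(length, "big")

def mango_key(x) -> bytes:
    """Hypothetical index key: double sort key plus exact tie-breaker."""
    approx = float(x)
    # Exact offset from the double: 0 for floats and for integers <= 2**53.
    delta = 0 if isinstance(x, float) else x - int(approx)
    return encode_double(approx) + encode_int(delta)

big = 2**53
nums = [big + i for i in range(6)]  # several of these collapse to one double
keys = [mango_key(n) for n in nums]
assert keys == sorted(keys) and len(set(keys)) == len(keys)
assert mango_key(1) == mango_key(1.0)  # CouchDB collation treats these as equal
```

Because rounding to a double never reverses the order of two numbers, the double prefix alone is enough whenever two values differ as doubles; the suffix only decides ties.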
> > I feel like this ends up being a rabbit hole, but one where we owe it to
> > our users to thoroughly explore and produce a definitive guide :)
> >
> > Cheers, Adam
