incubator-couchdb-user mailing list archives

From Chris Anderson <jch...@apache.org>
Subject Re: Silent corruption of large numbers
Date Wed, 11 Nov 2009 01:44:19 GMT
On Tue, Nov 10, 2009 at 4:23 PM, Roger Binns <rogerb@rogerbinns.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Brian Candler wrote:
>> But would you accept 12345678901234567890.0? (.0 indicating float)
>
> As a developer, past practice has trained me that integers are exact and
> floating point is approximate (also "fast" and "slow" respectively).  Other
> than some older BASICs, Javascript is the first language I have come across
> in ages that doesn't have integers and represents everything as float.
>
> I looked up a few Javascript tutorials and didn't find a single one stating
> that all numbers are stored as float.  In most cases they deliberately
> distinguish between integers and floating point as two different types.
> Integers can have a leading 0x/0 to specify hex/octal whereas floating point
> cannot, which seems to be why they make the distinction.
>
> You can even see this in Javascript documentation itself.  It will say that
> parseInt returns an integer and parseFloat returns a floating point, again
> implying two different types.
>
>> It's unfortunately a fact of life that floats are non-exact quantities,
>
> That is not the issue.  The issue is that Javascript doesn't actually have
> an integer type, despite documentation and common practice in other
> languages, and that floating point accuracy rules also apply to Javascript
> integers.  I am also willing to bet that this is not widely known.
>
>> Even if you could raise an exception, there currently isn't a good mechanism
>> to handle it. If a map function bombs out, all you get is an error in the
>> log and that document disappears from the view entirely. There's nothing
>> reported back to the view user in any form.
>
> I assume the log isn't available via REST either which means only the
> couchdb administrator can find out this has happened.  Perhaps view results
> need to include an "errors" key with an integer of how many occurred in
> generating the view.
>
>> I think you're safe enough here - couchdb *shouldn't* allow storage of
>> strings which are not valid unicode.
>
> Some people speculated that the JSON spec allowed you to send those
> "invalid" strings though which means CouchDB is not JSON compliant when
> looking at some tiny pedantic corner of the spec.
>
>>> And what about view engines whose internals have
>>> a length limit on strings?
>>
>> I'm not aware of any such view server - although if you stick a 4GB single
>> JSON document into couchdb, you're asking for trouble just because it will
>> all sit in RAM at once.
>
> I'll guarantee that they do have a length limit on strings.  For example in a
> 32 bit implementation there is insufficient address space to have that 4GB
> string in memory.  And when compiled for 64 bit they may be using int
> instead of size_t for items such as string lengths or number of items in a list.
>
> And it won't cause me any trouble.  My machine has far more than 4GB of RAM
> :-)  We go through never ending upgrades in what is considered a normal
> size.  Yesterday's excessive is today's normal.
>
>> but you need to be sensible over the main JSON document.
>
> Translation: there are limits :)
>
>> I'm not quite clear on that. If your code is a "library" written in language
>> X, and it talks to a couchdb view, the view can be written in language Y and
>> the client is no wiser.
>
> Correct, unless language Y happens to silently "corrupt" integers.  Or maybe
> screws with strings.  Or has low limits.
>
>> If it is essential for your application that large
>> integers are handled exactly, then you need to choose language Y
>> appropriately.
>
> Language Y happens to be Javascript and the view server shipped by default
> with CouchDB.  I'd be happy to point my users at a page listing its
> constraints and quirks, along with view servers in Python, Ruby, Erlang etc
> but no such page or test/compliance test suite exists (yet).
>
> Is it even possible for a client to tell what view servers are available?
> At the moment I generate some Javascript on the fly and could generate a
> different language but need to know it is available.  (And then that doubles
> the amount of testing I have to do ... )

Yes, you can use the _config API to see the available query servers
and set up new ones.
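
As a rough sketch of what a client could do with that: a CouchDB of this era is assumed to answer GET /_config/query_servers with a JSON object mapping each language name to the command that starts its view server. The helper names, the sample body, and the base URL parameter below are all hypothetical, and the paths in the sample depend entirely on the installation.

```javascript
// Pure helper: given the parsed JSON body of GET /_config/query_servers
// (assumed shape: { languageName: "command line", ... }),
// return the list of languages a client may use in a design document.
function availableLanguages(queryServers) {
  return Object.keys(queryServers).sort();
}

// Hypothetical sample body; real command paths vary per installation.
const sample = {
  javascript: "/usr/bin/couchjs /usr/share/couchdb/server/main.js",
  python: "/usr/bin/couchpy"
};

console.log(availableLanguages(sample));

// Against a live server. baseUrl is an assumption (for example a local
// development instance), and admin credentials may be required.
async function fetchLanguages(baseUrl) {
  const res = await fetch(baseUrl + "/_config/query_servers");
  if (!res.ok) {
    throw new Error("config request failed with HTTP " + res.status);
  }
  return availableLanguages(await res.json());
}
```

A client that generates view code on the fly could call something like fetchLanguages once at startup and fall back to Javascript when its preferred language is missing.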

JSON is suitable for high-precision applications when you know the
intermediaries are lossless. CouchDB document number handling uses
Erlang's number capability, and it seems good so far.

It's too bad Spidermonkey isn't handling numbers at higher precision.
Does anyone know if Tracemonkey, V8, etc have better big number
support? Maybe we should take this up on dev@.

I encourage you to continue to contribute to discussion but this seems
like something patches would answer really swiftly.

Chris

-- 
Chris Anderson
http://jchrisa.net
http://couch.io
