incubator-couchdb-user mailing list archives

From Roger Binns <rog...@rogerbinns.com>
Subject Re: Silent corruption of large numbers
Date Wed, 11 Nov 2009 00:23:07 GMT

Brian Candler wrote:
> But would you accept 12345678901234567890.0? (.0 indicating float)

As a developer, past practice has trained me that integers are exact and
floating point is approximate (also "fast" and "slow" respectively).  Apart
from some older BASICs, Javascript is the first language I have come across
in ages that doesn't have integers and represents every number as a float.

I looked up a few Javascript tutorials and didn't find a single one stating
that all numbers are stored as floats.  Most of them deliberately
distinguish between integers and floating point as two different types.
They seem to make the distinction because integer literals can have a
leading 0x or 0 to specify hex or octal, whereas floating point literals
cannot.

You can even see this in the Javascript documentation itself.  It says that
parseInt returns an integer and parseFloat returns a floating point number,
again implying two different types.

> It's unfortunately a fact of life that floats are non-exact quantities,

That is not the issue.  The issue is that Javascript doesn't actually have
an integer type, despite what its documentation and common practice in other
languages suggest, and that floating point accuracy rules therefore also
apply to Javascript integers.  I am also willing to bet that this is not
widely known.
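A minimal illustration of the point, assuming only that a Javascript Number is an IEEE 754 double with a 53-bit significand, so integers are guaranteed exact only up to 2**53 in magnitude. Python's float is the same kind of double on mainstream platforms, so it stands in for a Javascript number here; the helper name `round_trip_through_double` is hypothetical.

```python
# Python's float is an IEEE 754 binary64 double, the same representation
# Javascript uses for every Number, so it can stand in for a JS value here.

MAX_SAFE = 2**53 - 1  # largest n such that n and n + 1 are both exact doubles


def round_trip_through_double(n: int) -> int:
    """Return what an integer becomes after being stored as a double."""
    return int(float(n))


for n in (MAX_SAFE, 2**53 + 1, 12345678901234567890):
    stored = round_trip_through_double(n)
    status = "exact" if stored == n else "silently changed"
    print(f"{n} -> {stored} ({status})")
```

With these three inputs the first survives unchanged, 2**53 + 1 comes back as 2**53, and 12345678901234567890 comes back as 12345678901234567168, with no error raised anywhere.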

> Even if you could raise an exception, there currently isn't a good mechanism
> to handle it. If a map function bombs out, all you get is an error in the
> log and that document disappears from the view entirely. There's nothing
> reported back to the view user in any form.

I assume the log isn't available via REST either, which means only the
CouchDB administrator can find out that this has happened.  Perhaps view
results need to include an "errors" key giving the number of errors that
occurred while generating the view.
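A rough sketch of what such an "errors" count could look like. It is purely hypothetical: CouchDB does not report this, and `build_view`, its result layout and the `errors` key are invented here for illustration. The idea is that a failing map function no longer makes a document vanish silently. Instead, the indexer counts the failure and returns that count alongside the rows.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Hypothetical: a map function takes one document and returns (key, value) rows.
MapFn = Callable[[dict], Iterable[Tuple[Any, Any]]]


def build_view(docs: Iterable[dict], map_fn: MapFn) -> Dict[str, Any]:
    """Run map_fn over every document, counting failures instead of hiding them."""
    rows: List[Dict[str, Any]] = []
    errors = 0
    for doc in docs:
        try:
            # Materialise the output first so a failure part-way through
            # does not leave half of this document's rows in the view.
            emitted = list(map_fn(doc))
        except Exception:
            # Today the document would just be missing from the view, with the
            # error visible only in the server log; here it is counted.
            errors += 1
            continue
        rows.extend({"id": doc.get("_id"), "key": k, "value": v} for k, v in emitted)
    return {"rows": rows, "errors": errors}


docs = [{"_id": "a", "n": 1}, {"_id": "b"}]          # "b" has no "n" field
view = build_view(docs, lambda d: [(d["n"], None)])  # KeyError on "b"
print(view)
```

A client would then only need to check that `errors` is zero to know the view is complete.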

> I think you're safe enough here - couchdb *shouldn't* allow storage of
> strings which are not valid unicode. 

Some people have suggested that the JSON spec does allow you to send those
"invalid" strings, which would mean CouchDB is not JSON compliant in some
tiny pedantic corner of the spec.
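One such corner, presumably, is escaped lone surrogates. JSON's \uXXXX escape syntax lets a string contain "\ud800". That string is syntactically valid JSON, but it is not a valid Unicode character on its own and cannot be encoded as UTF-8. Python's standard json module is used below only to show that a real-world parser will hand such a string to an application. It says nothing about what CouchDB itself does with it.

```python
import json

# Syntactically valid JSON text: a string holding an escaped lone high surrogate.
text = '"\\ud800"'

value = json.loads(text)  # Python's json module accepts it without complaint
print(len(value), hex(ord(value)))

try:
    value.encode("utf-8")  # ...but the result is not encodable Unicode text
except UnicodeEncodeError as exc:
    print("cannot be encoded as UTF-8:", exc.reason)
```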

>> And what about view engines whose internals have
>> a length limit on strings?
> 
> I'm not aware of any such view server - although if you stick a 4GB single
> JSON document into couchdb, you're asking for trouble just because it will
> all sit in RAM at once. 

I'll guarantee that they do have a length limit on strings.  For example, a
32 bit implementation has insufficient address space to hold that 4GB
string in memory.  And even when compiled for 64 bit, they may be using int
instead of size_t for items such as string lengths or the number of items in
a list.
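A small demonstration of the int-versus-size_t problem, using Python's ctypes to mimic C integer types. ctypes does no overflow checking, so the values wrap the way a typical two's-complement C build would. Strictly, converting an out-of-range value to a signed int is implementation-defined in C. On common 64-bit platforms int is still 32 bits, so a 3GB or 4GB length does not survive being stored in one.

```python
import ctypes

GIB = 1024**3

for length in (3 * GIB, 4 * GIB):
    # Storing a byte count in a 32-bit int wraps it instead of failing.
    as_int = ctypes.c_int32(length).value
    # A 64-bit size_t (modelled here as c_uint64) holds the real value.
    as_size_t = ctypes.c_uint64(length).value
    print(f"{length} bytes: int32 sees {as_int}, 64-bit size_t sees {as_size_t}")
```

A 3GB length turns negative and a 4GB length becomes 0, so code that trusts the int either rejects a valid string or allocates far too little for it.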

And it won't cause me any trouble.  My machine has far more than 4GB of RAM
:-)  We go through never-ending upgrades in what is considered a normal
size.  Yesterday's excessive is today's normal.

> but you need to be sensible over the main JSON document.

Translation: there are limits :)

> I'm not quite clear on that. If your code is a "library" written in language
> X, and it talks to a couchdb view, the view can be written in language Y and
> the client is no wiser.

Correct, unless language Y happens to silently "corrupt" integers.  Or maybe
screws with strings.  Or has low limits.
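Until the view server's limits are documented, one defensive option for a client library is to refuse integers that a Javascript view could silently round. Below is a minimal sketch, assuming the view server uses IEEE 754 doubles for all numbers. `check_js_safe` is a hypothetical helper, not part of any CouchDB client. Its bound, 2**53 - 1, is the largest value below which every integer is exactly representable as a double.

```python
from typing import Any

# Beyond this magnitude, not every integer is exactly representable as an
# IEEE 754 double, which is what a Javascript view server turns numbers into.
MAX_SAFE_INTEGER = 2**53 - 1


def check_js_safe(value: Any, path: str = "$") -> None:
    """Raise ValueError if any integer in a JSON-like value exceeds +/-(2**53 - 1)."""
    if isinstance(value, bool):
        return  # bool is a subclass of int in Python; JSON true/false are fine
    if isinstance(value, int):
        if abs(value) > MAX_SAFE_INTEGER:
            raise ValueError(f"{path}: {value} may be silently rounded by a Javascript view")
    elif isinstance(value, dict):
        for k, v in value.items():
            check_js_safe(v, f"{path}.{k}")
    elif isinstance(value, list):
        for i, v in enumerate(value):
            check_js_safe(v, f"{path}[{i}]")


doc = {"_id": "example", "counts": [1, 12345678901234567890]}
try:
    check_js_safe(doc)
except ValueError as exc:
    print("refusing to store:", exc)
```

The bound is deliberately conservative: some larger integers (such as 2**60) are exact doubles, but checking the range is simpler than reasoning about each individual value.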

> If it is essential for your application that large
> integers are handled exactly, then you need to choose language Y
> appropriately.

Language Y happens to be Javascript, used by the view server shipped by
default with CouchDB.  I'd be happy to point my users at a page listing its
constraints and quirks, along with those of the view servers in Python, Ruby,
Erlang etc., but no such page or compliance test suite exists (yet).

Is it even possible for a client to tell which view servers are available?
At the moment I generate some Javascript on the fly, and I could generate a
different language, but I would need to know that it is available.  (And
that would double the amount of testing I have to do ... )

Roger
