incubator-couchdb-user mailing list archives

From Brian Candler <B.Cand...@pobox.com>
Subject Re: Silent corruption of large numbers
Date Tue, 10 Nov 2009 20:05:36 GMT
On Mon, Nov 09, 2009 at 12:20:11PM -0800, Roger Binns wrote:
> >> I don't know what the right solution to this is.
> > 
> > One option: store the values as strings.
> 
> I meant the right solution for CouchDB.  For example it could take a lowest
> common denominator approach and reject all documents whose contents cannot
> be correctly represented by all "supported" view engines.

I wouldn't want that. CouchDB is "clean" for arbitrarily-sized integers all
the way from HTTP through disk to view server. Just because Javascript is
borked, I don't want to be prevented from using Ruby or Erlang to map
numbers natively.

But in any case, I'm not sure what you're saying makes sense here. For
example: I believe that you would want to raise an error on
12345678901234567890 because that can't be represented exactly by the
Javascript view server.

  js> 12345678901234567890
  12345678901234567000
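
As a rough illustration of the kind of check being discussed (not something CouchDB does), the sketch below tests whether a decimal integer string survives conversion to a JavaScript Number. The name `survivesNumberRoundTrip` is made up for this example, and it assumes an engine with BigInt support, which postdates this thread.

```javascript
// Hypothetical helper: true if the decimal integer in `text` is exactly
// representable as a JavaScript Number (an IEEE 754 double).
function survivesNumberRoundTrip(text) {
  if (!/^-?\d+$/.test(text)) {
    throw new TypeError("expected a plain decimal integer: " + text);
  }
  // Number(text) rounds to the nearest double; huge inputs become Infinity.
  const rounded = Number(text);
  if (!Number.isFinite(rounded)) {
    return false;
  }
  // BigInt(text) is exact, so any difference means digits were lost.
  return BigInt(text) === BigInt(rounded);
}

console.log(survivesNumberRoundTrip("9007199254740992"));     // 2^53
console.log(survivesNumberRoundTrip("12345678901234567890"));
// Both literals round to the same double, which is why the shell above
// echoes the shorter one.
console.log(Number("12345678901234567890") === Number("12345678901234567000"));
```

Note that by this test neither 12345678901234567890 nor 12345678901234567000 is exact, which is part of why "reject what can't be represented" is hard to pin down.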

But would you accept 12345678901234567890.0? (.0 indicating float)

Would you accept 12345678901234567000.0 but not 12345678901234567890.0 ?

Would you accept 1.2345678901234567000 but not 1.234567890123456789 ?

It's unfortunately a fact of life that floats are non-exact quantities,
can't always be converted exactly to and from decimal, and even simple
arithmetic like 0.1+0.2=0.3 doesn't necessarily hold true. Would you want to
exclude floating point values entirely?
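
A small sketch of the inexactness in question, using the familiar 0.1 + 0.2 case rather than anything specific to CouchDB. `nearlyEqual` is an illustrative helper, not a library function, and the tolerance is an arbitrary choice.

```javascript
// Decimal fractions such as 0.1 have no exact binary representation,
// so each operand is already rounded before the addition happens.
const sum = 0.1 + 0.2;
console.log(sum === 0.3); // false on IEEE 754 doubles

// Hypothetical helper: compare with a relative tolerance instead of ===.
function nearlyEqual(a, b, relTol = 1e-9) {
  return Math.abs(a - b) <= relTol * Math.max(Math.abs(a), Math.abs(b));
}

console.log(nearlyEqual(sum, 0.3)); // true
```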

> Or it could
> require view engines to generate an error when they lose data.

Even if you could raise an exception, there currently isn't a good mechanism
to handle it. If a map function bombs out, all you get is an error in the
log and that document disappears from the view entirely. There's nothing
reported back to the view user in any form.
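
One possible workaround, offered as an assumption rather than an established CouchDB feature, is to catch exceptions inside the map function and emit a marker row, so the failure at least shows up in the view instead of only in the log. `emit` is supplied by the JavaScript view server; it is stubbed here so the sketch runs standalone. The `guarded` wrapper, the "error" key layout, and the `price`/`qty` fields are invented for illustration.

```javascript
// Stand-in for the view server's emit(); collects rows for inspection.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Hypothetical wrapper: turn an exception in `mapFn` into a visible row.
function guarded(mapFn) {
  return function (doc) {
    try {
      mapFn(doc);
    } catch (e) {
      emit(["error", doc._id], String(e));
    }
  };
}

const map = guarded(function (doc) {
  // Throws a TypeError when doc.price is missing.
  emit(["total", doc._id], doc.price.amount * doc.qty);
});

map({ _id: "a", price: { amount: 2 }, qty: 3 });
map({ _id: "b", qty: 1 });
console.log(JSON.stringify(rows));
```

A view user could then query the "error" key range to see which documents failed. In this sketch, any rows emitted before the exception stay in the output.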

> Or it could
> just blindly keep soldiering on and make it the problem of the user (current
> approach).

Given that the JSON spec leaves numeric limits up to the implementation,
and that ECMA-262 (which inspired JSON) defines all numbers as
double-precision 64-bit IEEE 754 values, this seems reasonable to me.
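
In practice this means a JavaScript client or view server parsing JSON rounds large integers without complaint, whereas the string workaround mentioned earlier in the thread survives intact. A short sketch, with the field name `n` chosen arbitrarily:

```javascript
// A bare JSON number is parsed into a double, so low-order digits are lost.
const asNumber = JSON.parse('{"n": 12345678901234567890}');
console.log(JSON.stringify(asNumber)); // {"n":12345678901234567000}

// The same digits stored as a JSON string come back unchanged.
const asString = JSON.parse('{"n": "12345678901234567890"}');
console.log(asString.n); // 12345678901234567890
```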

> Although this is an issue with integers it can also be the case with
> strings.  An earlier post linked to JSON discussion where it wasn't exactly
> clear if a JSON string actually has to be valid or is just a list of
> unsigned 16 bit integers.

I think you're safe enough here - couchdb *shouldn't* allow storage of
strings which are not valid unicode. Or at least, people have found that if
you do manage to store one, it screws up much more than just the view engine
:-)
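
For reference, a JSON string can be syntactically fine and still not be valid Unicode, for example when it contains an unpaired surrogate escape. The sketch below detects that case in JavaScript. `isValidUnicode` is a made-up name, and the check relies on `encodeURIComponent` throwing a URIError for lone surrogates; it says nothing about how CouchDB itself validates strings.

```javascript
// Hypothetical helper: false if `s` contains an unpaired UTF-16 surrogate.
function isValidUnicode(s) {
  try {
    encodeURIComponent(s);
    return true;
  } catch (e) {
    if (e instanceof URIError) {
      return false;
    }
    throw e;
  }
}

const lone = JSON.parse('"\\ud800"');        // parses, but is an unpaired surrogate
const pair = JSON.parse('"\\ud83d\\ude00"'); // a proper surrogate pair
console.log(isValidUnicode(lone), isValidUnicode(pair));
```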

> And what about view engines whose internals have
> a length limit on strings?

I'm not aware of any such view server - although if you stick a 4GB single
JSON document into couchdb, you're asking for trouble just because it will
all sit in RAM at once. Couchdb takes care to handle attachments in a
streaming manner, but you need to be sensible over the main JSON document.

> > Another option: use the Erlang view server,
> 
> Not an option as my code is a library for others to use.

I'm not quite clear on that. If your code is a "library" written in language
X, and it talks to a couchdb view, the view can be written in language Y and
the client is no wiser. If it is essential for your application that large
integers are handled exactly, then you need to choose language Y
appropriately.

I'd understand if your library is aimed at people who are unable to make any
changes to couchdb's config to enable different view languages though.

Regards,

Brian.
