couchdb-user mailing list archives

From Paul Davis <>
Subject Re: Silent corruption of large numbers
Date Sun, 08 Nov 2009 04:40:45 GMT

> If there are limits then I'd expect them to be enforced in some way,
> typically some sort of exception.  The last thing I would expect is for the
> numbers to be silently corrupted.  If strings over 1,024 bytes were randomly
> mutated would that be acceptable?

It's funny you should mention strings. Because if you dig into the
Unicode awesomeness you'll see that it's quite unspecified how
strings can be screwed with when passing through various Unicode
implementations. CouchDB actually does try to reject strings (that
are 'valid JSON') when it knows it can't serialize them.
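A minimal sketch (not CouchDB code) of that "valid JSON but unserializable" case: a lone UTF-16 surrogate can be spelled with JSON escape syntax, but it is not a valid Unicode scalar value, so a strict UTF-8 encoder has to reject it.

```python
# A lone high surrogate is expressible in JSON text as "\ud800", yet it
# cannot be encoded as UTF-8 -- the kind of string CouchDB refuses to
# pass along rather than silently mangling it.

s = "\ud800"  # lone high surrogate
try:
    s.encode("utf-8")
    print("serialized")
except UnicodeEncodeError:
    print("rejected")  # prints "rejected"
```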

>> As such, relying on the view engine to error out
> In this case the limit is in the Javascript view engine.  The CouchDB server
> doesn't have a problem.  If the view engine is Python then it won't have the
> problem either.

Exactly my point earlier. It depends on the view server. Python might
be A-OK, but the SpiderMonkey view server isn't. This is one of those
things that no one is going to agree on. I wish everyone could be as
awesome as Python here, but most implementations are just gonna do
weird things. And when you contemplate that you might have Ruby, Lua,
Java, Clojure, Bash, Lisp, D, Brainfuck, C++, JavaScript, and Erlang
(I'm tired, otherwise that list would be longer) clients, it gets even
messier.
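To make the SpiderMonkey behavior concrete, here's a sketch of the coercion (simulated in Python, since Python's `float` is the same IEEE-754 double JavaScript uses for every number): integers above 2**53 don't all fit in a double's 53-bit significand, so the engine silently rounds.

```python
# Why a JavaScript view server mutates large integers: every JS number
# is an IEEE-754 double, and doubles can't represent every integer
# above 2**53 exactly.

big = 2**60 + 1          # exact as a Python int (arbitrary precision)
as_double = float(big)   # what a double-only engine stores instead

print(big)               # 1152921504606846977
print(int(as_double))    # 1152921504606846976 -- silently rounded, no error
assert int(as_double) != big
```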

>> While not
>> the best answer the only thing I can suggest would be to do as Adam
>> says and store large values as strings and use a Bignum library in the
>> places you need to manipulate such values.
> In this case it is just my test suite that trips over the problem and my
> language (Python) does bignums by default.  I'm not too bothered that there
> is a failure but rather by the manner of the failure - silent mutation.

It sucks greatly, I agree. But these sorts of things are hard to corral.
To actually *fix* this you'd have to convince the SpiderMonkey team to
fix their handling of large numbers. And I don't see them adding
native bignum support to the spec anytime soon, which is why I can only
suggest programming defensively. It surely doesn't taste good, but at
some point we're still bound by the registers we allocate, and even
our scripting languages haven't yet abstracted numbers away from
base-2 hardware.
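The defensive pattern suggested above can be sketched like this (a hedged example, not an official CouchDB recipe): store the big value as a JSON string so no view server ever coerces it to a double, and convert back to an integer only at the points where you manipulate it.

```python
import json

# Store the large value as a string in the document; strings pass
# through any view server untouched. Python's built-in bignum ints
# stand in for the Bignum library other languages would need.

doc = {"balance": str(2**64 + 5)}           # stored as a string
wire = json.dumps(doc)                      # survives serialization intact
balance = int(json.loads(wire)["balance"])  # exact round trip
assert balance == 2**64 + 5
```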

>> Even if we told the view server to error on such values, what would
>> that error look like? Would everyone be unable to pass a doc with a
>> big num through the view server (depending on language)? Things get
>> messy quick.
> I'm old school.  I don't think this kind of mutation is acceptable.  A user
> is sitting behind layers of user interface, CouchDB libraries, JSON
> libraries, HTTP libraries and several other bits of glue.  Silently doing
> this is bad - one day a user will end up with data corruption with serious
> repercussions.
> Using a string analogy what should a view server do if a string is passed in
> that is larger than it wants to handle?  Is silent mutation or corruption ok?

You're not old school. Your worries are extremely well placed. I
completely agree that in a perfect world it's absolutely not
acceptable.

Though I would note that this tracks all the way down to things like
atoi() vs strtol(). Sometimes we just ignore edge cases. And seeing as
this is a 2^64-bit edge case, I'd still recommend attacking the
problem differently if you care about such scenarios. The alternative
is to force all implementations to care, and that (AFAICT) is just not
in the cards yet.
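The atoi()-vs-strtol() point in miniature: the same JSON text, parsed strictly (Python's default bignum ints) versus coerced to doubles the way a JavaScript engine would. The coercion is simulated here with the `parse_int` hook of Python's `json.loads`, purely for illustration.

```python
import json

text = '{"n": 18446744073709551617}'  # 2**64 + 1

strict = json.loads(text)["n"]                  # exact bignum: ...617
lossy = json.loads(text, parse_int=float)["n"]  # rounded to nearest double

assert strict == 2**64 + 1
assert int(lossy) != strict  # silently mutated, no error raised
```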

And I do care about such things. You can find my recent protestations
on unicode handling in JSON at [1].

Paul Davis

