From: Paul Davis
Date: Sat, 7 Nov 2009 23:40:45 -0500
Subject: Re: Silent corruption of large numbers
To: user@couchdb.apache.org
In-Reply-To: <4AF64514.8090401@rogerbinns.com>

Roger,

> If there are limits then I'd expect them to be enforced in some way,
> typically some sort of exception.  The last thing I would expect is for the
> numbers to be silently corrupted.  If strings over 1,024 bytes were randomly
> mutated would that be acceptable?

It's funny you should mention strings, because if you dig into the
Unicode awesomeness you'll see that it's quite unspecified how strings
can be screwed with when passing through various Unicode
implementations. CouchDB actually does try to reject strings (that are
'valid JSON') when it knows it can't serialize them.

>> As such, relying on the view engine to error out
>
> In this case the limit is in the Javascript view engine.  The CouchDB server
> doesn't have a problem.  If the view engine is Python then it won't have the
> problem either.

Exactly my point earlier. It depends on the view server. Python might
be A-OK, but the SpiderMonkey view server isn't. This is one of those
things that no one is going to agree on. I wish everyone could be as
awesome as Python here, but most implementations are just gonna do
weird things.
And when you contemplate that you might have Ruby, Lua, Java, Clojure,
Bash, Lisp, D, Brainfuck, C++, JavaScript, and Erlang clients (I'm
tired, otherwise that list would be longer), it gets even weirder.

>> While not
>> the best answer the only thing I can suggest would be to do as Adam
>> says and store large values as strings and use a Bignum library in the
>> places you need to manipulate such values.
>
> In this case it is just my test suite that trips over the problem and my
> language (Python) does bignums by default.  I'm not too bothered that there
> is a failure but rather by the manner of the failure - silent mutation.

It sucks greatly, I agree. But these sorts of things are hard to
corral. To actually *fix* this you'd have to convince the SpiderMonkey
team to fix their handling of large numbers. And I don't see them
adding native bignum support to the spec anytime soon, which is why I
can only suggest programming defensively. It surely doesn't taste
good, but at some point we're still bound by the registers we
allocate, and even our scripting languages haven't yet abstracted
numbers away from base 2.

>> Even if we told the view server to error on such values, what would
>> that error look like? Would everyone be unable to pass a doc with a
>> big num through the view server (depending on language)? Things get
>> messy quick.
>
> I'm old school.  I don't think this kind of mutation is acceptable.  A user
> is sitting behind layers of user interface, CouchDB libraries, JSON
> libraries, HTTP libraries and several other bits of glue.  Silently doing
> this is bad - one day a user will end up with data corruption with serious
> repercussions.
>
> Using a string analogy what should a view server do if a string is passed in
> that is larger than it wants to handle?  Is silent mutation or corruption ok?

You're not old school. Your worries are extremely well placed. I
completely agree that in a perfect world it's absolutely not
acceptable.
Though I would note that this tracks all the way down to things like
atoi() vs strtol(). Sometimes we just ignore edge cases. And seeing as
this is at the 2^64 bit edge case, I'd still recommend attacking the
problem differently if you care about such scenarios. The alternative
is to force all implementations to care, and that (AFAICT) is just not
in the cards yet.

And I do care about such things. You can find my recent protestations
on Unicode handling in JSON at [1].

HTH,
Paul Davis

[1] https://mail.mozilla.org/pipermail/es5-discuss/2009-October/003383.html
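
A minimal sketch of the defensive approach suggested in this thread
(store large values as strings and convert them back with a
bignum-capable type). It assumes a view engine that holds every JSON
number as an IEEE 754 double, which as_double_and_back() only
simulates; that helper, protect(), and the document fields are
hypothetical names for illustration, not part of any CouchDB API.

```python
import json

# Integers with magnitude up to 2**53 - 1 survive a round trip through
# an IEEE 754 double without any change.
MAX_SAFE_INTEGER = 2**53 - 1


def as_double_and_back(n: int) -> int:
    """Hypothetical stand-in for an engine that stores numbers as doubles."""
    return int(float(n))


def protect(n: int):
    """Return n unchanged if a double holds it exactly, else its decimal string."""
    return n if abs(n) <= MAX_SAFE_INTEGER else str(n)


big = 2**64 + 1

# The failure mode discussed above: no error, just a different number.
mutated = as_double_and_back(big)

# The workaround: the large value travels as a JSON string instead.
doc = {"_id": "example", "value": protect(big)}
wire = json.dumps(doc)

# A client whose language has bignums (Python here) converts it back.
restored = int(json.loads(wire)["value"])
```

Only values that a double cannot be trusted to hold exactly are turned
into strings here, so small integers stay numeric. A view function
would then have to treat such fields as opaque strings unless it
brings its own bignum library.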