incubator-couchdb-user mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Silent corruption of large numbers
Date Mon, 09 Nov 2009 21:01:18 GMT
On Mon, Nov 9, 2009 at 3:20 PM, Roger Binns <rogerb@rogerbinns.com> wrote:
> Brian Candler wrote:
>>> I don't know what the right solution to this is.
>>
>> One option: store the values as strings.
>
> I meant the right solution for CouchDB.  For example it could take a lowest
> common denominator approach and reject all documents whose contents cannot
> be correctly represented by all "supported" view engines.  Or it could
> require view engines to generate an error when they lose data.  Or it could
> just blindly keep soldiering on and make it the problem of the user (current
> approach).

The only answer that sounds sensible to me is to code defensively and
make your own guarantees about the data you pass around. The Couch
world is only going to expand, so data will come and go from a huge
multitude of sources and through any number of intermediaries. While
it would be nice if there were specific semantics for data that can't
be represented as a native data type on the host, there's currently no
code in place to enforce that. The same goes for strings.
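For concreteness, the corruption under discussion follows from JavaScript
representing every number as an IEEE 754 double, which holds exact
integers only up to 2^53. A minimal sketch (not CouchDB code) of the
failure and of the store-as-string workaround Brian suggested:

```javascript
// JavaScript numbers are IEEE 754 doubles: integers above 2^53
// are silently rounded when JSON passes through the JS view engine.
const raw = '{"big": 9007199254740993}';   // 2^53 + 1
const doc = JSON.parse(raw);
console.log(doc.big);                      // 9007199254740992 -- off by one

// The workaround mentioned above: store the value as a string so
// the engine never coerces it to a double.
const safe = JSON.parse('{"big": "9007199254740993"}');
console.log(safe.big);                     // "9007199254740993" -- intact
```

The loss is silent in both directions: re-serializing `doc` emits the
rounded value, so neither parse nor stringify raises an error.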

More specifically, I don't even know how I would get the JS view
engine to raise an error on this condition, short of taking the raw
string, deserializing, reserializing, and comparing. And even that is
obviously prone to lots of other oddities of escaping and so forth.

> The biggest performance benefit for me would be if as many copies of the
> Javascript external process were run as I have processor cores.  It is quite
> frustrating watching one core be 100% busy and the other 0% busy while a
> view is being built.

In trunk, the computation and index writing are split so they can use
up to two cores. It would be theoretically possible to expand the
mapping side to use all available cores, but most reports indicate
that the btree-writing process is the CPU-bound stage, so any more
than a single mapper would just be a waste of resources.

Unfortunately, parallelizing the btree updates is rather non-trivial.
I've contemplated a few different approaches to it, but nothing has
come of my limited tinkering in that respect so far.

HTH,
Paul Davis
