couchdb-dev mailing list archives

From Mikeal Rogers
Subject Re: Proposal for changes in view server/protocol
Date Mon, 02 Aug 2010 21:34:32 GMT
> For the first point about CommonJS modules in Map/Reduce views I'd say
> the goal is fine, but I don't understand how or why you'd want that
> hash to happen in JavaScript. Unless I'm mistaken, aren't the import
> statements executable JS? As in, is there any requirement that you
> couldn't import a module inside your map function? In which case, JS
> can't really hash all imported modules until after all possible code
> paths have been traced?
> I think a better answer would be to allow commonjs modules, but only
> in some name-space of the design document. (IIRC, the other functions
> can pull from anywhere, but that would make all design doc updates
> trigger view regeneration) Then Erlang just loads this namespace and
> anything that could be imported is included in the hash somehow (hash
> of sorted hashes or some such).

This is an interesting idea and I think I like it more than my original
proposal. My fear with the original proposal was that it might be opaque to
most users what will invalidate their views if we start doing fancy
invalidation on modules they use. If we re-scope or restrict the module
support to an attribute that would make it very clear that changes to those
modules will invalidate the view.
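
A minimal sketch of the restricted-namespace idea, written in Python for brevity (in CouchDB this hashing would live on the Erlang side). It assumes the shared modules sit under a single key inside `views`; the key name `lib` and the functions `walk_modules` and `view_signature` are hypothetical, chosen only for illustration. It uses the "hash of sorted hashes" approach suggested in the quoted text.

```python
import hashlib
import json


def walk_modules(tree, prefix=""):
    """Yield (path, source) pairs for every module in a nested namespace."""
    for name in sorted(tree):
        value = tree[name]
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(value, dict):
            yield from walk_modules(value, path)
        else:
            yield path, value


def view_signature(ddoc, view_name, lib_key="lib"):
    """Hash one view's source plus every module in the shared namespace.

    Assumption: only modules under views[lib_key] can be imported by
    map/reduce functions, so nothing else in the design doc is hashed.
    """
    views = ddoc.get("views", {})
    view = views[view_name]
    # Hash each module on its own, then sort so traversal order is irrelevant.
    module_hashes = sorted(
        hashlib.sha1(json.dumps([path, source]).encode("utf-8")).hexdigest()
        for path, source in walk_modules(views.get(lib_key, {}))
    )
    payload = json.dumps([view.get("map", ""), view.get("reduce", ""), module_hashes])
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()
```

With this shape, editing a show or list function elsewhere in the design doc leaves the signature alone, while any change under the module namespace changes it, which is the behaviour that should be obvious to users.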

> Batching docs across the I/O might not give you as much of a
> performance improvement as you'd think. There's a pretty nasty time
> explosion on parsing larger JSON documents in some of the various
> parsers I've tried. I've noticed this on various Pure erlang parsers,
> but I wouldn't be surprised if the json.js suffered as well. And in
> this, I mean, that parsing a one megabyte document might be quite a
> bit slower than parsing many smaller documents. So simply wrapping
> things in an array could be bad.

The new native C parser in JavaScript is fine with anything this size and I
believe Damien just wrote an evented JSON parser which should make this more
acceptable on the client side. One good idea from jchris was to replace the
number-of-documents threshold with a byte-length limit on the batch we send
to the view server.
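
A rough sketch of byte-length batching, assuming documents are serialized as compact JSON before being written to the view server. The function name `batch_by_bytes` and the `max_bytes` parameter are hypothetical; the thread does not specify a limit or an API.

```python
import json


def batch_by_bytes(docs, max_bytes):
    """Group docs into batches whose encoded size stays within max_bytes.

    A single document larger than max_bytes is still sent, alone in its
    own batch, so no document is ever dropped.
    """
    batch, size = [], 0
    for doc in docs:
        encoded_len = len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))
        # Flush the current batch before it would exceed the byte limit.
        if batch and size + encoded_len > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(doc)
        size += encoded_len
    if batch:
        yield batch
```

Capping by bytes rather than by document count keeps many small documents together in one write while preventing a few large ones from producing a single huge payload for the parser.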

The I/O time for large numbers of small documents is larger than you would
expect. I ran some tests a while back and more time was spent in stdio for
simple map/reduce operations than in processing on the view server.

Of course, most of the time in view generation is still spent writing to the
btree, but that performance has already improved quite a bit, so we're
looking for other places we can optimize.

> An alternative that I haven't seen anywhere else in this thread was an
> idea to tag every message passed to the view engine with a uuid. Then
> people can do all sorts of fancy things with the view engine like
> async processing and so on and so forth. The downside being that the
> saturday afternoon implementation of the view engine in language X now
> takes both saturday and sunday afternoon.

So, this gets dicey really fast. I want the external process protocol to go
non-blocking and support this uuid style communication but I'm really
skeptical of it in the view server.

The view server should do pure functional transforms; allowing it to do I/O
means that is no longer true. It's also not as simple as stamping the
protocol with a uuid because erlang still needs to load balance any number
of external processes. When the view server no longer solely blocks on
processing it becomes much harder to achieve that load balancing.

> Apologies for missing this thread earlier. Better late than never I guess.
> Paul Davis
