incubator-couchdb-dev mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Proposal for changes in view server/protocol
Date Mon, 02 Aug 2010 21:54:34 GMT
On Mon, Aug 2, 2010 at 5:34 PM, Mikeal Rogers <mikeal.rogers@gmail.com> wrote:
>>
>> For the first point, about CommonJS modules in Map/Reduce views, I'd say
>> the goal is fine, but I don't understand how or why you'd want that
>> hash to happen in JavaScript. Unless I'm mistaken, aren't the import
>> statements executable JS? That is, is there anything preventing you from
>> importing a module inside your map function? In which case, JS can't
>> really hash all imported modules until every possible code path has
>> been traced.
>>
>> I think a better answer would be to allow CommonJS modules, but only
>> in some namespace of the design document. (IIRC, the other functions
>> can pull from anywhere, but that would make all design doc updates
>> trigger view regeneration.) Then Erlang just loads this namespace, and
>> anything that could be imported is included in the hash somehow (a hash
>> of sorted hashes or some such).
>>
>
> This is an interesting idea and I think I like it more than my original
> proposal. My fear with the original proposal was that it might be opaque to
> most users what will invalidate their views if we start doing fancy
> invalidation on the modules they use. If we re-scope or restrict the module
> support to a single attribute, it would be very clear that changes to those
> modules will invalidate the view.
>
>
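
To make that concrete, the kind of layout I had in mind (the "modules"
attribute name and the require() path below are just illustration, nothing
here is decided) is a single reserved attribute in the design doc that map
functions are only allowed to pull from:

    {
      "_id": "_design/example",
      "modules": {
        "stats": "exports.mean = function (xs) { var s = 0; for (var i = 0; i < xs.length; i++) s += xs[i]; return s / xs.length; }"
      },
      "views": {
        "scores": {
          "map": "function (doc) { var stats = require('modules/stats'); if (doc.scores) emit(doc._id, stats.mean(doc.scores)); }"
        }
      }
    }

The view signature would then be a hash over the view functions plus
everything under that one attribute, so edits anywhere else in the design
doc still wouldn't trigger a rebuild.
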
>>
>> Batching docs across the I/O might not give you as much of a
>> performance improvement as you'd think. There's a pretty nasty time
>> explosion when parsing larger JSON documents in some of the various
>> parsers I've tried. I've noticed this on various pure Erlang parsers,
>> but I wouldn't be surprised if json.js suffered as well. By that I
>> mean that parsing a one megabyte document might be quite a bit slower
>> than parsing many smaller documents adding up to the same size. So
>> simply wrapping things in an array could be bad.
>>
>
> The new native C parser in JavaScript is fine with anything this size, and I
> believe Damien just wrote an evented JSON parser which should make this more
> acceptable on the client side. One good idea jchris had was, instead of
> using a document-count threshold, to put a byte-length restriction on the
> batch we send to the view server.

Yeah, the new embedded JSON parser should be fine as long as we can
motivate people to upgrade to a recent JavaScript library. My
experience is more on the Erlang side, as that's what I've done all of
my comparisons against. I haven't done any testing on the streaming
parser, but it'd be interesting to see how it behaves in relation to
input doc size.
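
For the JS side, something as blunt as the following would probably answer
the one-big-doc vs. many-small-docs question. It's just a sketch meant for a
standalone SpiderMonkey shell (print() there writes to stdout); the doc
shape and counts are made up:

    // Compare parsing one large JSON array against parsing the same
    // docs one at a time.
    var doc = JSON.stringify({type: "test", body: new Array(50).join("filler text ")});
    var docs = [];
    for (var i = 0; i < 1000; i++) docs.push(doc);
    var batched = "[" + docs.join(",") + "]";

    var t0 = new Date();
    JSON.parse(batched);
    var t1 = new Date();
    for (var j = 0; j < docs.length; j++) JSON.parse(docs[j]);
    var t2 = new Date();

    print("one big parse:   " + (t1 - t0) + " ms");
    print("many small ones: " + (t2 - t1) + " ms");

If the native parser turns out to be roughly linear in total bytes either
way, then batching is purely an I/O question and the parsing concern above
goes away.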

> The I/O time for large numbers of small documents is larger than you would
> expect. I ran some tests a while back and there was more time spent in stdio
> for simple map/reduce operations than there was in processing on the view
> server.

Did you run the experiment to try batching the updates across the
wire? I'm not surprised that the transfer can take longer than the
computation, but I'm not sure how much benefit you'd get from batching
100 or so docs at a time. I reckon there'd be some; I just don't have
any idea how much.
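
On the byte-length idea, the accumulation logic itself is trivial. Sketching
it in JS just to show the shape (it would obviously live on the Erlang side,
and the 64k cap is a number I pulled out of the air):

    // Group already-serialized docs into batches no larger than maxBytes.
    // A single doc bigger than the cap still goes out on its own.
    // String length is used as a rough stand-in for byte size.
    function batchByBytes(jsonDocs, maxBytes) {
      var batches = [], current = [], size = 0;
      for (var i = 0; i < jsonDocs.length; i++) {
        var len = jsonDocs[i].length;
        if (current.length > 0 && size + len > maxBytes) {
          batches.push(current);
          current = [];
          size = 0;
        }
        current.push(jsonDocs[i]);
        size += len;
      }
      if (current.length > 0) batches.push(current);
      return batches;
    }

    // e.g. batchByBytes(serializedDocs, 64 * 1024)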

> Of course, most of the time spent on view generation is still writing to the
> btree, but that performance has already improved quite a bit, so we're
> looking for other places we can optimize.
>
>
>>
>> An alternative that I haven't seen anywhere else in this thread was an
>> idea to tag every message passed to the view engine with a uuid. Then
>> people can do all sorts of fancy things with the view engine, like
>> async processing and so on and so forth. The downside being that the
>> Saturday afternoon implementation of the view engine in language X now
>> takes both Saturday and Sunday afternoon.
>>
>
> So, this gets dicey really fast. I want the external process protocol to go
> non-blocking and support this uuid-style communication, but I'm really
> skeptical of it in the view server.
>
> The view server should do pure functional transforms; allowing it to do I/O
> means that is no longer true. It's also not as simple as stamping the
> protocol with a uuid, because Erlang still needs to load balance any number
> of external processes. When the view server no longer blocks solely on
> processing, it becomes much harder to achieve that load balancing.
>

Well, the original proposal was that if we do an asynchronous message
passing thing with the view server then Erlang doesn't do the load
balancing; the view server could become threaded or use a pre-fork
server model and do the load balancing across multiple cores itself.
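
For reference, the tagged version of the protocol I was picturing is just an
extra id element on each line, something like the following (the framing and
ids are hand-waving to show the idea, not a spec):

    // Requests, each tagged so replies can come back out of order:
    ["a1b2c3", "map_doc", {"_id": "doc1", "value": 1}]
    ["d4e5f6", "map_doc", {"_id": "doc2", "value": 2}]

    // Responses, in whatever order the view server finishes them:
    ["d4e5f6", [[["doc2", 2]]]]
    ["a1b2c3", [[["doc1", 1]]]]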

But you reminded me of the point that convinced me not to experiment
with the approach. If something causes the view engine to crash, you
can end up affecting a lot of things that are unrelated. For example,
someone gets a 500 on a _show function because a different app had a
bug in its view handling code and happened to be reindexing at the
time. With the current model the effects of errors are more isolated.

>
>>
>> Apologies for missing this thread earlier. Better late than never I guess.
>>
>> Paul Davis
>>
>
