From: Paul Davis
Date: Mon, 2 Aug 2010 15:48:09 -0400
Subject: Re: Proposal for changes in view server/protocol
To: dev@couchdb.apache.org

On Mon, Jul 26, 2010 at 5:35 PM, Mikeal Rogers wrote:
> After some conversations I've had in NYC this week and Mathias' great post
> on the 10 biggest issues with CouchDB (
> http://www.paperplanes.de/2010/7/26/10_annoying_things_about_couchdb.html )
> I wanted to formally propose some changes to the view server/protocol.
>
> The first issue I want to tackle is the lack of CommonJS modules in
> map/reduce. The reason for this is that we use a deterministic hash on all
> the views in a design document in order to query it.
>
> First off, it would be great if we could separate out each view and cache it
> based on its own hash. This way updating one view doesn't blow away the
> entire design document. This has some large ramifications; for one thing it
> means that each view needs to keep its own last sequence and while one view
> is getting up to date it can't be included in generation when other views
> are getting updated.
>
> Once each view has its own deterministic hash I would propose that we move
> the responsibility for generating the hash to a new view server call. This
> call would get triggered during every design doc update and look something
> like.
>
> request : ["hash", {"_id":"_design/foo", .......} ]
> response ["views/bar","aoivniuasdf8ashd7zh87vxxz87gf8sd7"]
>
> The view server can inspect each map/reduce function and determine which
> modules it imports and include those strings in the hash for that particular
> view.
>
> The second issue I'd like to tackle is twofold: parallelized view
> generation and unnecessarily chatty IO for large view generations.
>
> Currently, every single document is passed to the view server one at a time
> and the response is read back one at a time. I would suggest that we allow a
> user-configurable upper limit to "batch" documents to the view server (100
> by default). The request/response would remain exactly the same as it is now
> except there would be an extra array around the request and response.
>
> This would also open up the ability for the view server to break up that
> batch and pass it to different view servers and then return the responses
> all together (this obviously means it's limited to the speed of the client
> handling that last chunk).
>
> Thoughts?
>
> Somewhere on GitHub I actually have the changes to the view server for that
> batching but it doesn't include the changes on the Erlang side.
>
> -Mikeal
>

For the first point about CommonJS modules in Map/Reduce views, I'd say the
goal is fine, but I don't understand how or why you'd want that hash to
happen in JavaScript. Unless I'm mistaken, aren't the import statements
executable JS? As in, is there anything that stops you from importing a
module inside your map function? In which case, JS can't really hash all
imported modules until after all possible code paths have been traced.

I think a better answer would be to allow CommonJS modules, but only in
some namespace of the design document.
(IIRC, the other functions can pull from anywhere, but that would make all
design doc updates trigger view regeneration.) Then Erlang just loads this
namespace, and anything that could be imported is included in the hash
somehow (hash of sorted hashes or some such).

Batching docs across the I/O might not give you as much of a performance
improvement as you'd think. There's a pretty nasty time explosion on parsing
larger JSON documents in some of the various parsers I've tried. I've
noticed this on various pure Erlang parsers, but I wouldn't be surprised if
json.js suffered as well. By this I mean that parsing a one megabyte
document might be quite a bit slower than parsing many smaller documents.
So simply wrapping things in an array could be bad.

An alternative that I haven't seen anywhere else in this thread is to tag
every message passed to the view engine with a UUID. Then people can do all
sorts of fancy things with the view engine, like async processing and so on
and so forth. The downside is that the Saturday afternoon implementation of
the view engine in language X now takes both Saturday and Sunday afternoon.

Apologies for missing this thread earlier. Better late than never I guess.

Paul Davis
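
To make the "hash of sorted hashes" idea above a bit more concrete, here is
a rough sketch in Python rather than Erlang. Everything in it is an
assumption made for illustration: the function names, the use of SHA-1, and
the idea that the importable namespace arrives as a simple name-to-source
mapping. It is not how CouchDB computes view signatures.

```python
import hashlib
from typing import Mapping


def digest(text: str) -> str:
    """Hex digest of a string (SHA-1 is an arbitrary choice for this sketch)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def view_signature(view_source: str, modules: Mapping[str, str]) -> str:
    """Combine a view's source with every module it could possibly import.

    `modules` is a hypothetical mapping of module name -> module source,
    standing in for the restricted design-doc namespace.
    """
    # Hash each module together with its name, then sort the digests so the
    # result does not depend on the order the modules are listed in.
    module_digests = sorted(
        digest(name + "\0" + source) for name, source in modules.items()
    )
    outer = hashlib.sha1()
    outer.update(digest(view_source).encode("ascii"))
    for d in module_digests:
        outer.update(d.encode("ascii"))
    return outer.hexdigest()
```

With something like this, editing any module in the namespace changes the
signature of every view, whether or not a given view imports it. That is
the price of not having to trace code paths in JavaScript.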