From: Paul Davis
Date: Mon, 2 Aug 2010 15:53:28 -0400
Subject: Re: Proposal for changes in view server/protocol
To: dev@couchdb.apache.org

On Mon, Jul 26, 2010 at 5:35 PM, Mikeal Rogers wrote:
> After some conversations I've had in NYC this week and Mathias' great post
> on the 10 biggest issues with CouchDB (
> http://www.paperplanes.de/2010/7/26/10_annoying_things_about_couchdb.html )
> I wanted to formally propose some changes to the view server/protocol.
>
> The first issue I want to tackle is the lack of CommonJS modules in
> map/reduce. The reason for this is that we use a deterministic hash of all
> the views in a design document in order to query it.
>
> First off, it would be great if we could separate out each view and cache
> it based on its own hash. This way updating one view doesn't blow away the
> entire design document. This has some large ramifications: for one thing,
> it means that each view needs to keep its own last sequence, and while one
> view is getting up to date it can't be included in generation when other
> views are getting updated.
>
> Once each view has its own deterministic hash, I would propose that we
> move the responsibility for generating the hash to a new view server call.
> This call would get triggered during every design doc update and look
> something like:
>
> request : ["hash", {"_id":"_design/foo", .......} ]
> response : ["views/bar", "aoivniuasdf8ashd7zh87vxxz87gf8sd7"]
>
> The view server can inspect each map/reduce function, determine which
> modules it imports, and include those strings in the hash for that
> particular view.
>
> The second issue I'd like to tackle is twofold: parallelized view
> generation and unnecessarily chatty IO for large view generations.
>
> Currently, every single document is passed to the view server one at a
> time and the response is read back one at a time. I would suggest that we
> allow a user-configurable upper limit to "batch" documents to the view
> server (100 by default). The request/response would remain exactly the
> same as it is now, except there would be an extra array around the request
> and response.
>
> This would also open up the ability for the view server to break up that
> batch, pass it to different view servers, and then return the responses
> all together (this obviously means it's limited to the speed of the client
> handling that last chunk).
>
> Thoughts?
>
> Somewhere on github I actually have the changes to the view server for
> that batching, but it doesn't include the changes on the Erlang side.
>
> -Mikeal
>

I forgot to comment on the splitting indices comment. I'd generally agree
with jchris that it's probably not a good idea. I'd worry about the
resource limits for things like the number of files and the extra overhead
because the by_seq_id couldn't be shared.

There's also the behaviour that views are updated together. Theoretically
you could still update them that way, but then you lose the benefit of
having all writes going to the same place on the platter. If you're
updating 10 files in parallel, that's 10 seeks per write, which is less
good.

The real fish to fry here is the answer to this question: why do our
derived views require the same durability guarantees as the main database?
As long as we can detect corruption and provide snapshot reads, trading
off some durability for speed might be a win. Also, with fewer constraints
on durability, we might be able to separate out view invalidation to
single views, which would solve the issues above.

Paul Davis
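
To make the per-view hash call in the proposal concrete, here is a rough
JavaScript sketch of what a view server could do when it receives the
proposed ["hash", ddoc] request. Everything here is an illustrative
assumption rather than anything decided in the thread: the function names,
the use of Node's crypto module and a require() regex to find CommonJS
imports, and the idea that modules live at slash-separated paths inside
the design document.

    // Sketch only: compute one hash per view, folding in the source of any
    // CommonJS modules the map/reduce functions require(), so that editing
    // a module only invalidates the views that actually use it.
    var crypto = require('crypto');

    // Collect the string arguments of require(...) calls in a function's source.
    function requiredModules(source) {
      var re = /require\(\s*['"]([^'"]+)['"]\s*\)/g;
      var mods = [];
      var m;
      while ((m = re.exec(source)) !== null) {
        mods.push(m[1]);
      }
      return mods;
    }

    // Walk a slash-separated path (e.g. "lib/helpers") through the design doc.
    function lookupModule(ddoc, path) {
      return path.split('/').reduce(function (obj, key) {
        return (obj || {})[key];
      }, ddoc);
    }

    // Returns [["views/<name>", "<sha1 hex>"], ...], matching the shape of
    // the proposed response.
    function hashViews(ddoc) {
      return Object.keys(ddoc.views || {}).map(function (name) {
        var view = ddoc.views[name];
        var src = (view.map || '') + (view.reduce || '');
        var h = crypto.createHash('sha1').update(src);
        requiredModules(src).forEach(function (path) {
          h.update(String(lookupModule(ddoc, path) || ''));
        });
        return ['views/' + name, h.digest('hex')];
      });
    }

With something like this, a design doc with a "bar" view would hash to the
["views/bar", "<sha1>"] pair shown in the proposed response, and changing
an unrelated view or module would leave that pair untouched.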
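
And a similarly rough sketch of the batching idea on the view server side:
a map loop that accepts either a single document or, when batching is on,
an array of documents, and wraps the per-document responses in the extra
array the proposal describes. The dispatch loop, the command names, and
the use of Node's readline are assumptions for illustration; the real
couchjs loop is structured differently.

    // Sketch only: handle both the existing one-document form and the
    // proposed batched form (an extra array around request and response).
    var readline = require('readline');

    var mapFuns = [];    // would be filled in by an "add_fun"-style command
    var current = null;  // emit() results for the document being mapped

    function emit(key, value) {
      current.push([key, value]);
    }

    // One result list per map function, same shape as the existing protocol.
    function mapOneDoc(doc) {
      return mapFuns.map(function (fun) {
        current = [];
        try { fun(doc); } catch (e) { /* a real server would log and continue */ }
        return current;
      });
    }

    // Batched request: map every doc and wrap the responses in one more array.
    function handleMapRequest(payload) {
      return Array.isArray(payload) ? payload.map(mapOneDoc) : mapOneDoc(payload);
    }

    // Minimal line-based dispatch, one JSON request per line on stdin.
    var rl = readline.createInterface({ input: process.stdin });
    rl.on('line', function (line) {
      var req = JSON.parse(line); // e.g. ["map_doc", <doc>] or ["map_doc", [<docs>]]
      if (req[0] === 'map_doc') {
        process.stdout.write(JSON.stringify(handleMapRequest(req[1])) + '\n');
      }
      // "reset", "add_fun", the proposed "hash" call, etc. are omitted here.
    });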