Date: Wed, 16 Sep 2009 05:59:45 +0300
Subject: Re: [jira] Closed: (COUCHDB-495) Make views twice as fast
From: Vlad GURDIGA
To: dev@couchdb.apache.org

Nice work guys!

I don't really understand (yet) everything that you're talking about here, but the issue title sounds really great! Also very glad to hear that ICU was not really a bottleneck for collation.

On Tue, Sep 15, 2009 at 3:00 AM, Damien Katz (JIRA) wrote:
>
>     [ https://issues.apache.org/jira/browse/COUCHDB-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Damien Katz closed COUCHDB-495.
> -------------------------------
>
>    Resolution: Fixed
>
> We now have a raw collation option, and regular json collation is much faster too.
>
>> Make views twice as fast
>> ------------------------
>>
>>                 Key: COUCHDB-495
>>                 URL: https://issues.apache.org/jira/browse/COUCHDB-495
>>             Project: CouchDB
>>          Issue Type: Improvement
>>          Components: JavaScript View Server
>>            Reporter: Chris Anderson
>>             Fix For: 0.11
>>
>>         Attachments: binary_collate.diff, couch_perf.py, less_json.patch, numbers-davisp.txt, outputv.patch, perf.py, R13B1-uca-bif.patch, term_collate.diff
>>
>>
>> Devs,
>> Damien's identified view collation as the most significant bottleneck in view generation. We've done some testing, and some preliminary patches, and the upshot seems to be that even removing ICU from the collator is not a significant boost. What does speed things up greatly is using raw Erlang term comparison. E.g., using fun(A, B) -> A < B end instead of couch_view:less_json provides a roughly 2x speedup.
>> However, the patch is challenging for a few reasons: Making the collation strategy switchable at all is tough. It's actually quite easy to get an alternate less function into the btree writer (all you've got to do is set it in couch_view_group:init_group). The hard part is propagating the same less function to the PassedEndFun. There's a secondary problem: when you use raw term comparison, a lot of terms turn out to come before nil, and after {}, which we use as artificial first and last terms in the less_json function. So just switching to raw collation alone will leave you with a view with unreachable rows.
>> I tried two different approaches to the problem last night, and both of them led to (instructive) dead ends. I'll attach them for illustration purposes.
>> The next line of attack we think should be tried is this:
>> First, remove _all_docs_by_seq, as it is just adding complexity to the problem, and has been deprecated by _changes anyway. Along the same lines, _all_docs should no longer use couch_httpd_view:make_view_fold_fun, as it has completely different collation needs than regular views. We'll end up duplicating a little code in the _all_docs implementation, but it should be worth it because it will make the other work much simpler.
>> Once those changes have laid the groundwork, the next step is to change make_view_fold_fun and couch_view:fold, so that the fold itself (couch_view:fold and the underlying btree), rather than make_view_fold_fun, is responsible for detecting when we've passed the endkey. That means make_passed_end_fun and all references to PassedEnd and PassedEndFun will be stripped from couch_httpd_view and moved to couch_btree.
>> couch_view:fold (and the underlying btree) will need to accept not just a startkey, but also an endkey. This will make it much easier to use the less fun that is stored on View#view.btree#btree.less to determine PassedEnd funs. This will move some complexity to the btree code from the view code, but will keep the concerns more aligned. This also means that the btree will need to accept not only an endkey for folds, but also an inclusive_end parameter.
>> Once we have all these refactorings done, it will be easy to make the less fun for an index configurable, as both the index writer and the index reader will look for it in the same place (on the #btree record).
>> My aim is to start a discussion and get someone excited to work on this patch. Think of all the fast-views glory you'll get! Please ask questions and otherwise force me to clarify the above discussion.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
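
Illustrative note (not part of the thread): the refactoring proposed in the quoted issue can be sketched roughly as below. This is a toy model in Python, not CouchDB's Erlang code. The `Index` class, its `insert_all` and `fold` methods, and the use of `None` to mean "no bound" are all hypothetical names and choices made for this sketch. The one idea taken from the issue is that the writer and the reader both take the less function from the same place, and that the fold itself accepts an endkey and an inclusive_end flag.

```python
from dataclasses import dataclass, field
from functools import cmp_to_key
from typing import Any, Callable, List, Tuple

Less = Callable[[Any, Any], bool]


@dataclass
class Index:
    """Toy stand-in for a btree: sorted rows plus the less fun that ordered them."""
    less: Less
    rows: List[Tuple[Any, Any]] = field(default_factory=list)

    def insert_all(self, items):
        # Writer side: order rows with the index's own less function.
        def cmp(a, b):
            if self.less(a[0], b[0]):
                return -1
            if self.less(b[0], a[0]):
                return 1
            return 0
        self.rows = sorted(list(self.rows) + list(items), key=cmp_to_key(cmp))

    def fold(self, fun, acc, startkey=None, endkey=None, inclusive_end=True):
        # Reader side: the same less function decides where to start and stop.
        # None means "no bound" (an assumption of this sketch), so no
        # artificial first/last key is needed.
        for key, value in self.rows:
            if startkey is not None and self.less(key, startkey):
                continue  # still before the start of the range
            if endkey is not None:
                passed_end = self.less(endkey, key) or (
                    not inclusive_end and not self.less(key, endkey))
                if passed_end:
                    break
            acc = fun(key, value, acc)
        return acc


def collect_keys(key, value, acc):
    # Accumulate the keys visited by the fold, in order.
    return acc + [key]


# "Raw" comparison and a custom collation share one code path.
raw = Index(less=lambda a, b: a < b)
raw.insert_all([("b", 2), ("a", 1), ("c", 3)])

nocase = Index(less=lambda a, b: a.lower() < b.lower())
nocase.insert_all([("B", 2), ("a", 1), ("C", 3)])
```

Because end-of-range detection lives inside `fold`, swapping the less function changes both the stored order and the range semantics at once, which is the property the issue is after.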