From: Jan Lehnardt
To: couchdb-dev@incubator.apache.org
Subject: Re: Lazy Fulltext Search
Date: Sun, 13 Apr 2008 23:00:00 +0200

On Apr 12,
2008, at 12:06, Søren Hilmer wrote:
> Hi
>
> Have you read Chris' response about letting the view engine call the
> indexer, as it has the information needed for the indexer? As I
> understand the idea, it will essentially keep the fulltext indexer and
> the views in sync.
>
> I like this idea and I believe the code for the indexer would be much
> simpler and more efficient.
>
> Also, as the shift goes towards indexing views and not documents, it
> makes sense that it is the View engine that triggers the indexer, right?

The only problem here is that views are updated when they are queried,
not when documents are added. So you could end up with a lot of
unindexed data because your view hasn't been queried. That can be
worked around, but I don't think it makes things any easier :)

The design of the update notification is intentionally simple. We
expect the clients (the Indexer in this case) to be smart. We believe
that this makes the server code more robust.

> I have to study the View engine if I am to provide any code for this,
> though (provided consensus blows in this direction).
>
> Have fun
> Søren
> On Friday 11 April 2008 13:26, Jan Lehnardt wrote:
>> On Apr 11, 2008, at 08:55, Søren Hilmer wrote:
>>> Hi Jan
>>>
>>> It certainly would simplify configuration, although the
>>> DbUpdateNotificationProcess setting ought to be retained, as it is
>>> potentially useful for other stuff than indexing (can you have more
>>> than one of these set up?)
>>
>> No, the update searcher will stay! :-)
>>
>>> I am also worried about response times for searching; potentially
>>> the indexing can take considerable time. With the current approach,
>>> indexing can be done off peak hours and only searching is done at
>>> prime time.
>>
>> Right, if you want to be conservative with resources, you might want
>> to go with my approach at the expense of possibly higher response
>> times the first time things are searched for (as it is with views).
>> I just wanted to make available my idea that fulltext indexing could
>> be modelled after how views work, in case this is useful for a
>> specific scenario.
>>
>> Cheers
>> Jan
>> --
>>
>>> Have fun
>>> Søren
>>> --
>>> Søren Hilmer, M.Sc., M.Crypt.
>>> wideTrail               Phone: +45 25481225
>>> Pilevænget 41           Email: sh@widetrail.dk
>>> DK-8961 Allingåbro      Web:   www.widetrail.dk
>>>
>>> On Thu, April 10, 2008 23:32, Jan Lehnardt wrote:
>>>> Heya,
>>>> while thinking more about the fulltext implementation, I began to
>>>> wonder why we don't model it after the view engine.
>>>>
>>>> At the moment, we have an Indexer waiting for update notifications
>>>> and polling CouchDB for changes and a separate mechanism to register
>>>> a fulltext query Searcher that looks up things in the index.
>>>>
>>>> My proposed architectural change would be to trigger the Indexer
>>>> from the Searcher module when a request comes in, just like views
>>>> work. This would delay the creation of fulltext indexes until they
>>>> are actually needed.
>>>>
>>>> The possible drawback, though, is that when building the fulltext
>>>> index is rather slow, old-style pre-calculation might be more
>>>> feasible. Views deal with that by requiring frequent requests
>>>> (possibly cron-ed).
>>>>
>>>> This is not a proposal or anything, just a thought I wanted to share
>>>> with those who work on fulltext integration.
>>>>
>>>> If you have any input on this, please let us know ;)
>>>>
>>>> Cheers
>>>> Jan
>>>> --
>
> --
> Søren Hilmer, M.Sc., M.Crypt.
> wideTrail               Phone: +45 25481225
> Pilevænget 41           Email: sh@widetrail.dk
> DK-8961 Allingåbro      Web:   www.widetrail.dk
>
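
To make the trade-off in this thread concrete, here is a rough Python sketch of the two triggering strategies: an index that catches up only when a search arrives (like views) and one that catches up on every update notification. Everything in it (DocStore, FulltextIndex, lazy_search, the listeners list) is a hypothetical stand-in invented for illustration; it is not CouchDB's API and not the actual indexer code. The toy index also ignores document deletions and revisions.

```python
from collections import defaultdict


class DocStore:
    """Hypothetical stand-in for a database with an update sequence."""

    def __init__(self):
        self.update_seq = 0
        self.changes = []      # list of (seq, doc_id, text)
        self.listeners = []    # callables invoked after every save

    def save(self, doc_id, text):
        self.update_seq += 1
        self.changes.append((self.update_seq, doc_id, text))
        for notify in self.listeners:
            notify()           # the notification itself carries no payload

    def changes_since(self, seq):
        return [change for change in self.changes if change[0] > seq]


class FulltextIndex:
    """Toy inverted index that remembers the last sequence it indexed."""

    def __init__(self, store):
        self.store = store
        self.indexed_seq = 0
        self.postings = defaultdict(set)

    def catch_up(self):
        """Index every change made since the last run; return how many."""
        pending = self.store.changes_since(self.indexed_seq)
        for seq, doc_id, text in pending:
            for word in text.lower().split():
                self.postings[word].add(doc_id)
            self.indexed_seq = seq
        return len(pending)

    def lookup(self, word):
        return sorted(self.postings.get(word.lower(), set()))


def lazy_search(index, word):
    """Query-triggered: bring the index up to date first, then search."""
    index.catch_up()
    return index.lookup(word)


store = DocStore()
lazy = FulltextIndex(store)                # updated only when searched
eager = FulltextIndex(store)
store.listeners.append(eager.catch_up)     # updated on every notification

store.save("a", "lazy fulltext search")
store.save("b", "view engine")

print(eager.indexed_seq, lazy.indexed_seq)  # eager is current, lazy is not
print(lazy_search(lazy, "fulltext"))        # first search pays the indexing cost
```

In this sketch the notification-driven index does its work at write time, while the query-triggered one accumulates a backlog until the first search, which is the response-time concern raised above.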