From: Soren Hilmer
Organization: wideTrail
To:
couchdb-dev@incubator.apache.org
Subject: Re: Lazy Fulltext Search
Date: Tue, 15 Apr 2008 15:27:44 +0200
Message-Id: <200804151527.44308.sh@widetrail.dk>

I guess what all this boils down to is this:

When a database changes, you need to re-index all the views in the
fulltextsearch design document.
There is no way incremental changes can be made to the index, as one document
change may potentially change more than one view result within the same view.

Right?

--Søren

On Tuesday 15 April 2008 14:05:38 Jan Lehnardt wrote:
> On Apr 15, 2008, at 02:01, Nils Adermann wrote:
> > Hi,
> >
> > I agree with Søren that this is not necessarily a good idea. It is
> > not trivial for an indexer to figure out which view results changed.
> > One method to do so is storing all indexed view results and then
> > comparing them to the updated view once the indexer is called. This
> > is a needless waste of resources. Updating the view index based on
> > changed documents is even more difficult. You would have to
> > recompute the view at least partially to find out which view results
> > changed. Given the reduce step, this means that any number of
> > documents, including unchanged ones, could be involved. This creates
> > a lot of work.
>
> Yeah, but it doesn't actually matter who does the work :) So we'd rather
> keep that out of CouchDB.
>
> > I think the problem we face here is different usage patterns of
> > views. There are views which process a lot of data and which are
> > based on documents that are updated frequently.
> > But they might only
> > be read from infrequently. These views profit from JIT computation.
> > However, many applications use views which are infrequently updated
> > but often queried or searched. Such views benefit from live
> > updating. If an application allows searching data, it nearly always
> > means that the data will be read more frequently than it is updated.
> > So in conclusion both methods (JIT and live updates) make sense for
> > views. But search normally only needs the live update mechanism. I
> > believe it should become configurable whether a view is updated
> > immediately after a change or only after a query takes place.
> > Fulltext search would always work on views with immediate updates.
> > The indexer would be notified about the changed results. On views
> > which delay updates, search would only work if the fulltext search
> > provides a mechanism to compare the new view results to the old ones.
>
> Just query the view with ?count=0 to trigger an update after your
> inserts and you have the synchronous update behaviour.
>
> > Cheers
> > Nils
> >
> > Jan Lehnardt wrote:
> >> On Apr 12, 2008, at 12:06, Søren Hilmer wrote:
> >>> Hi
> >>>
> >>> Have you read Chris' response about letting the view engine call
> >>> the indexer, as it has the information needed for the indexer? As I
> >>> understand the idea, it will essentially keep the fulltext indexer
> >>> and the views in sync.
> >>>
> >>> I like this idea and I believe the code for the indexer would be
> >>> much simpler and more efficient.
> >>>
> >>> Also, as the shift goes towards indexing views and not documents,
> >>> it makes sense that it is the View engine that triggers the
> >>> indexer, right?
> >>
> >> The only problem here is that views are updated when they are
> >> being queried and not when documents are added. So you could end up
> >> with a lot of not-indexed data because your view hasn't been
> >> queried.
> >> That can be worked around, but I don't think it makes
> >> things any easier :)
> >>
> >> The design of the update notification is intentionally simple. We
> >> expect the clients (the Indexer in this case) to be smart. We
> >> believe that this makes the server code more robust.
> >>
> >>> I have to study the View engine, if I am to provide any code for
> >>> this, though (provided consensus blows in this direction).
> >>>
> >>> Have fun
> >>> Søren
> >>>
> >>> On Friday 11 April 2008 13:26, Jan Lehnardt wrote:
> >>>> On Apr 11, 2008, at 08:55, Søren Hilmer wrote:
> >>>>> Hi Jan
> >>>>>
> >>>>> It certainly would simplify configuration, although the
> >>>>> DbUpdateNotificationProcess setting ought to be retained as it is
> >>>>> potentially useful for other stuff than indexing (can you have
> >>>>> more than one of these set up?)
> >>>>
> >>>> No, the update searcher will stay! :-)
> >>>>
> >>>>> I am also worried about response times for searching, as indexing
> >>>>> can potentially take considerable time. With the current approach
> >>>>> indexing can be done during off-peak hours and only searching is
> >>>>> done at prime time.
> >>>>
> >>>> Right, if you want to be conservative with resources, you might
> >>>> want to go with my approach at the expense of possibly higher
> >>>> response times the first time things are searched for (as it is
> >>>> with views). I just wanted to make available my idea that fulltext
> >>>> indexing could be modelled after how views work, in case this is
> >>>> useful for a specific scenario.
> >>>>
> >>>> Cheers
> >>>> Jan
> >>>> --
> >>>>
> >>>>> Have fun
> >>>>> Søren
> >>>>> --
> >>>>> Søren Hilmer, M.Sc., M.Crypt.
> >>>>> wideTrail          Phone: +45 25481225
> >>>>> Pilevænget 41      Email: sh@widetrail.dk
> >>>>> DK-8961 Allingåbro Web:   www.widetrail.dk
> >>>>>
> >>>>> On Thu, April 10, 2008 23:32, Jan Lehnardt wrote:
> >>>>>> Heya,
> >>>>>> while thinking more about the fulltext implementation, I began to
> >>>>>> wonder why we don't model it after the view engine.
> >>>>>>
> >>>>>> At the moment, we have an Indexer waiting for update
> >>>>>> notifications and polling CouchDB for changes, and a separate
> >>>>>> mechanism to register a fulltext query Searcher that looks up
> >>>>>> things in the index.
> >>>>>>
> >>>>>> My proposed architectural change would be to trigger the
> >>>>>> Indexer from the Searcher module when a request comes in, just
> >>>>>> like views work. This would delay the creation of fulltext
> >>>>>> indexes until they are actually needed.
> >>>>>>
> >>>>>> The possible drawback, though, is that when building the fulltext
> >>>>>> index is rather slow, old-style pre-calculation might be more
> >>>>>> feasible. Views deal with that by requiring frequent requests
> >>>>>> (possibly cron-ed).
> >>>>>>
> >>>>>> This is not a proposal or anything, just a thought I wanted to
> >>>>>> share with those who work on fulltext integration.
> >>>>>>
> >>>>>> If you have any input on this, please let us know ;)
> >>>>>>
> >>>>>> Cheers
> >>>>>> Jan
> >>>>>> --
> >>>
> >>> --
> >>> Søren Hilmer, M.Sc., M.Crypt.
> >>> wideTrail          Phone: +45 25481225
> >>> Pilevænget 41      Email: sh@widetrail.dk
> >>> DK-8961 Allingåbro Web:   www.widetrail.dk

-- 
Søren Hilmer, M.Sc., M.Crypt.
wideTrail          Phone: +45 25481225
Pilevænget 41      Email: sh@widetrail.dk
DK-8961 Allingåbro Web:   www.widetrail.dk
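
The snapshot comparison Nils mentions in the thread (store the indexed view results, then compare them with the updated view once the indexer is called) could look roughly like the sketch below. It is only an illustration in Python: the row shape with "id", "key" and "value" fields, the name diff_view_rows, and the simplification that a document emits each key at most once are all assumptions made here, not part of CouchDB or of any indexer discussed above.

```python
import json
from typing import Any, Dict, List

# Assumed (hypothetical) row shape: {"id": doc_id, "key": ..., "value": ...}
Row = Dict[str, Any]


def _identity(row: Row) -> str:
    # Serialise (id, key) so that JSON arrays or objects used as keys
    # can still serve as dictionary keys. Assumes one row per (id, key).
    return json.dumps([row["id"], row["key"]], sort_keys=True)


def diff_view_rows(old_rows: List[Row], new_rows: List[Row]) -> Dict[str, List[Row]]:
    """Compare a stored snapshot of view rows with the freshly updated view."""
    old = {_identity(r): r for r in old_rows}
    new = {_identity(r): r for r in new_rows}
    return {
        # Rows the fulltext index has not seen yet.
        "added": [new[k] for k in sorted(new) if k not in old],
        # Rows that must be dropped from the fulltext index.
        "removed": [old[k] for k in sorted(old) if k not in new],
        # Rows whose emitted value differs from the snapshot.
        "changed": [
            new[k] for k in sorted(new)
            if k in old and new[k]["value"] != old[k]["value"]
        ],
    }


# Snapshot kept by the indexer versus the view after a database change.
snapshot = [
    {"id": "a", "key": "couch", "value": 1},
    {"id": "b", "key": "db", "value": 2},
]
updated = [
    {"id": "a", "key": "couch", "value": 5},
    {"id": "c", "key": "index", "value": 3},
]
delta = diff_view_rows(snapshot, updated)
```

The sketch also makes the cost Nils points out visible: the indexer has to keep a full copy of the previous view results around just to work out what changed.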