Message-Id: <6046F6B0-F01F-4FA5-B414-E04847E3F76C@prima.de>
From: Jan Lehnardt
To: couchdb-dev@incubator.apache.org
Subject: Re: Lazy Fulltext Search
Date: Tue, 15 Apr 2008 15:38:07 +0200
In-Reply-To: <200804151527.44308.sh@widetrail.dk>
References: <4803F04D.4040209@naderman.de> <8D9B4B3B-0F5B-49F3-9853-2AC6D2E88030@apache.org> <200804151527.44308.sh@widetrail.dk>

Heya Søren,

On Apr 15, 2008, at 15:27, Soren Hilmer wrote:

> I guess what all this boils down to is that:
>
> When a database changes, you need to re-index all the views in the
> fulltextsearch design document.

If you take this route, yes.

> There is no way incremental changes can be made to the index, as one
> document change may potentially change more view results within the
> same view. Right?

Yup. Eventually, I think, we will be able to have CouchDB calculate the
intersection of all FT hits and a view index for you. So the FT
indexer will only need to index the whole DB and CouchDB filters out
all matching documents that are not in the requested view for you. For
now, you've got to do it yourself.

Cheers
Jan
--

>
> --Søren
>
> On Tuesday 15 April 2008 14:05:38 Jan Lehnardt wrote:
>> On Apr 15, 2008, at 02:01, Nils Adermann wrote:
>>> Hi,
>>>
>>> I agree with Søren that this is not necessarily a good idea. It is
>>> not trivial for an indexer to figure out which view results changed.
>>> One method to do so is storing all indexed view results and then
>>> comparing them to the updated view once the indexer is called. This
>>> is a needless waste of resources. Updating the view index based on
>>> changed documents is even more difficult. You would have to
>>> recompute the view at least partially to find out which view results
>>> changed. Given the reduce step this means that any number of
>>> documents, including unchanged ones could be involved. This creates
>>> a lot of work.
>>
>> Yeah, but it doesn't actually matter who does the work :) So we'd
>> rather keep that out of CouchDB.
>>
>>> I think the problem we face here is different usage patterns of
>>> views. There are views which process a lot of data and which are
>>> based on documents that are updated frequently.
>>> But they might only be read from infrequently. These views profit
>>> from JIT computation. However many applications use views which are
>>> infrequently updated but often queried or searched. Such views
>>> benefit from live updating. If an application allows searching data
>>> it nearly always means that the data will be read more frequently
>>> than it is updated. So in conclusion both methods (JIT and live
>>> updates) make sense for views. But search normally only needs the
>>> live update mechanism. I believe it should become configurable
>>> whether a view is updated immediately after a change or only after
>>> a query takes place. Fulltext search would always work on views
>>> with immediate updates. The indexer would be notified about the
>>> changed results. On views which delay updates, search would only
>>> work if the fulltext search provides a mechanism to compare the new
>>> view results to the old ones.
>>
>> Just query the view with ?count=0 to trigger an update after your
>> inserts and you have the synchronous update behaviour.
>>
>>> Cheers
>>> Nils
>>>
>>> Jan Lehnardt wrote:
>>>> On Apr 12, 2008, at 12:06, Søren Hilmer wrote:
>>>>> Hi
>>>>>
>>>>> Have you read Chris' response about letting the view engine call
>>>>> the indexer, as it has the information needed for the indexer?
>>>>> As I understand the idea, it will essentially keep the fulltext
>>>>> indexer and the views in sync.
>>>>>
>>>>> I like this idea and I believe the code for the indexer would be
>>>>> much simpler and efficient.
>>>>>
>>>>> Also as the shift goes towards indexing views and not documents,
>>>>> it makes sense that it is the View engine that triggers the
>>>>> indexer, right?
>>>>
>>>> The only problem here is that views are changed, when they are
>>>> being queried and not when documents are added. So you could end up
>>>> with a lot of not-indexed data because your view hasn't been
>>>> queried.
>>>> That can be worked around, but I don't think it makes
>>>> things any easier :)
>>>>
>>>> The design of the update notification is intentionally simple. We
>>>> expect the clients (the Indexer in this case) to be smart. We
>>>> believe that this makes the server code more robust.
>>>>
>>>>> I have to study the View engine, if I am to provide any code for
>>>>> this, though (provided consensus blows in this direction).
>>>>>
>>>>> Have fun
>>>>> Søren
>>>>>
>>>>> On Friday 11 April 2008 13:26, Jan Lehnardt wrote:
>>>>>> On Apr 11, 2008, at 08:55, Søren Hilmer wrote:
>>>>>>> Hi Jan
>>>>>>>
>>>>>>> It certainly would simplify configuration, although the
>>>>>>> DbUpdateNotificationProcess setting ought to be retained as it
>>>>>>> is potentially useful for other stuff than indexing (can you
>>>>>>> have more than one of these, setup?)
>>>>>>
>>>>>> No, the update searcher will stay! :-)
>>>>>>
>>>>>>> I am also worried about response times for searching,
>>>>>>> potentially the indexing can take considerable time. With the
>>>>>>> current approach indexing can be done off peak hours and only
>>>>>>> searching is done at prime time.
>>>>>>
>>>>>> Right, if you want to be conservative with resources, you might
>>>>>> want to go with my approach at the expense of possibly higher
>>>>>> response times the first time things are searched for (as it is
>>>>>> with views). I just wanted to make available my idea that
>>>>>> fulltext indexing could be modelled after how views work, in
>>>>>> case this is useful for a specific scenario.
>>>>>>
>>>>>> Cheers
>>>>>> Jan
>>>>>> --
>>>>>>
>>>>>>> Have fun
>>>>>>> Søren
>>>>>>> --
>>>>>>> Søren Hilmer, M.Sc., M.Crypt.
>>>>>>> wideTrail                Phone: +45 25481225
>>>>>>> Pilevænget 41            Email: sh@widetrail.dk
>>>>>>> DK-8961 Allingåbro       Web: www.widetrail.dk
>>>>>>>
>>>>>>> On Thu, April 10, 2008 23:32, Jan Lehnardt wrote:
>>>>>>>> Heya,
>>>>>>>> while thinking more about the fulltext implementation, I
>>>>>>>> began to wonder why we don't model it after the view engine.
>>>>>>>>
>>>>>>>> At the moment, we have an Indexer waiting for update
>>>>>>>> notifications and polling CouchDB for changes and a separate
>>>>>>>> mechanism to register a fulltext query Searcher, that looks
>>>>>>>> up things in the index.
>>>>>>>>
>>>>>>>> My proposed architectural change would be to trigger the
>>>>>>>> Indexer from the Searcher module when a request comes in,
>>>>>>>> just like views work. This would delay the creation of
>>>>>>>> fulltext indexes until they are actually needed.
>>>>>>>>
>>>>>>>> The possible drawback though is, that when building the
>>>>>>>> fulltext index is rather slow, old-style pre-calculation
>>>>>>>> might be more feasible. Views deal with that by requiring
>>>>>>>> frequent requests (possibly cron-ed).
>>>>>>>>
>>>>>>>> This is not a proposal or anything, just a thought I wanted
>>>>>>>> to share with those who work on fulltext integration.
>>>>>>>>
>>>>>>>> If you have any input on this, please let us know ;)
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>> Jan
>>>>>>>> --
>>>>>
>>>>> --
>>>>> Søren Hilmer, M.Sc., M.Crypt.
>>>>> wideTrail                Phone: +45 25481225
>>>>> Pilevænget 41            Email: sh@widetrail.dk
>>>>> DK-8961 Allingåbro       Web: www.widetrail.dk
>
>
>
> -- 
> Søren Hilmer, M.Sc., M.Crypt.
> wideTrail                Phone: +45 25481225
> Pilevænget 41            Email: sh@widetrail.dk
> DK-8961 Allingåbro       Web: www.widetrail.dk
>
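
A minimal sketch of the do-it-yourself intersection mentioned at the top of this message: the fulltext index covers the whole database, and the client keeps only the hits whose documents also appear in the requested view. Everything here is an assumption for illustration, not CouchDB code. The function name filter_hits_by_view is hypothetical, the hit shape ({"id", "score"}) is made up, and the view rows are assumed to carry the document id in an "id" field.

```python
def filter_hits_by_view(ft_hits, view_rows):
    """Keep only fulltext hits whose document id also appears in the view.

    ft_hits   -- hypothetical list of dicts with an "id" and a "score"
    view_rows -- assumed list of view rows, each carrying the doc "id"
    The relative order of ft_hits (e.g. by relevance) is preserved.
    """
    # Collect the ids present in the view once, so each lookup is O(1).
    view_ids = {row["id"] for row in view_rows}
    return [hit for hit in ft_hits if hit["id"] in view_ids]


# Illustrative data only: three fulltext hits, two of which are in the view.
ft_hits = [
    {"id": "doc-a", "score": 0.9},
    {"id": "doc-b", "score": 0.7},
    {"id": "doc-c", "score": 0.4},
]
view_rows = [
    {"id": "doc-c", "key": "x", "value": None},
    {"id": "doc-a", "key": "y", "value": None},
]

filtered = filter_hits_by_view(ft_hits, view_rows)
print([hit["id"] for hit in filtered])
```

The set lookup keeps the filter linear in the number of hits plus view rows. It does not address keeping the fulltext index itself up to date, which is the harder problem discussed in the thread.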