incubator-couchdb-dev mailing list archives

From Soren Hilmer ...@widetrail.dk>
Subject Re: Lazy Fulltext Search
Date Tue, 15 Apr 2008 13:27:44 GMT
I guess what all this boils down to is that:

When a database changes, you need to re-index all the views in the
fulltextsearch design document.
There is no way incremental changes can be made to the index, as one document
change may potentially change several view results within the same view.

Right?

--Søren
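
To make the point concrete, here is a minimal sketch, assuming a toy map function that emits one row per word. It shows a single document edit changing several rows of the same view, and the snapshot comparison Nils describes in the quoted text below. The row shape and the names map_words and diff_rows are hypothetical, not CouchDB's API, and the sketch ignores the reduce step entirely.

```python
from typing import List, Tuple

# Hypothetical row shape for illustration: (doc_id, key).
Row = Tuple[str, str]


def map_words(doc: dict) -> List[Row]:
    """Toy map function: emit one row per distinct word in doc['text']."""
    return [(doc["_id"], word) for word in sorted(set(doc["text"].split()))]


def diff_rows(old: List[Row], new: List[Row]):
    """Compare two stored view snapshots; return (added, removed) rows."""
    old_set, new_set = set(old), set(new)
    return sorted(new_set - old_set), sorted(old_set - new_set)


# One edit to document "a" ...
before = map_words({"_id": "a", "text": "lazy fulltext search"})
after = map_words({"_id": "a", "text": "eager fulltext index"})

# ... adds two view rows and removes two others.
added, removed = diff_rows(before, after)
```

The comparison needs the complete old snapshot to be kept around, which is the resource cost raised in the thread.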


On Tuesday 15 April 2008 14:05:38 Jan Lehnardt wrote:
> On Apr 15, 2008, at 02:01, Nils Adermann wrote:
> > Hi,
> >
> > I agree with Søren that this is not necessarily a good idea. It is
> > not trivial for an indexer to figure out which view results changed.
> > One method to do so is storing all indexed view results and then
> > comparing them to the updated view once the indexer is called. This
> > is a needless waste of resources. Updating the view index based on
> > changed documents is even more difficult. You would have to
> > recompute the view at least partially to find out which view results
> > changed. Given the reduce step this means that any number of
> > documents, including unchanged ones could be involved. This creates
> > a lot of work.
>
> Yeah, but it doesn't actually matter who does the work :) So we'd
> rather keep that out of CouchDB.
>
> > I think the problem we face here is different usage patterns of
> > views. There are views which process a lot of data and which are
> > based on documents that are updated frequently.  But they might only
> > be read from infrequently. These views profit from JIT computation.
> > However many applications use views which are infrequently updated
> > but often queried or searched. Such views benefit from live
> > updating. If an application allows searching data it nearly always
> > means that the data will be read more frequently than it is updated.
> > So in conclusion both methods (JIT and live updates) make sense for
> > views. But search normally only needs the live update mechanism. I
> > believe it should become configurable whether a view is updated
> > immediately after a change or only after a query takes place.
> > Fulltext search would always work on views with immediate updates.
> > The indexer would be notified about the changed results. On views
> > which delay updates, search would only work if the fulltext search
> > provides a mechanism to compare the new view results to the old ones.
>
> Just query the view with ?count=0 to trigger an update after your
> inserts and you have the synchronous update behaviour.
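
A minimal sketch of that trick, assuming the view is reachable at /<db>/_view/<design>/<view>. Only the ?count=0 parameter comes from the message above; the path layout, the host and the helper name view_refresh_url are assumptions and may differ between CouchDB versions.

```python
from urllib.parse import urlencode


def view_refresh_url(base: str, db: str, design: str, view: str) -> str:
    """Build a view URL that asks for zero rows (hypothetical path layout)."""
    query = urlencode({"count": 0})  # "?count=0" as suggested in the thread
    return f"{base}/{db}/_view/{design}/{view}?{query}"


url = view_refresh_url("http://localhost:5984", "mydb", "fulltextsearch", "all")

# After a batch of inserts, a plain GET on this URL would trigger the update,
# for example: urllib.request.urlopen(url).read()
```

Issuing the request right after the inserts gives the view the synchronous update behaviour without transferring any rows.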
>
> > Cheers
> > Nils
> >
> > Jan Lehnardt wrote:
> >> On Apr 12, 2008, at 12:06, Søren Hilmer wrote:
> >>> Hi
> >>>
> >>> Have you read Chris' response about letting the view engine call
> >>> the indexer, as it has the information needed for the indexer? As I
> >>> understand the idea, it will essentially keep the fulltext indexer
> >>> and the views in sync.
> >>>
> >>> I like this idea and I believe the code for the indexer would be
> >>> much simpler and more efficient.
> >>>
> >>> Also as the shift goes towards indexing views and not documents, it
> >>> makes sense that it is the View engine that triggers the indexer, right?
> >>
> >> The only problem here is that views are updated when they are
> >> queried, not when documents are added. So you could end up with a
> >> lot of unindexed data because your view hasn't been queried. That
> >> can be worked around, but I don't think it makes things any
> >> easier :)
> >>
> >> The design of the update notification is intentionally simple. We
> >> expect the clients (the Indexer in this case) to be smart. We
> >> believe that this makes the server code more robust.
> >>
> >>> I have to study the View engine, if I am to provide any code for
> >>> this, though (provided consensus blows in this direction).
> >>>
> >>> Have fun
> >>>  Søren
> >>>
> >>> On Friday 11 April 2008 13:26, Jan Lehnardt wrote:
> >>>> On Apr 11, 2008, at 08:55, Søren Hilmer wrote:
> >>>>> Hi Jan
> >>>>>
> >>>>> It certainly would simplify configuration, although the
> >>>>> DbUpdateNotificationProcess setting ought to be retained as it is
> >>>>> potentially useful for other stuff than indexing (can you have
> >>>>> more than one of these set up?)
> >>>>
> >>>> No, the update searcher will stay! :-)
> >>>>
> >>>>> I am also worried about response times for searching; potentially
> >>>>> the indexing can take considerable time. With the current approach
> >>>>> indexing can be done off peak hours and only searching is done at
> >>>>> prime time.
> >>>>
> >>>> Right, if you want to be conservative with resources, you might
> >>>> want to go with my approach at the expense of possibly higher
> >>>> response times the first time things are searched for (as it is
> >>>> with views). I just wanted to make available my idea that fulltext
> >>>> indexing could be modelled after how views work, in case this is
> >>>> useful for a specific scenario.
> >>>>
> >>>> Cheers
> >>>> Jan
> >>>> --
> >>>>
> >>>>> Have fun
> >>>>> Søren
> >>>>> --
> >>>>> Søren Hilmer, M.Sc., M.Crypt.
> >>>>> wideTrail            Phone: +45 25481225
> >>>>> Pilevænget 41        Email: sh@widetrail.dk
> >>>>> DK-8961  Allingåbro  Web: www.widetrail.dk
> >>>>>
> >>>>> On Thu, April 10, 2008 23:32, Jan Lehnardt wrote:
> >>>>>> Heya,
> >>>>>> while thinking more about the fulltext implementation, I began to
> >>>>>> wonder why we don't model it after the view engine.
> >>>>>>
> >>>>>> At the moment, we have an Indexer waiting for update
> >>>>>> notifications and polling CouchDB for changes and a separate
> >>>>>> mechanism to register a fulltext query Searcher, that looks up
> >>>>>> things in the index.
> >>>>>>
> >>>>>> My proposed architectural change would be to trigger the
> >>>>>> Indexer from the Searcher module when a request comes in, just
> >>>>>> like views work. This would delay the creation of fulltext
> >>>>>> indexes until they are actually needed.
> >>>>>>
> >>>>>> The possible drawback, though, is that when building the fulltext
> >>>>>> index is rather slow, old-style pre-calculation might be more
> >>>>>> feasible. Views deal with that by requiring frequent requests
> >>>>>> (possibly cron-ed).
> >>>>>>
> >>>>>> This is not a proposal or anything, just a thought I wanted to
> >>>>>> share with those who work on fulltext integration.
> >>>>>>
> >>>>>> If you have any input on this, please let us know ;)
> >>>>>>
> >>>>>> Cheers
> >>>>>> Jan
> >>>>>> --
> >>>
> >>> --
> >>> Søren Hilmer, M.Sc., M.Crypt.
> >>> wideTrail            Phone:    +45 25481225
> >>> Pilevænget 41        Email:    sh@widetrail.dk
> >>> DK-8961  Allingåbro    Web:    www.widetrail.dk
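
The lazy scheme from the start of this thread (the Searcher triggers the Indexer when a request comes in) can be sketched roughly as follows. This is a toy in-memory model under stated assumptions, not the actual Indexer or Searcher: the class LazyIndex, the fetch_changes callable and the (seq, doc_id, text) change tuples are all hypothetical.

```python
class LazyIndex:
    """Toy fulltext index that only catches up with changes when searched."""

    def __init__(self, fetch_changes):
        # Hypothetical callable: since_seq -> list of (seq, doc_id, text),
        # where text is None for a deleted document.
        self.fetch_changes = fetch_changes
        self.last_seq = 0
        self.postings = {}   # word -> set of doc ids
        self.doc_words = {}  # doc id -> set of words currently indexed

    def _catch_up(self):
        for seq, doc_id, text in self.fetch_changes(self.last_seq):
            # Drop the old entries of this document before re-indexing it.
            for word in self.doc_words.pop(doc_id, set()):
                self.postings[word].discard(doc_id)
            if text is not None:
                words = set(text.lower().split())
                self.doc_words[doc_id] = words
                for word in words:
                    self.postings.setdefault(word, set()).add(doc_id)
            self.last_seq = seq

    def search(self, word):
        self._catch_up()  # indexing cost is paid here, at query time
        return sorted(self.postings.get(word.lower(), set()))


changes = [(1, "a", "lazy search"), (2, "b", "lazy views")]
index = LazyIndex(lambda since: [c for c in changes if c[0] > since])
first = index.search("lazy")    # indexes both documents on first use
changes.append((3, "a", "eager search"))
second = index.search("lazy")   # picks up the edit to "a" before answering
```

The first search after many changes pays the whole indexing cost, which is the response-time concern raised above. Calling search (or _catch_up) from a scheduled job would be the equivalent of the cron-ed view requests.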



-- 
Søren Hilmer, M.Sc., M.Crypt.
wideTrail            Phone: +45 25481225
Pilevænget 41        Email: sh@widetrail.dk
DK-8961  Allingåbro  Web: www.widetrail.dk
