couchdb-dev mailing list archives

From Nils Adermann
Subject Re: Lazy Fulltext Search
Date Wed, 16 Apr 2008 23:15:25 GMT

Jan Lehnardt wrote:
> Heya Søren,
> On Apr 15, 2008, at 15:27, Soren Hilmer wrote:
>> I guess what all this boils down to is that:
>> When a database changes, you need to re-index all the views in the
>> fulltextsearch design document.
> If you take this route, yes.
>> There is no way incremental changes can be made to the index, as one
>> document change may potentially change several view results within the
>> same view.
>> Right?
> Yup.
> Eventually, I think, we will be able to have CouchDB calculate the 
> intersection of all FT hits and a view index for you. So the FT 
> indexer will only need to index the whole DB and CouchDB filters out 
> all matching documents that are not in the requested view for you. For 
> now, you've got to do it yourself.
That's not even possible, because a view (written in JS) could return
data that is not directly in any document, either by combining
information from multiple documents or by generating new content based
on some document values. You would never be able to search such content.
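
To illustrate the objection, here is a minimal sketch in Python rather than JS. Everything in it is hypothetical: the sample documents, the `order_view` function standing in for a view, and the toy word index are assumptions for illustration, not CouchDB's API. It only shows that text generated by a view is invisible to an index built from the raw documents, so intersecting FT hits with the view cannot find it.

```python
# Hypothetical sample data; names and fields are illustrative only.
docs = {
    "a": {"type": "order", "customer": "c1", "total": 40},
    "b": {"type": "order", "customer": "c1", "total": 70},
    "c1": {"type": "customer", "name": "Alice"},
}

def tokens(value):
    """Lowercased word set of a value's string form."""
    return set(str(value).lower().split())

def index_documents(docs):
    """Toy full-text index over raw document values: term -> set of doc ids."""
    index = {}
    for doc_id, doc in docs.items():
        for value in doc.values():
            for term in tokens(value):
                index.setdefault(term, set()).add(doc_id)
    return index

def order_view(docs):
    """Stand-in for a view: emits text that exists in no single document."""
    rows = []
    for doc_id, doc in docs.items():
        if doc["type"] == "order":
            name = docs[doc["customer"]]["name"]               # from another document
            label = "large" if doc["total"] > 50 else "small"  # generated content
            rows.append((doc_id, f"{label} order for {name}"))
    return rows

def search_by_intersection(term, docs):
    """FT hits on raw documents, filtered to documents that appear in the view."""
    hits = index_documents(docs).get(term, set())
    in_view = {doc_id for doc_id, _ in order_view(docs)}
    return hits & in_view

def search_view_rows(term, docs):
    """Search the emitted view values directly."""
    return {doc_id for doc_id, text in order_view(docs) if term in tokens(text)}

print(search_by_intersection("large", docs))  # set()
print(search_view_rows("large", docs))        # {'b'}
```

The word "large" appears only in the view output, so the intersection approach returns nothing for it, while a search over the emitted rows finds order "b".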

>> On Tuesday 15 April 2008 14:05:38 Jan Lehnardt wrote:
>>> On Apr 15, 2008, at 02:01, Nils Adermann wrote:
>>>> Hi,
>>>> I agree with Søren that this is not necessarily a good idea. It is
>>>> not trivial for an indexer to figure out which view results changed.
>>>> One method to do so is storing all indexed view results and then
>>>> comparing them to the updated view once the indexer is called. This
>>>> is a needless waste of resources. Updating the view index based on
>>>> changed documents is even more difficult. You would have to
>>>> recompute the view at least partially to find out which view results
>>>> changed. Given the reduce step this means that any number of
>>>> documents, including unchanged ones, could be involved. This creates
>>>> a lot of work.
>>> Yeah, but it doesn't actually matter who does the work :) So we'd rather
>>> keep that out of CouchDB.
Err, I wasn't saying the question is where the work takes place. I was
saying that you have to do the work twice instead of just once if we
follow your approach.
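
The comparison method mentioned in the quoted text can be sketched roughly as follows. This is only an assumption about how an external indexer might do it: `diff_view_rows` is a hypothetical helper, and view results are modelled as plain dicts of key to value. The indexer has to keep its own full copy of the previous view results and walk both copies, even though the view update itself already touched the same rows.

```python
def diff_view_rows(old_rows, new_rows):
    """Compare two snapshots of view results (dicts of key -> value).

    Returns (changed, removed): rows the indexer must (re)index and
    keys it must drop from its index.
    """
    changed = {
        key: value
        for key, value in new_rows.items()
        if key not in old_rows or old_rows[key] != value
    }
    removed = [key for key in old_rows if key not in new_rows]
    return changed, removed

# The indexer's stored copy of the view from the previous run (hypothetical data).
old = {"a": "small order", "b": "large order", "c": "small order"}
# The view as it looks after some documents changed.
new = {"a": "small order", "b": "small order", "d": "large order"}

changed, removed = diff_view_rows(old, new)
```

Here `changed` holds the rows for "b" and "d" and `removed` holds "c", but finding that out required storing and scanning every row, including the unchanged "a".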

>>>> I think the problem we face here is different usage patterns of
>>>> views. There are views which process a lot of data and which are
>>>> based on documents that are updated frequently. But they might only
>>>> be read from infrequently. These views profit from JIT computation.
>>>> However many applications use views which are infrequently updated
>>>> but often queried or searched. Such views benefit from live
>>>> updating. If an application allows searching data it nearly always
>>>> means that the data will be read more frequently than it is updated.
>>>> So in conclusion both methods (JIT and live updates) make sense for
>>>> views. But search normally only needs the live update mechanism. I
>>>> believe it should become configurable whether a view is updated
>>>> immediately after a change or only after a query takes place.
>>>> Fulltext search would always work on views with immediate updates.
>>>> The indexer would be notified about the changed results. On views
>>>> which delay updates, search would only work if the fulltext search
>>>> provides a mechanism to compare the new view results to the old ones.
>>> Just query the view with ?count=0 to trigger an update after your
>>> inserts and you have the synchronous update behaviour.
If we really do things your way, that would mean the entire database and
all searchable views need to be reindexed completely after every single
update. You're creating a huge amount of useless work for the indexer.
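
For comparison, the configurable update behaviour proposed in the quoted text could look something like the sketch below. This is a toy model under stated assumptions, not CouchDB code: `ToyView`, its `immediate` flag and the `on_change` callback are all hypothetical names, and only per-document (map-style) results are modelled, with no reduce step.

```python
class ToyView:
    """Hypothetical view with a configurable update policy (not CouchDB's API)."""

    def __init__(self, map_fn, immediate, on_change=None):
        self.map_fn = map_fn        # doc -> emitted value, or None to emit nothing
        self.immediate = immediate  # True: update on write; False: update on query
        self.on_change = on_change  # indexer callback, called as (doc_id, value)
        self.rows = {}
        self.pending = {}

    def document_changed(self, doc_id, doc):
        if self.immediate:
            self._apply(doc_id, doc)
        else:
            self.pending[doc_id] = doc  # defer the work until the next query

    def query(self):
        for doc_id, doc in self.pending.items():
            self._apply(doc_id, doc)
        self.pending.clear()
        return dict(self.rows)

    def _apply(self, doc_id, doc):
        value = self.map_fn(doc)
        if self.rows.get(doc_id) == value:
            return  # view result unchanged, nothing to reindex
        if value is None:
            self.rows.pop(doc_id, None)
        else:
            self.rows[doc_id] = value
        if self.on_change:
            self.on_change(doc_id, value)  # value None means the row was removed


notified = []
live = ToyView(lambda d: d.get("title"), immediate=True,
               on_change=lambda doc_id, value: notified.append((doc_id, value)))
live.document_changed("a", {"title": "hello"})
live.document_changed("a", {"title": "hello", "views": 1})  # title unchanged
```

After these two writes `notified` contains a single entry for "a", because the second write did not change the view result. With `immediate=False` the same view would do no work until `query()` is called.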

