incubator-couchdb-dev mailing list archives

From Nils Adermann <nader...@naderman.de>
Subject Re: Lazy Fulltext Search
Date Wed, 16 Apr 2008 23:15:25 GMT
Hi,

Jan Lehnardt wrote:
> Heya Søren,
> On Apr 15, 2008, at 15:27, Soren Hilmer wrote:
>> I guess what all this boils down to is that:
>>
>> When a database changes, you need to re-index all the views in the
>> fulltextsearch design document.
>
> If you take this route, yes.
>
>> There is no way incremental changes can be made to the index, as one
>> document change may potentially change more view results within the
>> same view.
>> Right?
>
> Yup.
>
> Eventually, I think, we will be able to have CouchDB calculate the 
> intersection of all FT hits and a view index for you. So the FT 
> indexer will only need to index the whole DB and CouchDB filters out 
> all matching documents that are not in the requested view for you. For 
> now, you've got to do it yourself.
>
That's not even possible, because a view (written in JS) could return 
data that is not directly in a document, either by combining information 
from multiple documents or by generating new content based on some 
document values. You would never be able to search such content.
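
To illustrate the point, here is a minimal sketch in Python. It is not CouchDB code: the documents, the `map_full_name` function and the naive `fulltext_hits` search are all hypothetical stand-ins. It only shows that a phrase present in a derived view value can be invisible to an indexer that sees nothing but the stored documents.

```python
# Hypothetical documents; the field names are invented for this example.
docs = [
    {"_id": "a", "first": "Ada", "last": "Lovelace"},
    {"_id": "b", "first": "Alan", "last": "Turing"},
]


def map_full_name(doc):
    """Hypothetical map function: emits a value stored in no document field."""
    yield doc["_id"], f"{doc['last']}, {doc['first']}"


def fulltext_hits(documents, phrase):
    """Naive document-level search over stored field values only."""
    return {
        d["_id"]
        for d in documents
        if any(phrase in str(v) for k, v in d.items() if k != "_id")
    }


view_rows = [row for d in docs for row in map_full_name(d)]
view_ids = {doc_id for doc_id, _ in view_rows}

phrase = "Lovelace, Ada"

# Intersection approach: search the raw documents, then filter by the view.
doc_level = fulltext_hits(docs, phrase) & view_ids

# Searching the view results themselves finds the derived value.
view_level = {doc_id for doc_id, value in view_rows if phrase in value}
```

Under these assumptions `doc_level` is empty while `view_level` contains document `a`, because the string "Lovelace, Ada" exists only in the view output.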

>> On Tuesday 15 April 2008 14:05:38 Jan Lehnardt wrote:
>>> On Apr 15, 2008, at 02:01, Nils Adermann wrote:
>>>> Hi,
>>>>
>>>> I agree with Søren that this is not necessarily a good idea. It is
>>>> not trivial for an indexer to figure out which view results changed.
>>>> One method to do so is storing all indexed view results and then
>>>> comparing them to the updated view once the indexer is called. This
>>>> is a needless waste of resources. Updating the view index based on
>>>> changed documents is even more difficult. You would have to
>>>> recompute the view at least partially to find out which view results
>>>> changed. Given the reduce step this means that any number of
>>>> documents, including unchanged ones could be involved. This creates
>>>> a lot of work.
>>>
>>> Yeah, but it doesn't actually matter who does the work :) So we rather
>>> keep that out of CouchDB.
>>>
Err, I wasn't saying the question is where the work takes place. I was 
saying that the work has to be done twice instead of just once if we 
follow your way.
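
The comparison method described in the quoted text (store the indexed view results, then compare them to the updated view) could look roughly like the sketch below. This is an illustration only: `diff_view_results` and the snapshot layout, a dict keyed by `(view key, document id)`, are assumptions and not part of CouchDB.

```python
def diff_view_results(old_rows, new_rows):
    """Compare two full snapshots of view results.

    Both arguments are dicts mapping (view_key, doc_id) -> emitted value.
    Returns three dicts: rows added, rows removed, rows whose value changed.
    """
    added = {k: v for k, v in new_rows.items() if k not in old_rows}
    removed = {k: v for k, v in old_rows.items() if k not in new_rows}
    changed = {
        k: v
        for k, v in new_rows.items()
        if k in old_rows and old_rows[k] != v
    }
    return added, removed, changed


# Hypothetical snapshots taken before and after a document update.
old_snapshot = {("k1", "a"): 1, ("k2", "b"): 2}
new_snapshot = {("k1", "a"): 1, ("k2", "b"): 3, ("k3", "c"): 4}

added, removed, changed = diff_view_results(old_snapshot, new_snapshot)
```

Note that the indexer has to keep a complete copy of the previous results and walk both snapshots in full, even when only one row differs. The view has already been computed once by the database, so this is the duplicated work.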

>>>> I think the problem we face here is different usage patterns of
>>>> views. There are views which process a lot of data and which are
>>>> based on documents that are updated frequently.  But they might only
>>>> be read from infrequently. These views profit from JIT computation.
>>>> However many applications use views which are infrequently updated
>>>> but often queried or searched. Such views benefit from live
>>>> updating. If an application allows searching data it nearly always
>>>> means that the data will be read more frequently than it is updated.
>>>> So in conclusion both methods (JIT and live updates) make sense for
>>>> views. But search normally only needs the live update mechanism. I
>>>> believe it should become configurable whether a view is updated
>>>> immediately after a change or only after a query takes place.
>>>> Fulltext search would always work on views with immediate updates.
>>>> The indexer would be notified about the changed results. On views
>>>> which delay updates, search would only work if the fulltext search
>>>> provides a mechanism to compare the new view results to the old ones.
>>>
>>> Just query the view with ?count=0 to trigger an update after your
>>> inserts and you have the synchronous update behaviour.
>>>
If we really do things your way, that would mean the entire database and 
all searchable views need to be reindexed completely after every single 
update. That creates a huge amount of useless work for the indexer.
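
For comparison, the configurable update policy proposed in the quoted text earlier in this message could be sketched as follows. Everything here is hypothetical (`UpdatePolicy`, `View`, the `on_rows_changed` callback), and the sketch handles map-only views with one row per document. It ignores the reduce step, which is exactly where incremental updates get hard.

```python
from enum import Enum


class UpdatePolicy(Enum):
    ON_QUERY = "on_query"  # lazy (JIT): index only when the view is read
    ON_WRITE = "on_write"  # live: index as soon as a document changes


class View:
    """Toy map-only view; each document yields exactly one row."""

    def __init__(self, map_fn, policy, on_rows_changed=None):
        self.map_fn = map_fn
        self.policy = policy
        self.on_rows_changed = on_rows_changed  # e.g. a fulltext indexer hook
        self.rows = {}     # doc id -> emitted value
        self.pending = {}  # doc id -> latest document not yet indexed

    def document_changed(self, doc):
        self.pending[doc["_id"]] = doc
        if self.policy is UpdatePolicy.ON_WRITE:
            self._refresh()

    def query(self):
        self._refresh()
        return dict(self.rows)

    def _refresh(self):
        changed = {}
        for doc_id, doc in self.pending.items():
            value = self.map_fn(doc)
            if self.rows.get(doc_id) != value:
                self.rows[doc_id] = value
                changed[doc_id] = value
        self.pending.clear()
        # Hand only the rows that actually changed to the indexer.
        if changed and self.on_rows_changed:
            self.on_rows_changed(changed)
```

With `ON_WRITE`, the callback receives just the changed rows right after each update, so a search indexer never needs to rescan the database or the whole view. With `ON_QUERY`, nothing reaches the indexer until somebody reads the view.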

Cheers
Nils
