incubator-couchdb-dev mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: Lazy Fulltext Search
Date Tue, 15 Apr 2008 12:05:38 GMT

On Apr 15, 2008, at 02:01, Nils Adermann wrote:
> Hi,
>
> I agree with Søren that this is not necessarily a good idea. It is  
> not trivial for an indexer to figure out which view results changed.  
> One method to do so is storing all indexed view results and then  
> comparing them to the updated view once the indexer is called. This  
> is a needless waste of resources. Updating the view index based on  
> changed documents is even more difficult. You would have to  
> recompute the view at least partially to find out which view results  
> changed. Given the reduce step this means that any number of  
> documents, including unchanged ones, could be involved. This creates  
> a lot of work.

Yeah, but it doesn't actually matter who does the work :) So we'd rather  
keep that out of CouchDB.
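
A minimal sketch of the comparison approach Nils describes above: keep the previously indexed view rows and diff them against the fresh result. The row shape (`id`, `key`, `value`) and the name `diff_view_rows` are assumptions for illustration, not part of CouchDB or of any existing indexer.

```python
import json


def _row_identity(row):
    # View keys can be arbitrary JSON (e.g. lists), so serialize them
    # to get a hashable identity for each row.
    return (row["id"], json.dumps(row["key"], sort_keys=True))


def diff_view_rows(old_rows, new_rows):
    """Return (added, removed, changed) row identities between two snapshots."""
    old = {_row_identity(r): r["value"] for r in old_rows}
    new = {_row_identity(r): r["value"] for r in new_rows}
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    changed = sorted(k for k in new if k in old and old[k] != new[k])
    return added, removed, changed


# Hypothetical snapshots of the same view before and after an update.
old_rows = [
    {"id": "a", "key": "x", "value": 1},
    {"id": "b", "key": "y", "value": 2},
]
new_rows = [
    {"id": "a", "key": "x", "value": 1},
    {"id": "b", "key": "y", "value": 3},
    {"id": "c", "key": "z", "value": 4},
]
added, removed, changed = diff_view_rows(old_rows, new_rows)
```

The indexer has to hold a full copy of the previous rows to do this, which is the resource cost raised in the quoted paragraph.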


> I think the problem we face here is different usage patterns of  
> views. There are views which process a lot of data and which are  
> based on documents that are updated frequently.  But they might only  
> be read from infrequently. These views profit from JIT computation.  
> However many applications use views which are infrequently updated  
> but often queried or searched. Such views benefit from live  
> updating. If an application allows searching data it nearly always  
> means that the data will be read more frequently than it is updated.  
> So in conclusion both methods (JIT and live updates) make sense for  
> views. But search normally only needs the live update mechanism. I  
> believe it should become configurable whether a view is updated  
> immediately after a change or only after a query takes place.  
> Fulltext search would always work on views with immediate updates.  
> The indexer would be notified about the changed results. On views  
> which delay updates, search would only work if the fulltext search  
> provides a mechanism to compare the new view results to the old ones.

Just query the view with ?count=0 to trigger an update after your  
inserts and you have the synchronous update behaviour.
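
As a rough sketch of that workaround: after writing documents, a client can request the view with `count=0` so the view index is brought up to date without any rows being returned. The host, port, database name, and the `_view/search/by_title` path below are assumptions for illustration; only the `?count=0` parameter comes from this thread.

```python
import urllib.parse
import urllib.request


def view_refresh_url(base_url, db, view_path):
    """Build a URL that queries a view but asks for zero rows."""
    query = urllib.parse.urlencode({"count": 0})
    db_part = urllib.parse.quote(db, safe="")
    return f"{base_url.rstrip('/')}/{db_part}/{view_path.lstrip('/')}?{query}"


def refresh_view(base_url, db, view_path, timeout=30):
    """Issue the GET request; the server updates the view before answering."""
    url = view_refresh_url(base_url, db, view_path)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


# Hypothetical names; nothing is sent over the network here.
url = view_refresh_url("http://localhost:5984/", "mydb", "_view/search/by_title")
```

Calling `refresh_view(...)` right after a batch of inserts would give the "update on write" behaviour from the client side, at the cost of one extra request per batch.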

>
>
> Cheers
> Nils
>
> Jan Lehnardt wrote:
>>
>> On Apr 12, 2008, at 12:06, Søren Hilmer wrote:
>>> Hi
>>>
>>> Have you read Chris' response about letting the view engine call
>>> the indexer, as it has the information needed for the indexer? As I
>>> understand the idea, it will essentially keep the fulltext indexer
>>> and the views in sync.
>>>
>>> I like this idea and I believe the code for the indexer would be
>>> much simpler and more efficient.
>>>
>>> Also as the shift goes towards indexing views and not documents,
>>> it makes sense that it is the View engine that triggers the
>>> indexer, right?
>>
>> The only problem here is that views are changed, when they are  
>> being queried and not when documents are added. So you could end up  
>> with a lot of not-indexed data because your view hasn't been  
>> queried. That can be worked around, but I don't think it makes  
>> things any easier :)
>>
>> The design of the update notification is intentionally simple. We  
>> expect the clients (the Indexer in this case) to be smart. We  
>> believe that this makes the server code more robust.
>>
>>
>>> I have to study the View engine, if I am to provide any code for
>>> this, though (provided consensus blows in this direction).
>>>
>>> Have fun
>>>  Søren
>>> On Friday 11 April 2008 13:26, Jan Lehnardt wrote:
>>>> On Apr 11, 2008, at 08:55, Søren Hilmer wrote:
>>>>> Hi Jan
>>>>>
>>>>> It certainly would simplify configuration, although the
>>>>> DbUpdateNotificationProcess setting ought to be retained as it is
>>>>> potentially useful for other things than indexing (can you have
>>>>> more than one of these set up?)
>>>>
>>>> No, the update searcher will stay! :-)
>>>>
>>>>> I am also worried about response times for searching; potentially
>>>>> the indexing can take considerable time. With the current approach
>>>>> indexing can be done in off-peak hours and only searching is done
>>>>> at prime time.
>>>>
>>>> Right, if you want to be conservative with resources, you might
>>>> want to go with my approach at the expense of possibly higher
>>>> response times the first time things are searched for (as it is
>>>> with views). I just wanted to make available my idea that fulltext
>>>> indexing could be modelled after how views work, in case this is
>>>> useful for a specific scenario.
>>>>
>>>> Cheers
>>>> Jan
>>>> -- 
>>>>
>>>>> Have fun
>>>>> Søren
>>>>> -- 
>>>>> Søren Hilmer, M.Sc., M.Crypt.
>>>>> wideTrail            Phone: +45 25481225
>>>>> Pilevænget 41        Email: sh@widetrail.dk
>>>>> DK-8961  Allingåbro  Web: www.widetrail.dk
>>>>>
>>>>> On Thu, April 10, 2008 23:32, Jan Lehnardt wrote:
>>>>>> Heya,
>>>>>> while thinking more about the fulltext implementation, I began to
>>>>>> wonder why we don't model it after the view engine.
>>>>>>
>>>>>> At the moment, we have an Indexer waiting for update
>>>>>> notifications and polling CouchDB for changes and a separate
>>>>>> mechanism to register a fulltext query Searcher, that looks up
>>>>>> things in the index.
>>>>>>
>>>>>> My proposed architectural change would be to trigger the Indexer
>>>>>> from the Searcher module when a request comes in, just like views
>>>>>> work. This would delay the creation of fulltext indexes until
>>>>>> they are actually needed.
>>>>>>
>>>>>> The possible drawback, though, is that when building the fulltext
>>>>>> index is rather slow, old-style pre-calculation might be more
>>>>>> feasible. Views deal with that by requiring frequent requests
>>>>>> (possibly cron-ed).
>>>>>>
>>>>>> This is not a proposal or anything, just a thought I wanted to
>>>>>> share with those who work on fulltext integration.
>>>>>>
>>>>>> If you have any input on this, please let us know ;)
>>>>>>
>>>>>> Cheers
>>>>>> Jan
>>>>>> -- 
>>>
>>> -- 
>>> Søren Hilmer, M.Sc., M.Crypt.
>>> wideTrail            Phone:    +45 25481225
>>> Pilevænget 41        Email:    sh@widetrail.dk
>>> DK-8961  Allingåbro    Web:    www.widetrail.dk
>>>
>>
>
>
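
For readers following the thread, here is a toy sketch of the original idea quoted above: the Searcher triggers indexing when a request comes in, the way views work. Everything here is hypothetical: the `LazySearcher` class, the `fetch_changes` callable, and the whitespace tokenizer stand in for whatever the real Indexer and Searcher would do.

```python
class LazySearcher:
    """Toy fulltext searcher that only indexes when a query arrives."""

    def __init__(self, fetch_changes):
        # fetch_changes(since_seq) -> (list of (doc_id, text), last_seq)
        # is a hypothetical callable supplied by the caller.
        self._fetch_changes = fetch_changes
        self._last_seq = 0
        self._index = {}  # term -> set of doc ids

    def _catch_up(self):
        docs, last_seq = self._fetch_changes(self._last_seq)
        for doc_id, text in docs:
            # Drop stale postings for this document, then re-add its terms.
            for ids in self._index.values():
                ids.discard(doc_id)
            for term in text.lower().split():
                self._index.setdefault(term, set()).add(doc_id)
        self._last_seq = last_seq

    def search(self, term):
        # Indexing cost is paid here, at query time, not at write time.
        self._catch_up()
        return sorted(self._index.get(term.lower(), set()))


# Hypothetical in-memory change log: (sequence number, doc id, text).
log = [(1, "a", "hello world"), (2, "b", "hello couch")]


def fetch_changes(since_seq):
    docs = [(doc_id, text) for seq, doc_id, text in log if seq > since_seq]
    last_seq = max((seq for seq, _, _ in log), default=since_seq)
    return docs, last_seq


searcher = LazySearcher(fetch_changes)
first = searcher.search("hello")
log.append((3, "a", "goodbye"))  # document "a" is rewritten
second = searcher.search("hello")
third = searcher.search("goodbye")
```

The first query after many writes pays for all pending indexing, which is the response-time concern Søren raises; the `?count=0` style of periodic request is one way to spread that cost.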

