lucene-dev mailing list archives

From "Karsten Konrad" <Karsten.Kon...@xtramind.com>
Subject Re: Problem with long run IndexSearcher
Date Mon, 19 May 2003 14:26:07 GMT

Hi,

interesting problem, quite frequent in web applications with Lucene.

>>
Sorry for posting a FAQ. I did read the documentation quite carefully, 
but I did miss the point where this question had been documented.

Any new workaround since 2001, when the FAQ was written?

Thank you for your attention,

Giulio Cesare Solaroli
>>

As you know now, your IndexSearcher instance will ignore all documents indexed 
after it was created. However, creating a new index searcher for every search, 
or for every new document, seems too expensive. That would slow down everything 
in your case, and in many other such applications.

So you could:

(1) create a new index searcher from time to time and replace the old one
with it. A new searcher every 15 minutes will not harm your application's
speed much; in this time, only about 400 documents (40,000 per day is roughly
417 per quarter hour) would be invisible to your application if the stream of
new docs is constant.

(2) create a new index searcher only if a search query arrives and any new
document (or some predefined minimum number of new documents) has been indexed 
since the last query.
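
Strategy (1) could be sketched roughly as follows. This is only an illustration in modern Java, not Lucene code: TimedRefresher is a hypothetical name, T stands in for the IndexSearcher, and the factory would be whatever opens a new searcher on your index. The clock is passed in so that the behaviour can be tested. The old instance is deliberately not closed here, because search results created with it may still be in use.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical helper: hands out a shared instance and rebuilds it
 * once it is at least maxAgeMillis old. T stands in for the searcher.
 */
final class TimedRefresher<T> {
    private final Supplier<T> factory;   // opens a new instance (assumption)
    private final LongSupplier clock;    // current time in milliseconds
    private final long maxAgeMillis;
    private T current;
    private long createdAt;

    TimedRefresher(Supplier<T> factory, LongSupplier clock, long maxAgeMillis) {
        this.factory = factory;
        this.clock = clock;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Returns the shared instance, replacing it first if it is too old. */
    synchronized T get() {
        long now = clock.getAsLong();
        if (current == null || now - createdAt >= maxAgeMillis) {
            current = factory.get();
            createdAt = now;
        }
        return current;
    }
}
```

With a 15 minute interval, one would construct it with maxAgeMillis = 15 * 60 * 1000L and System::currentTimeMillis as the clock.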

Strategy (2) makes most sense when indexing happens infrequently and
mostly outside the time when users search. For instance, if your application
indexes new documents only during the night, but your users search during the
day, you could go for (2) easily. A new document from time to time does
not hurt. 
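
A rough sketch of strategy (2), again with hypothetical names and no Lucene calls: the indexer is assumed to expose a running count of indexed documents (indexedCount), and the instance is only replaced when a query arrives and at least minNewDocs documents have been added since it was opened.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical helper: replaces the shared instance only when a query
 * arrives and enough new documents have been indexed since it was opened.
 */
final class OnChangeRefresher<T> {
    private final Supplier<T> factory;       // opens a new instance (assumption)
    private final LongSupplier indexedCount; // total documents indexed so far (assumption)
    private final long minNewDocs;
    private T current;
    private long countAtOpen;

    OnChangeRefresher(Supplier<T> factory, LongSupplier indexedCount, long minNewDocs) {
        this.factory = factory;
        this.indexedCount = indexedCount;
        this.minNewDocs = minNewDocs;
    }

    /** Called once per query; refreshes lazily, never in the background. */
    synchronized T get() {
        long count = indexedCount.getAsLong();
        if (current == null || count - countAtOpen >= minNewDocs) {
            current = factory.get();
            countAtOpen = count;
        }
        return current;
    }
}
```

Setting minNewDocs to 1 refreshes on the first query after any new document; a larger value trades freshness for fewer reopenings.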

As you probably index in batch mode and then optimize, the best time to switch 
to a new searcher would be right after each optimize. 

Please do not forget that once you have closed an IndexSearcher, you can no longer 
work with the search results that were created with it!
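
One possible way to respect this when swapping searchers is to count the users of the old instance and close it only after the last one is done. The following is a minimal sketch with hypothetical names (CountedResource, acquire, release, retire); the closer would be whatever closes the old searcher in your application.

```java
/**
 * Hypothetical helper: delays closing a replaced resource until every
 * caller that is still reading results from it has released it.
 */
final class CountedResource<T> {
    private final T resource;
    private final Runnable closer; // closes the resource (assumption)
    private int users = 0;
    private boolean retired = false;
    private boolean closed = false;

    CountedResource(T resource, Runnable closer) {
        this.resource = resource;
        this.closer = closer;
    }

    /** Registers one more user; fails once the resource has been retired. */
    synchronized T acquire() {
        if (retired) throw new IllegalStateException("resource retired");
        users++;
        return resource;
    }

    /** Call when you have finished reading the search results. */
    synchronized void release() {
        if (users == 0) throw new IllegalStateException("release without acquire");
        users--;
        closeIfIdle();
    }

    /** Marks the resource as replaced; closes now if idle, else on last release. */
    synchronized void retire() {
        retired = true;
        closeIfIdle();
    }

    private void closeIfIdle() {
        if (retired && users == 0 && !closed) {
            closed = true;
            closer.run();
        }
    }
}
```

Each query would call acquire() before searching and release() after it has read its results; the code that swaps in a new searcher calls retire() on the old wrapper.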

Regards,

Karsten



-----Original Message-----
From: Giulio Cesare Solaroli [mailto:slrgcsa@ibn-italy.com]
Sent: Monday, 19 May 2003 15:38
To: lucene-dev@jakarta.apache.org
Subject: Problem with long run IndexSearcher


Hi all,

first let me express my compliments for Lucene.
I have been up for a full week-end to double check the results I was 
having because I couldn't believe what I saw; with a stupid application 
I could index DB data at a sustained rate of 50 documents per second.

Now we have more than 2 million documents indexed and the performance 
is still excellent; our main bottleneck is still the DB.

Our situation:
- we are indexing new documents at a sustained rate (an average of 
40,000 new documents a day);
- we have written a small xmlRpc server in Java to search the index 
from other applications.

The xmlRpc server creates a single instance of IndexSearcher and reuses it 
for each query issued.
For each request, a new Query object is created and the documents found 
are returned to the client.

The problem we are seeing is that the documents indexed after the 
xmlRpc server is started will not be found until the server is 
restarted.

Is this our fault, or the way IndexSearcher should work?

What is the best way to keep the IndexSearcher up to date with the 
updated index?

Thanks for your attention,

Giulio Cesare Solaroli


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

