lucene-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: strange lucene search behavior?
Date Tue, 20 Jan 2004 17:29:57 GMT
Robert,

Some time back someone benchmarked adding an LRU cache to 
TermInfosReader and was unable to see any significant overall speedup 
in query processing.  If you find otherwise, please submit a patch. 
Java 1.4's LinkedHashMap would make the implementation of such a cache 
very simple, but, unfortunately, not all Lucene users are using 1.4 yet.
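
To illustrate how little code such a cache needs on 1.4 or later, here is
a minimal sketch. The class name LruCache and its capacity parameter are
hypothetical and not part of Lucene, and the generics require a newer Java
than 1.4 (drop the type parameters to compile on 1.4 itself). Whether it
would actually help inside TermInfosReader is exactly what a benchmark
would have to show.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: a fixed-capacity LRU map.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true keeps entries ordered from least to most
        // recently accessed, instead of by insertion order.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked after each insertion; returning true evicts the least
        // recently used entry once the map grows past its capacity.
        return size() > capacity;
    }
}
```

Note that LinkedHashMap is not synchronized, and in access-order mode even
get() modifies the map, so a cache shared between search threads would need
external locking, for example via Collections.synchronizedMap.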

Also, if you wish to retrieve all of the hits, rather than just a 
portion, please use the HitCollector API rather than the Hits API.  The 
Hits API is optimized for applications which are only displaying a few 
of the top hits.

Doug

Robert Engels wrote:
> In working with Lucene, I notice that when performing searches, it retrieves
> the documents for the same term multiple times. I think this may be because
> the Hits collection only stores a certain number of items, but would it not
> be better to just increase the size of the Hits collection, rather than
> perform the extra, relatively expensive read of the term docs?
> 
> The following is the trace output from Lucene performing 2 single term
> searches, and a multiple term search: (notice that in each case, the
> documents for a term are asked for twice).
> 
> expression = +epson, query = +text:epson
> findTermInfo() text:epson, time = 0
> SearchTermDocs, seek() on text:epson
> SearchTermDocs, seek() on text:epson [cached]
> find, hits = 224, query time = 16, doc (150) time = 15, total time = 31
> 
> expression = +printer, query = +text:printer
> findTermInfo() text:printer, time = 16
> SearchTermDocs, seek() on text:printer
> SearchTermDocs, seek() on text:printer [cached]
> find, hits = 5358, query time = 62, doc (150) time = 282, total time = 344
> 
> expression = +epson +printer, query = +text:epson +text:printer
> SearchTermDocs, seek() on text:epson [cached]
> SearchTermDocs, seek() on text:printer [cached]
> SearchTermDocs, seek() on text:epson [cached]
> SearchTermDocs, seek() on text:printer [cached]
> find, hits = 175, query time = 15, doc (150) time = 47, total time = 62
> 
> In order to limit the performance hit, our implementation caches the returned
> docs within a query (the [cached] tag), but it seems the issue would be
> better addressed by the Lucene engine.
> 
> Any thoughts on this?
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 

