Robert,
Some time back someone benchmarked adding an LRU cache to
TermInfosReader and was unable to see any significant overall speedup
in query processing. If you find otherwise, please submit a patch.
Java 1.4's LinkedHashMap would make the implementation of such a cache
very simple, but, unfortunately, not all Lucene users are using 1.4 yet.
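
To illustrate how little code that would take, here is a minimal sketch of
an LRU cache built on LinkedHashMap. The class name LruCache and the
maxEntries parameter are made up for this example and are not part of
Lucene. It is written with generics for a modern JDK; a 1.4 version would
use raw types. How TermInfosReader would actually key and populate such a
cache is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: a bounded map that evicts the
// least-recently-used entry once more than maxEntries entries are stored.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration order follows access recency,
        // so the "eldest" entry is the least recently used one.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true
    // removes the eldest (least recently used) entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

Note that LinkedHashMap is not synchronized, and in access-order mode even
get() reorders entries, so a cache shared between threads would need
external locking or a Collections.synchronizedMap wrapper.
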
Also, if you wish to retrieve all of the hits, rather than just a
portion, please use the HitCollector API rather than the Hits API. The
Hits API is optimized for applications which are only displaying a few
of the top hits.
Doug
Robert Engels wrote:
> In working with Lucene, I notice that when performing searches, it retrieves
> the documents for the same term multiple times. I think this may be because
> the Hits collection only stores a certain number of items, but would it not
> be better to just increase the size of the Hits collection, rather than
> perform the extra, relatively very expensive, read of the term docs?
>
> The following is the trace output from Lucene performing 2 single term
> searches, and a multiple term search: (notice that in each case, the
> documents for a term are asked for twice).
>
> expression = +epson, query = +text:epson
> findTermInfo() text:epson, time = 0
> SearchTermDocs, seek() on text:epson
> SearchTermDocs, seek() on text:epson [cached]
> find, hits = 224, query time = 16, doc (150) time = 15, total time = 31
>
> expression = +printer, query = +text:printer
> findTermInfo() text:printer, time = 16
> SearchTermDocs, seek() on text:printer
> SearchTermDocs, seek() on text:printer [cached]
> find, hits = 5358, query time = 62, doc (150) time = 282, total time = 344
>
> expression = +epson +printer, query = +text:epson +text:printer
> SearchTermDocs, seek() on text:epson [cached]
> SearchTermDocs, seek() on text:printer [cached]
> SearchTermDocs, seek() on text:epson [cached]
> SearchTermDocs, seek() on text:printer [cached]
> find, hits = 175, query time = 15, doc (150) time = 47, total time = 62
>
> In order to limit the performance hit, our implementation caches the returned
> docs within a query (the [cached] tag), but it seems the issue would be
> better addressed by the Lucene engine.
>
> Any thoughts on this?
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>