lucene-dev mailing list archives

From "Robert Engels" <reng...@ix.netcom.com>
Subject strange lucene search behavior?
Date Fri, 16 Jan 2004 16:44:44 GMT
While working with Lucene, I have noticed that when it performs a search, it
retrieves the documents for the same term more than once. I think this may be
because the Hits collection stores only a limited number of results, but would
it not be better to simply increase the size of the Hits collection rather than
perform an extra, comparatively expensive, read of the term docs?
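To make the suspected mechanism concrete, here is a toy model in Python (not Lucene code). It describes a Hits-like result list that keeps only the top `window` results. When a caller asks for a result past that window, it re-runs the whole search, which re-reads the term's documents. The class names, the initial window size, and the doubling policy are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TermIndex:
    """Hypothetical toy inverted index: term -> list of (doc_id, score)."""
    postings: dict
    reads: int = 0  # counts how many times a term's docs were read

    def term_docs(self, term):
        self.reads += 1  # every call models one expensive read of the term docs
        return self.postings.get(term, [])


class WindowedHits:
    """Hypothetical Hits-like list: caches only the top `window` results and
    re-runs the full search (re-reading term docs) to get more."""

    def __init__(self, index, term, window=100):
        self.index = index
        self.term = term
        self.cached = []
        self.total = 0
        self._search(window)

    def _search(self, n):
        docs = self.index.term_docs(self.term)  # full read on every search
        ranked = sorted(docs, key=lambda d: -d[1])  # best score first
        self.total = len(ranked)
        self.cached = ranked[:n]  # keep only the top n results

    def doc(self, i):
        if i >= self.total:
            raise IndexError(i)
        if i >= len(self.cached):
            # Assumed policy: re-search with a window twice as large.
            self._search(max(2 * len(self.cached), i + 1))
        return self.cached[i]
```

With 224 matching documents and the first 150 of them fetched, a window of 100 forces a second read of the term docs, while a window of 200 needs only one. That is consistent with the doubled `seek()` lines in the trace.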

The following is trace output from Lucene performing two single-term searches
and one multiple-term search. Notice that in each case the documents for a term
are requested twice:

expression = +epson, query = +text:epson
findTermInfo() text:epson, time = 0
SearchTermDocs, seek() on text:epson
SearchTermDocs, seek() on text:epson [cached]
find, hits = 224, query time = 16, doc (150) time = 15, total time = 31

expression = +printer, query = +text:printer
findTermInfo() text:printer, time = 16
SearchTermDocs, seek() on text:printer
SearchTermDocs, seek() on text:printer [cached]
find, hits = 5358, query time = 62, doc (150) time = 282, total time = 344

expression = +epson +printer, query = +text:epson +text:printer
SearchTermDocs, seek() on text:epson [cached]
SearchTermDocs, seek() on text:printer [cached]
SearchTermDocs, seek() on text:epson [cached]
SearchTermDocs, seek() on text:printer [cached]
find, hits = 175, query time = 15, doc (150) time = 47, total time = 62

To limit the performance hit, our implementation caches the returned docs
within a query (the [cached] tag), but it seems the issue would be better
addressed in the Lucene engine itself.
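A minimal sketch of the kind of per-query cache described above follows. It is in Python rather than Java, and all names are hypothetical. The first request for a term goes to the underlying, expensive reader. Repeated requests during the same query are served from memory, and the cache is cleared when the query ends.

```python
class CachingTermReader:
    """Hypothetical wrapper that memoizes term-doc reads for one query."""

    def __init__(self, read_term_docs):
        self._read = read_term_docs  # underlying (expensive) reader function
        self._cache = {}
        self.hits = 0    # requests answered from the cache ("[cached]")
        self.misses = 0  # requests that went to the underlying reader

    def term_docs(self, term):
        if term in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            # Store an immutable copy so callers cannot alter cached results.
            self._cache[term] = tuple(self._read(term))
        return self._cache[term]

    def end_query(self):
        # Drop everything once the query finishes.
        self._cache.clear()
```

Clearing the cache in `end_query()` bounds memory use to a single query. A cache shared across queries would also need to be invalidated whenever the index changes.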

Any thoughts on this?


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

