lucene-java-user mailing list archives

From Paul Hill <p...@metajure.com>
Subject IndexSearcher.search(query, filter, collector) considered less efficient
Date Fri, 08 Jun 2012 17:32:52 GMT
I noticed today that my code calls
IndexSearcher.search(Query query, Filter filter, Collector collector)
but I also noticed that the docs say:

"Applications should only use this if they need all of the matching documents. The high-level
search API (Searcher.search(Query, Filter, int)) is usually more efficient, as it skips
non-high-scoring hits."
   http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/core/org/apache/lucene/search/IndexSearcher.html#searchAfter%28org.apache.lucene.search.ScoreDoc,%20org.apache.lucene.search.Query,%20int%29
That makes complete sense, since I didn't provide it with any count limit.
My original, but apparently inefficient, call is:
            searcher.search(userQuery, securityFilter, dedupingCollector);
The userQuery is really an enhanced query built from what the user entered, not literally the
user's query.
The dedupingCollector uses one field cache (FieldCache.DEFAULT.getStrings(reader, deDupField))
to work out which documents to collect and which to reject, saving a list of the first
occurrences of documents.
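
To make the first-occurrence idea concrete, here is a minimal sketch in plain Java. It is not the actual collector and does not use any Lucene classes: the class name FirstOccurrenceCollector is hypothetical, and the String[] array only stands in for what FieldCache.DEFAULT.getStrings(reader, deDupField) would return, assuming one de-dup key per document ID.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a deduplicating collector; not a Lucene class. */
public class FirstOccurrenceCollector {
    // Stand-in for the field cache array: dedupKeys[docId] is the value of the
    // de-dup field for that document (null if the document has no value).
    private final String[] dedupKeys;
    private final Set<String> seen = new HashSet<String>();
    private final List<Integer> firstOccurrences = new ArrayList<Integer>();

    public FirstOccurrenceCollector(String[] dedupKeys) {
        this.dedupKeys = dedupKeys;
    }

    /** Called once per matching document, in the order hits are reported. */
    public void collect(int docId) {
        String key = dedupKeys[docId];
        // Assumption: documents without a key are never duplicates of each other.
        // Set.add returns false when the key was already seen, so repeats are skipped.
        if (key == null || seen.add(key)) {
            firstOccurrences.add(docId);
        }
    }

    public List<Integer> getFirstOccurrences() {
        return firstOccurrences;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "a", null, "b", "c"};
        FirstOccurrenceCollector c = new FirstOccurrenceCollector(keys);
        for (int doc = 0; doc < keys.length; doc++) {
            c.collect(doc);
        }
        System.out.println(c.getFirstOccurrences()); // prints [0, 1, 3, 5]
    }
}
```

The sketch treats document IDs as global and ignores scores. A real collector would also have to handle per-segment document ID offsets. Because it visits every hit and keeps every first occurrence, nothing in it bounds the number of results.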
I don't think I can use the contrib DuplicateFilter, because my duplicates are not guaranteed
to be in the same index segment.

So am I being misled by my interpretation of the JavaDoc comment, even though I really DON'T
"need all matching documents", or is there some way to work both a count limit and filtering
into the whole chain of filters and collectors?

-Paul
