lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Optimizing unordered queries
Date Fri, 26 Jun 2009 15:06:25 GMT
On Thu, Jun 25, 2009 at 10:11 PM, Nigel<nigelspleen@gmail.com> wrote:

> Currently we're (perhaps naively) doing the equivalent of
> query.weight(searcher).scorer(reader).score(collector).  Obviously there's a
> certain amount of unnecessary calculation that results from this if you
> don't care about sorting.  Are there any general recommendations for
> unordered searching?  (We already omit norms.)

As of 2.9, scoring is optional with the new Collector.  Ie, if your
Collector doesn't call scorer.score() then the score won't be
computed.

Also, 2.9 will enable out-of-docID-order scoring if your
Collector.acceptsDocsOutOfOrder returns true, but currently it's
disjunction only (plus up to 32 prohibited terms) that will see a
speedup from this.

What kind of queries are you running?

I assume your Collector simply aggregates the list of docIDs hit?  Ie
no sorting, priority queue, etc., in it.

> (More details: Of particular interest are things that access the TermInfos,
> since that's the major source of RAM usage: if a smaller number of TermInfos
> were needed then we could perhaps use an aggressive index divisor setting to
> save RAM without a performance penalty.  For example, I was thinking about a
> custom Similarity implementation that skipped the idf calculations, since
> those have to hit the TermInfos.)

Unfortunately the TermInfos must still be hit to look up the
freq/proxOffset in the postings files.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message