lucene-dev mailing list archives

From "Michael McCandless (JIRA)"
Subject [jira] Commented: (LUCENE-1593) Optimizations to TopScoreDocCollector and TopFieldCollector
Date Wed, 29 Apr 2009 20:45:30 GMT


Michael McCandless commented on LUCENE-1593:

I think we should have an issue handling interfaces deprecation in general for 2.9, since
just deprecating Weight does not solve it. You'd have to deprecate* methods
which accept Weight, but Searchable is an interface, so you might want to deprecate it entirely
and create an AbstractSearchable? That I think also deserves its own thread, don't you think?

Yes, and this presumably depends on the outcome of the first "how much can change in 3.0"

bq. I thought that perhaps we can make the following change

Once again I'm lacking clarity... there are many related possible
improvements to searching:

  * This "top" vs "not-top" scorer difference being more explicit

  * Merging Query/Filter (LUCENE-1518), allowing Filter as a clause to
    BooleanQuery (LUCENE-1345): it still feels like Query should be a
    subclass of Filter, since Query "simply" adds scoring to a

  * Pushing random-access filters down to the TermScorers, and
    pre-multiplying in deletes when possible (LUCENE-1536)

  * Similarly pushing "bottomValue" down to TermScorers for
    field-sorted searching

  * Have a single query make a "cheap" and "expensive" scorer so that
    all "cheap" scorers are checked first and only if they pass are
    expensive ones checked (LUCENE-1252)

  * The possible "Scorer.check" (LUCENE-1614) to test if a doc passes
    w/o next'ing

  * For AND scoring, picking carefully in what order to test the
    iterators, maybe also choosing when to use "check" instead of
    "advance" for some.

  * "Multiplying out" compound queries.  EG +X (A OR B) makes a nested
    BooleanQuery; multiplying it out and then somehow sharing a single
    iterator for X's TermScorer, should give better performance.
    Other "structural" optimizations could apply.

  * Far-out, and not really affecting APIs, but still related: source
    code specialization (LUCENE-1594) to get speedups
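
To make the "check" vs "advance" and AND-ordering bullets concrete, here is a minimal, self-contained sketch. It is not Lucene code: `Clause`, `cost()` and `intersect()` are hypothetical names invented for illustration, and `check(doc)` only stands in for the proposed "Scorer.check". The idea it shows is to drive the conjunction from the sparsest clause with "advance" and to test the remaining clauses with a random-access "check" instead of advancing them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: none of these types exist in Lucene.
final class ConjunctionSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical clause backed by a sorted array of doc ids. */
    static final class Clause {
        private final int[] docs; // sorted ascending, no duplicates
        private int pos = 0;

        Clause(int... docs) { this.docs = docs; }

        /** Rough cost estimate: how many docs this clause matches. */
        int cost() { return docs.length; }

        /** Moves forward to the first doc >= target and returns it. */
        int advance(int target) {
            while (pos < docs.length && docs[pos] < target) pos++;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        /** Random-access membership test; does not move the iterator. */
        boolean check(int doc) {
            return Arrays.binarySearch(docs, doc) >= 0;
        }
    }

    /** ANDs the clauses together; expects at least one clause. */
    static List<Integer> intersect(Clause... clauses) {
        Clause[] sorted = clauses.clone();
        // Test order: cheapest (sparsest) clause first.
        Arrays.sort(sorted, (a, b) -> Integer.compare(a.cost(), b.cost()));
        Clause lead = sorted[0];
        List<Integer> hits = new ArrayList<>();
        for (int doc = lead.advance(0); doc != NO_MORE_DOCS; doc = lead.advance(doc + 1)) {
            boolean match = true;
            // Only the lead clause iterates; the others are merely checked.
            for (int i = 1; i < sorted.length && match; i++) {
                match = sorted[i].check(doc);
            }
            if (match) hits.add(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        Clause x = new Clause(3, 9, 27);
        Clause a = new Clause(1, 3, 5, 9, 11, 27, 30);
        Clause b = new Clause(3, 4, 9, 10, 12);
        System.out.println(intersect(a, x, b)); // prints [3, 9]
    }
}
```

Whether "check" actually beats "advance" for a given clause depends on how cheap random access is for that clause, which is exactly the "choosing when" part of the bullet; the sketch simply assumes it is cheap for every non-lead clause.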

I'm not yet sure what steps to take now (and how) vs later...

> Optimizations to TopScoreDocCollector and TopFieldCollector
> -----------------------------------------------------------
>                 Key: LUCENE-1593
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>         Attachments: LUCENE-1593.patch,
> This is a spin-off of LUCENE-1575 and proposes to optimize TSDC and TFC code to remove
> unnecessary checks. The plan is:
> # Ensure that IndexSearcher returns segments in increasing doc Id order, instead of
> # Change TSDC and TFC's code to not use the doc id as a tie breaker. New docs will always
> have larger ids and therefore cannot compete.
> # Pre-populate HitQueue with sentinel values in TSDC (score = Float.NEG_INF) and remove
> the check if reusableSD == null.
> # Also move to use "changing top" and then call adjustTop(), in case we update the queue.
> # some methods in Sort explicitly add SortField.FIELD_DOC as a "tie breaker" for the
> last SortField. But, doing so should not be necessary (since we already break ties by docID),
> and is in fact less efficient (once the above optimization is in).
> # Investigate PQ - can we deprecate insert() and have only insertWithOverflow()? Add
> an addDummyObjects method which will populate the queue without "arranging" it, just store
> the objects in the array (this can be used to pre-populate sentinel values)?
> I will post a patch as well as some perf measurements as soon as I have them.
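
The sentinel and "changing top" steps of the plan quoted above can be sketched as follows. This is a simplified stand-in, not the actual patch: the `HitQueue` and `ScoreDoc` classes below are minimal re-implementations written for this example, and `collect()`, `topDocs()` and `bestFirst()` are hypothetical helper names. It assumes docs are collected in increasing doc id order and that the queue size is at least 1.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: simplified stand-ins, not Lucene's classes.
final class SentinelQueueSketch {
    static final int SENTINEL_DOC = Integer.MAX_VALUE;

    static final class ScoreDoc {
        int doc;
        float score;
        ScoreDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** a is "worse" than b: lower score, or same score and larger doc id. */
    static boolean lessThan(ScoreDoc a, ScoreDoc b) {
        return a.score < b.score || (a.score == b.score && a.doc > b.doc);
    }

    /** Fixed-size min-heap; heap[1] is always the weakest entry. */
    static final class HitQueue {
        private final ScoreDoc[] heap;
        private final int size;

        HitQueue(int size) {
            this.size = size;
            heap = new ScoreDoc[size + 1];
            // All sentinels compare equal, so storing them without
            // "arranging" already yields a valid heap.
            for (int i = 1; i <= size; i++) {
                heap[i] = new ScoreDoc(SENTINEL_DOC, Float.NEGATIVE_INFINITY);
            }
        }

        ScoreDoc top() { return heap[1]; }

        /** Restores heap order after the top entry was modified in place. */
        void adjustTop() {
            int i = 1;
            ScoreDoc node = heap[i];
            int j = 2;
            while (j <= size) {
                if (j < size && lessThan(heap[j + 1], heap[j])) j++;
                if (!lessThan(heap[j], node)) break;
                heap[i] = heap[j];
                i = j;
                j = i << 1;
            }
            heap[i] = node;
        }

        /** Real hits only (sentinels dropped), best first. */
        int[] bestFirst() {
            List<ScoreDoc> hits = new ArrayList<>();
            for (int i = 1; i <= size; i++) {
                if (heap[i].doc != SENTINEL_DOC) hits.add(heap[i]);
            }
            hits.sort((a, b) -> lessThan(a, b) ? 1 : (lessThan(b, a) ? -1 : 0));
            int[] docs = new int[hits.size()];
            for (int i = 0; i < docs.length; i++) docs[i] = hits.get(i).doc;
            return docs;
        }
    }

    /** No null check and no doc id tie-break: an equal score cannot compete. */
    static void collect(HitQueue pq, int doc, float score) {
        ScoreDoc top = pq.top();
        if (score <= top.score) return;
        top.doc = doc;     // "change top" in place, reusing the object
        top.score = score;
        pq.adjustTop();
    }

    static int[] topDocs(float[] scores, int n) {
        HitQueue pq = new HitQueue(n);
        for (int doc = 0; doc < scores.length; doc++) collect(pq, doc, scores[doc]);
        return pq.bestFirst();
    }

    public static void main(String[] args) {
        int[] docs = topDocs(new float[] {0.5f, 2.0f, 1.0f, 2.0f, 0.1f}, 3);
        System.out.println(java.util.Arrays.toString(docs));
    }
}
```

One consequence of the sentinel approach in this sketch is that a hit whose score is negative infinity is never collected, since it cannot beat a sentinel.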
