lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-1593) Optimizations to TopScoreDocCollector and TopFieldCollector
Date Mon, 27 Apr 2009 21:46:30 GMT


Michael McCandless commented on LUCENE-1593:

Some extensions of PQ may not know how to construct such a sentinel object. Consider ComparablePQ,
which assumes all items are Comparable. Unlike HitQueue, it does not know what will be the
items stored in the queue. But ... I guess it can return a Comparable which always prefers
the other element ... so maybe that's not a good example.
I just have the feeling that a setter method will give us more freedom, rather than having
to extend PQ just for that ...

Such extensions shouldn't use a sentinel?

Various things spook me about the separate method: one could easily
pass in bad sentinels, and then the queue is in an invalid state; the
method can be called at any time (whereas the only time you should do
this is on init); you could pass in wrong-sized array; the API is
necessarily public (whereas with getSentinel() it'd be protected).

We can mull it over some more... sleep on it ;)

bq. Right ... it's too late for me

I've been starting to wonder if you are a robot...

bq. hmm I wonder why this wasn't done so far?

I don't know!  Seems like a simple optimization.  So we don't need
start/end (now at least).

bq. oIt's just that maintaining that class becomes more and more problematic.

I completely agree: this is the tradeoff we have to mull.  But I like
that all these classes are private (it hides the fact that there are
12 concrete impls).

I think I'd lean towards the 12 impls now.  They are tiny classes.

bq. But I wonder from where would we control it

Hmm.. yeah good point.  The only known place in Lucene's core that
visits hits out of order is BooleanScorer.  But presumably an external
Query somewhere may provide a Scorer that does things out of order
(maybe Solr does?), and so technically making the core collectors not
break ties by docID by default is a break in back-compat.

Maybe we should add a "docsInOrder()" method to Scorer?  By default it
returns false, but we fix that to return true for all core Lucene
queries?  And then IndexSearcher consults that to decide whether it
can do this?

> Optimizations to TopScoreDocCollector and TopFieldCollector
> -----------------------------------------------------------
>                 Key: LUCENE-1593
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>         Attachments: LUCENE-1593.patch,
> This is a spin-off of LUCENE-1575 and proposes to optimize TSDC and TFC code to remove
unnecessary checks. The plan is:
> # Ensure that IndexSearcher returns segements in increasing doc Id order, instead of
> # Change TSDC and TFC's code to not use the doc id as a tie breaker. New docs will always
have larger ids and therefore cannot compete.
> # Pre-populate HitQueue with sentinel values in TSDC (score = Float.NEG_INF) and remove
the check if reusableSD == null.
> # Also move to use "changing top" and then call adjustTop(), in case we update the queue.
> # some methods in Sort explicitly add SortField.FIELD_DOC as a "tie breaker" for the
last SortField. But, doing so should not be necessary (since we already break ties by docID),
and is in fact less efficient (once the above optimization is in).
> # Investigate PQ - can we deprecate insert() and have only insertWithOverflow()? Add
a addDummyObjects method which will populate the queue without "arranging" it, just store
the objects in the array (this can be used to pre-populate sentinel values)?
> I will post a patch as well as some perf measurements as soon as I have them.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message