lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Is TopDocCollector's collect() implementation correct?
Date Sat, 21 Mar 2009 09:51:01 GMT

I think I'd lean towards a third solution: tighten up
TopScoreDocCollector (make it final, remove ability to change its PQ,
make things private) and have it focus on high performance collection
by score.  This is the default collector for Lucene searches so I think
keeping it high performance (not adding additional method calls nor
boolean checks) is important.  You choose this for performance

Then introduce a new hit collector that makes subclassing / changing
the PQ easier. You choose this one for flexibility.

And 2.9 is a great chance to do this since we've deprecated the old
class (TopDocCollector) in favor of TopScoreDocCollector.

Mike

Shai Erera wrote:

> Thanks Chris/Hoss (not sure who sent the original reply).
>
> I don't like calling pq.lessThan, as pq.insert and  
> pq.insertWithOverflow call it anyway internally and since it would  
> add a method call (something that was tried to be avoided in the  
> current implementation), I prefer the code I proposed below.
>
> BTW, I introduced 1356, so I take full responsibility on this  
> overlooking. The main reason for 1356 was to allow creating  
> extensions of TopDocCollector so they can be of the same type, and  
> share the topDocs() and totalHIts() implementations.
>
> I can file an issue. Any other comments?
>
> Shai
>
> On Sat, Mar 21, 2009 at 3:48 AM, Chris Hostetter <hossman_lucene@fucit.org 
> > wrote:
>
> (resending msg from earlier today during @apache mail outage -- i  
> didn't get a copy from the list, so i'm assuming no one did)
>
>
> ---------- Forwarded message ----------
> Date: Fri, 20 Mar 2009 15:29:13 -0700 (PDT)
>
> : TopDocCollector's (TDC) implementation of collect() seems a bit  
> problematic
> : to me.
>
> This code isn't an area i'm very familiar with, but your assessment  
> seems
> correct ... it looks like when LUCENE-1356 introduced the ability to
> provide a PriorityQueue to the constructor, the existing  
> optimization when
> the score was obvoiusly too low was overlooked.
>
> It looks like this same bug got propogated to TopScoreDocCollector
> when it was introduced as well.
>
> : Introduce in TDC a private boolean which signals whether the  
> default PQ is
>
> : used or not. If it's not used, don't do the 'else if' at all. If  
> it is used,
> : then the 'else if' is safe. Then code could look like:
>
> my vote would just be to change the ">=" comarison to a hq.lessThan  
> call
> ... but i can understand how your proposal might be more efficient  
> -- I'll
> let the performance experts fight it out ... but i definitely think  
> you
> should fil a bug.
>
>
>
> -Hoss
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message