lucene-dev mailing list archives

From Shai Erera <>
Subject Re: Is TopDocCollector's collect() implementation correct?
Date Sun, 22 Mar 2009 20:22:22 GMT
How about if we introduce an abstract ScoringCollector (about the name
later) which implements topDocs() and getTotalHits() and there will be
several implementations of it, such as: TopScoreDocCollector, which sorts
the documents by their score, in descending order only, TopFieldDocCollector
- for sorting by fields, and additional sort-by collectors.

All these can share the topDocs and getTotalHits methods. Nadav - this is
just like your proposed interface, but I would like to propose an abstract
class which will implement the common functionality. The only non-common
functionality is collect(), and this one will be implemented by subclasses.
That way, all of these can be of the same type, which makes it easier to
write search applications that let the user sort results by attributes
other than just score.
This class can have a protected constructor which accepts a PQ and nothing
more. It will also make its PQ and totalHits protected.
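
To make the idea concrete, here is a rough sketch of how such a class could look. It is only an illustration under simplifying assumptions, not Lucene code: it uses java.util.PriorityQueue rather than Lucene's own PriorityQueue, Hit is a hypothetical stand-in for ScoreDoc, topDocs() returns a plain List instead of a TopDocs, collect() takes the doc and score directly, and TopFieldValueCollector is a made-up, much simplified sort-by-field collector.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for ScoreDoc: a document id plus its score.
final class Hit {
    final int doc;
    final float score;
    Hit(int doc, float score) { this.doc = doc; this.score = score; }
}

// Sketch of the proposed abstract base class (name still open).
abstract class ScoringCollector {
    // Protected, as proposed, so subclasses can use them in collect().
    protected final PriorityQueue<Hit> pq;
    protected int totalHits;

    // Protected constructor that accepts a PQ and nothing more.
    protected ScoringCollector(PriorityQueue<Hit> pq) { this.pq = pq; }

    // The only non-common functionality; implemented by subclasses.
    public abstract void collect(int doc, float score);

    public int getTotalHits() { return totalHits; }

    // Shared by all subclasses. The queue yields the least competitive hit
    // first, so drain it and reverse. Note: this empties the queue.
    public List<Hit> topDocs() {
        List<Hit> hits = new ArrayList<Hit>(pq.size());
        while (!pq.isEmpty()) hits.add(pq.poll());
        Collections.reverse(hits);
        return hits;
    }
}

// Sorts by score, descending only.
final class TopScoreDocCollector extends ScoringCollector {
    // Head of the queue is the lowest score; on ties, the larger doc id.
    private static final Comparator<Hit> LEAST_COMPETITIVE_FIRST = (a, b) -> {
        int byScore = Float.compare(a.score, b.score);
        return byScore != 0 ? byScore : Integer.compare(b.doc, a.doc);
    };
    private final int numHits;

    TopScoreDocCollector(int numHits) {
        super(new PriorityQueue<Hit>(LEAST_COMPETITIVE_FIRST));
        this.numHits = numHits;
    }

    @Override
    public void collect(int doc, float score) {
        totalHits++;
        pq.add(new Hit(doc, score));
        if (pq.size() > numHits) pq.poll(); // evict the least competitive hit
    }
}

// Hypothetical sort-by-field collector: ranks documents by a per-document
// long value (a date, say), largest first. fieldValues is indexed by doc id.
final class TopFieldValueCollector extends ScoringCollector {
    private final int numHits;

    TopFieldValueCollector(int numHits, long[] fieldValues) {
        super(new PriorityQueue<Hit>(
            (a, b) -> Long.compare(fieldValues[a.doc], fieldValues[b.doc])));
        this.numHits = numHits;
    }

    @Override
    public void collect(int doc, float score) {
        totalHits++;
        pq.add(new Hit(doc, score));
        if (pq.size() > numHits) pq.poll(); // evict the smallest field value
    }
}
```

With something like this, an application can hold a single ScoringCollector reference, choose the concrete subclass at run time according to the sort the user asked for, and read the results through the same topDocs() and getTotalHits() calls either way.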

About the name - TopDocCollector or TopDocsCollector is the perfect name for
this class. But the first one is already taken and the second one will just
confuse users (with the first one). Unless we can decide to make
TopDocCollector abstract in 2.9, instead of just removing it?
Or if you are not happy with ScoringCollector, please provide a better name.

The more I think about it, the more I realize that my intention with 1356
was to make TopDocCollector a superclass for all scoring documents, and
sharing its PQ led to the problem I reported in this thread. Perhaps it
would have been better then to define that abstract class... but better
late than never.

What do you think?


On Sun, Mar 22, 2009 at 3:16 PM, Nadav Har'El <> wrote:

> On Sat, Mar 21, 2009, Michael McCandless wrote about "Re: Is
> TopDocCollector's collect() implementation correct?":
> >
> > I think I'd lean towards a third solution: tighten up
> > TopScoreDocCollector (make it final, remove ability to change its PQ,
> > make things private) and have it focus on high performance collection
> > by score.
> The problem I see is this: TopDocCollector currently does not implement
> any sort of interface for reading its output. This means that if you create
> a completely different class implementing, for example, a different
> sorting criterion (e.g., sort by date), there will be no base class or
> interface that you could use for both of them, to allow changing the sort
> criterion easily at run time. On the other hand, with the existing
> collector, you can subclass TopDocCollector, and use that as the common
> base class.
> If we're already creating a new TopScoreDocCollector (when was it added?
> I must have been dozing off while this happened...) perhaps we can create
> a read interface for it (with the getTotalHits() and topDocs() methods),
> and have this class implement that interface? Then, indeed, nobody will
> have any reason to extend the TopScoreDocCollector class, and it can be
> final.
> --
> Nadav Har'El            | Sunday, Mar 22 2009, 26 Adar 5769
> IBM Haifa Research Lab  |-----------------------------------------
>                         | "Did you sleep well?" "No, I made a
>                         | couple of mistakes."
