lucene-java-user mailing list archives

From "oramas martín" <jlora...@gmail.com>
Subject A solution to HitCollector-based searches problems
Date Sun, 25 Feb 2007 23:56:19 GMT
Hello,

As you probably know, the HitCollector-based search API is not meant to work
remotely, because it generates an RPC callback for every hit with a non-zero
score.

Another problem is that MultiSearcher's HitCollector-based search knows
nothing about how to merge the results of HitCollector-based searches (not to
mention that it hardcodes the way TopDocs are merged for score-ordered and
Sort-based searches). ParallelMultiSearcher inherits these problems and is
unable to parallelize HitCollector-based searches.
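For score-ordered searches, the merge that MultiSearcher performs is
conceptually simple: rebase each sub-searcher's document numbers by that
sub-index's starting offset, then keep the best-scoring hits overall. The Java
sketch below shows that rule in isolation. ScoredDoc and ScoreMerger are
hypothetical names, not Lucene classes, and this is not Lucene's actual code.
The point is that the rule only fits scored hits: a custom HitCollector may
produce a count or some other aggregate, and no single hardcoded merge works
for all of them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hit type: a document number plus its score.
class ScoredDoc {
    final int doc;
    final float score;

    ScoredDoc(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

// Hypothetical helper illustrating a score-ordered merge across sub-searchers.
class ScoreMerger {
    /**
     * perSearcher.get(i) holds the hits of sub-searcher i, numbered locally;
     * starts[i] is the first global document number of sub-searcher i.
     * Returns at most n hits, highest score first.
     */
    static List<ScoredDoc> merge(List<List<ScoredDoc>> perSearcher, int[] starts, int n) {
        List<ScoredDoc> all = new ArrayList<>();
        for (int i = 0; i < perSearcher.size(); i++) {
            for (ScoredDoc hit : perSearcher.get(i)) {
                // Rebase the local document number into the global numbering.
                all.add(new ScoredDoc(hit.doc + starts[i], hit.score));
            }
        }
        // The merge rule is hardcoded here: order by descending score.
        all.sort((a, b) -> Float.compare(b.score, a.score));
        int keep = Math.max(0, Math.min(n, all.size()));
        return new ArrayList<>(all.subList(0, keep));
    }
}
```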

A final problem with HitCollector-based search is the loss of a limit on the
number of results, which the Hits class implements through its getMoreDocs()
method, along with the lazy loading and caching of documents that Hits does.
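What Hits provides, and a raw HitCollector loses, is a bound on how many
results are kept. A minimal sketch of a bounded collector, assuming a
collect(int doc, float score) callback shaped like Lucene's
HitCollector.collect, keeps only the best "limit" hits in a min-heap while
still counting every match. TopNCollector is a hypothetical name, and the
sketch does not reproduce the lazy document loading and caching that Hits
also does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical collector that keeps at most 'limit' hits, so memory stays
// bounded no matter how many documents match.
class TopNCollector {
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final int limit;
    private int totalHits = 0;
    // Min-heap on score: the head is the weakest hit currently kept.
    private final PriorityQueue<Hit> heap =
        new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));

    TopNCollector(int limit) {
        this.limit = limit;
    }

    /** Called once per matching document, like HitCollector.collect(int, float). */
    void collect(int doc, float score) {
        totalHits++;
        if (limit <= 0) {
            return;
        }
        if (heap.size() < limit) {
            heap.offer(new Hit(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll(); // evict the current weakest hit
            heap.offer(new Hit(doc, score));
        }
    }

    int totalHits() {
        return totalHits;
    }

    /** The kept hits, best first. */
    List<Hit> topHits() {
        List<Hit> out = new ArrayList<>(heap);
        out.sort((a, b) -> Float.compare(b.score, a.score));
        return out;
    }
}
```

The total count is kept separately from the heap, so callers can still report
how many documents matched even though only the top few are retained.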


To solve these problems, a factory (HitCollectorSource) is needed that can
generate collectors for single (SingleHitCollector) and multi
(MultiHitCollector) searches, together with a new search method in the
Searchable interface that accepts it. To avoid modifying the Lucene core, the
latter requirement is NOT IMPLEMENTED in the library I have just created.
Instead, an admittedly ugly solution is provided: a wrapper for those
searchers (SearcherHCSourceWrapper) and a Filter wrapper
(FilterHitCollectorSource) that carries the factory-based searches.
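The Filter-wrapper trick can be sketched with stand-in types: a Filter
subclass carries the collector factory through an existing search(filter)
style call, and the wrapping searcher detects it with instanceof. Everything
below (Filter, Collector, CollectorFactory, SourceCarryingFilter,
DispatchingSearcher) is hypothetical and only loosely modeled on the idea
behind FilterHitCollectorSource and SearcherHCSourceWrapper. It is not the
library's code, and the real Lucene Filter class has a different API.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in filter (hypothetical, NOT org.apache.lucene.search.Filter).
class Filter {
    boolean accepts(int doc) {
        return true;
    }
}

// Hypothetical collector contract.
interface Collector {
    void collect(int doc, float score);
    Object result();
}

// Hypothetical factory, playing the role a HitCollectorSource plays.
interface CollectorFactory {
    Collector newCollector();
}

// A Filter used only as a carrier: it smuggles the factory through an
// existing search(filter) signature while wrapping the user's real filter.
class SourceCarryingFilter extends Filter {
    final Filter inner; // the user's own filter, may be null
    final CollectorFactory factory;

    SourceCarryingFilter(Filter inner, CollectorFactory factory) {
        this.inner = inner;
        this.factory = factory;
    }

    @Override
    boolean accepts(int doc) {
        return inner == null || inner.accepts(doc);
    }
}

// Wrapper searcher over a toy "index" of precomputed per-document scores.
class DispatchingSearcher {
    private final float[] scores;

    DispatchingSearcher(float[] scores) {
        this.scores = scores;
    }

    Object search(Filter filter) {
        if (filter instanceof SourceCarryingFilter) {
            // Factory-based path: build a fresh collector and return its result.
            Collector c = ((SourceCarryingFilter) filter).factory.newCollector();
            feed(filter, c);
            return c.result();
        }
        // Plain path: stands in for the normal search, returning matching doc numbers.
        final List<Integer> docs = new ArrayList<>();
        feed(filter, new Collector() {
            public void collect(int doc, float score) { docs.add(doc); }
            public Object result() { return docs; }
        });
        return docs;
    }

    // Calls the collector once per document with a non-zero score that the filter accepts.
    private void feed(Filter filter, Collector c) {
        for (int doc = 0; doc < scores.length; doc++) {
            if (scores[doc] > 0f && (filter == null || filter.accepts(doc))) {
                c.collect(doc, scores[doc]);
            }
        }
    }
}
```

The carrier still delegates to the user's inner filter, so ordinary filtering
keeps working when a factory is attached.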

Each collector works in a two-step sequence: one step collects the hits (or
the sub-searcher results), and the other generates the final result.
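The two-step sequence can be expressed as a pair of hypothetical interfaces:
a per-searcher collector that receives hits and then builds its result, and a
multi-searcher collector that receives each sub-searcher's result and then
combines them. The names and signatures are assumptions for illustration, not
the library's actual SingleHitCollector and MultiHitCollector. Counting hits
shows why the merge step has to belong to the collector: counts combine by
addition, which no score-based TopDocs merge would do.

```java
// Hypothetical two-step contracts; not the library's actual interfaces.
interface SingleStepCollector<R> {
    void collect(int doc, float score); // step 1: called once per hit
    R result();                         // step 2: build this searcher's result
}

interface MultiStepCollector<R> {
    // step 1: called once per sub-searcher; start is that sub-index's first
    // global document number (unused for counts, needed for doc-number results)
    void collect(R subResult, int start);
    R result();                         // step 2: combine into the final result
}

// Example single collector: counts matching documents.
class HitCount implements SingleStepCollector<Integer> {
    private int count;

    public void collect(int doc, float score) {
        count++;
    }

    public Integer result() {
        return count;
    }
}

// Matching multi collector: counts combine by addition, not by score order.
class HitCountMerger implements MultiStepCollector<Integer> {
    private int total;

    public void collect(Integer subResult, int start) {
        total += subResult;
    }

    public Integer result() {
        return total;
    }
}
```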

Also, in case you don't want to add a wrapper to every searcher in your
project, there are adapted versions of IndexSearcher, MultiSearcher and
ParallelMultiSearcher (for version 2.1 only), modified in exactly the same way
as the wrapper class SearcherHCSourceWrapper. Just put them on your classpath
(before the Lucene core jar) and you will be using the new collector
interfaces without modifying your code.

There are also some unit tests (copied and adapted from the Lucene 2.1
distribution).

See http://sourceforge.net/projects/lucollector/ for the jar files and the
code.

If you think it would be a useful complement to the Lucene project, please
tell me how to put it in the contribution area.

Regards,
José L. Oramas
