lucene-java-user mailing list archives

From "oramas martín" <>
Subject Re: A solution to HitCollector-based searches problems
Date Thu, 08 Mar 2007 19:23:45 GMT

I have just added some sample search implementations based on this collector
solution, to make it easier to use and understand:

    - KeywordSearch: Extract the terms (and their frequency) found in a list
                     of fields from the results of a query/filter search

    - GoogleSearch: Return an ordered search result grouped a la Google,
                    on the terms found in a list of fields

    - GetFieldNamesOp: Operation to mimic the getFieldNames method of
                       IndexReader but using a searcher. With it, it is
                       possible to explore the fields of remote indexes.
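
As a rough illustration of the kind of result KeywordSearch describes, here
is a minimal sketch in plain Java. It does not use the Lucene API or the
library's classes: the class name KeywordCountSketch, the termFrequencies
method, and the modelling of a matching document as a map from field name to
a list of terms are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene or of the library in this message.
final class KeywordCountSketch {

    /**
     * Counts how often each term occurs in the chosen fields of the
     * matching documents. Each document is modelled as
     * field name -> list of terms (an assumption of this sketch).
     */
    static Map<String, Integer> termFrequencies(
            List<Map<String, List<String>>> matchingDocs, List<String> fields) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, List<String>> doc : matchingDocs) {
            for (String field : fields) {
                List<String> terms = doc.get(field);
                if (terms == null) {
                    continue; // this document has no such field
                }
                for (String term : terms) {
                    counts.merge(term, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> docs = Arrays.asList(
                Collections.singletonMap("title", Arrays.asList("lucene", "search")),
                Collections.singletonMap("title", Arrays.asList("lucene")));
        System.out.println(termFrequencies(docs, Arrays.asList("title")));
    }
}
```

In a real searcher the counting would happen inside the collector, next to
the index, so that only the final term/frequency map travels back to the
caller.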

See for the source code (

José L. Oramas

On 2/26/07, oramas martín <> wrote:
> Hello,
> As you probably know, the HitCollector-based search API is not meant to
> work remotely, because it generates an RPC callback for every non-zero
> score.
> There is another problem with MultiSearcher and HitCollector-based
> searches: MultiSearcher knows nothing about how to merge the results of
> HitCollector-based searches (not to mention that the way it merges TopDocs
> for the score-ordered and the Sort-ordered searches is hardcoded).
> ParallelMultiSearcher also inherits these problems and is unable to
> parallelize HitCollector-based searches.
> A final problem with the HitCollector-based search is the loss of a limit
> on the results, which the Hits class implements through the getMoreDocs()
> function, along with the lazy loading and caching of documents it does.
> To solve those problems, a factory (HitCollectorSource) is needed that is
> able to generate collectors for single (SingleHitCollector) and multi
> (MultiHitCollector) searches, together with a new search method in the
> Searchable interface for it. To avoid modifications to the Lucene core, the
> latter requirement is NOT IMPLEMENTED in the library I have just created.
> Instead, an ugly solution is provided: a wrapper for those searchers
> (SearcherHCSourceWrapper) and a Filter wrapper (FilterHitCollectorSource)
> to carry the factory-based searches.
> Each collector is based on a two-step sequence: one step for collecting
> hits or subsearcher results, and another for generating the final result.
> Also, just in case you don't want to add a wrapper to each searcher of
> your project, there is an adapted version of IndexSearcher, MultiSearcher
> and ParallelMultiSearcher (only for version 2.1) modified exactly the same
> way the wrapper class SearcherHCSourceWrapper does. Just put them in your
> class-path (before the Lucene core jar) and you will be using the new
> collector interfaces without modifying your code.
> There are some unit tests (copied and adapted from the Lucene 2.1
> distribution).
> See for the jar files and the
> code.
> If you find it interesting to complement the Lucene project, tell me how
> to put it in the contribution area.
> Regards,
> José L. Oramas
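
The factory and the two-step collectors described in the quoted message can
be sketched roughly as follows. This is not the library's code: the
interface names are borrowed from the message, but every method signature,
the ScoredDoc and TopScoresSource classes, and the docBase parameter are
assumptions made for illustration, and no Lucene class is used. The point it
shows is that each sub-searcher collects its hits locally and hands back one
partial result, so a remote search needs one round trip per sub-searcher
rather than one callback per hit, and a result limit can be applied in both
steps.

```java
import java.util.ArrayList;
import java.util.List;

// All types below are hypothetical; only the names echo the message.
final class ScoredDoc {
    final int doc;
    final float score;
    ScoredDoc(int doc, float score) { this.doc = doc; this.score = score; }
}

/** Step 1, next to one index: collect hits, then emit a partial result. */
interface SingleHitCollector<R> {
    void collect(int doc, float score);
    R result();
}

/** Step 2, in the multi-searcher: collect partial results, then merge them. */
interface MultiHitCollector<R> {
    void collect(R partial, int docBase);
    R result();
}

/** Factory, so that every sub-searcher gets its own collector instance. */
interface HitCollectorSource<R> {
    SingleHitCollector<R> newSingleCollector();
    MultiHitCollector<R> newMultiCollector();
}

/** Example source: keep the best `limit` hits by descending score. */
final class TopScoresSource implements HitCollectorSource<List<ScoredDoc>> {
    private final int limit;
    TopScoresSource(int limit) { this.limit = limit; }

    private List<ScoredDoc> top(List<ScoredDoc> hits) {
        List<ScoredDoc> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Float.compare(b.score, a.score));
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    public SingleHitCollector<List<ScoredDoc>> newSingleCollector() {
        final List<ScoredDoc> hits = new ArrayList<>();
        return new SingleHitCollector<List<ScoredDoc>>() {
            public void collect(int doc, float score) {
                if (score > 0f) hits.add(new ScoredDoc(doc, score));
            }
            public List<ScoredDoc> result() { return top(hits); }
        };
    }

    public MultiHitCollector<List<ScoredDoc>> newMultiCollector() {
        final List<ScoredDoc> merged = new ArrayList<>();
        return new MultiHitCollector<List<ScoredDoc>>() {
            public void collect(List<ScoredDoc> partial, int docBase) {
                // Shift local doc ids into the combined id space.
                for (ScoredDoc d : partial) merged.add(new ScoredDoc(d.doc + docBase, d.score));
            }
            public List<ScoredDoc> result() { return top(merged); }
        };
    }
}

final class CollectorSketch {
    public static void main(String[] args) {
        TopScoresSource source = new TopScoresSource(2);
        SingleHitCollector<List<ScoredDoc>> first = source.newSingleCollector();
        first.collect(0, 0.5f);
        first.collect(1, 0.9f);
        SingleHitCollector<List<ScoredDoc>> second = source.newSingleCollector();
        second.collect(0, 0.7f);
        MultiHitCollector<List<ScoredDoc>> multi = source.newMultiCollector();
        multi.collect(first.result(), 0);   // first index starts at doc 0
        multi.collect(second.result(), 2);  // second index starts at doc 2
        for (ScoredDoc d : multi.result()) System.out.println(d.doc + " " + d.score);
    }
}
```

A different result type (for example a term/frequency map) would only need
another HitCollectorSource implementation; the searchers themselves would
not have to know how the partial results are merged.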
