lucene-java-user mailing list archives

From "oramas martín" <jlora...@gmail.com>
Subject Re: A solution to HitCollector-based searches problems
Date Thu, 08 Mar 2007 19:23:45 GMT
Hello,

I have just added some sample search implementations based on this collector
solution, to ease its use and understanding:

    - KeywordSearch: Extract the terms (and frequency) found in a list of
                     fields from the results of a query/filter search

    - GoogleSearch: Return an ordered search result grouped a la Google,
                    based on the terms found in a list of fields

    - GetFieldNamesOp: Operation to mimic the getFieldNames method of
                       IndexReader but using a searcher. With it, it is
                       possible to explore the fields of remote indexes.

See http://sourceforge.net/projects/lucollector/ for the source code
(lu-collector-src-sampleop-0.8.zip).
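
To give a rough idea of what the KeywordSearch sample computes, here is a minimal sketch in plain Java. It is not the code from the zip file and it does not use Lucene at all: the class KeywordSketch, the method termFrequencies, the use of a Map as a stand-in for a stored document, and the whitespace tokenization are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: a "document" is a map of field name -> text.
// Neither this class nor its method exists in Lucene or in lu-collector.
class KeywordSketch {

    /**
     * Counts how often each whitespace-separated, lower-cased term occurs
     * in the listed fields of the documents matched by some search.
     */
    static Map<String, Integer> termFrequencies(List<Map<String, String>> matchingDocs,
                                                List<String> fields) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> doc : matchingDocs) {
            for (String field : fields) {
                String text = doc.get(field);
                if (text == null) {
                    continue; // this document has no such field
                }
                for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
                    if (!term.isEmpty()) {
                        counts.merge(term, 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> d1 = new HashMap<>();
        d1.put("title", "Lucene search");
        d1.put("body", "search index");
        Map<String, String> d2 = new HashMap<>();
        d2.put("title", "Remote search");

        // Prints each term with its frequency, in alphabetical order.
        System.out.println(termFrequencies(Arrays.asList(d1, d2),
                                           Arrays.asList("title", "body")));
    }
}
```

In the real sample the matching documents would come from a query/filter search rather than from hand-built maps.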

Regards,
José L. Oramas

On 2/26/07, oramas martín <jloramas@gmail.com> wrote:
>
>
> Hello,
>
> As you probably know, the HitCollector-based search API is not meant to
> work remotely, because it will generate an RPC callback for every document
> with a non-zero score.
>
> Another problem is that MultiSearcher knows nothing about how to mix the
> results of HitCollector-based searches (not to mention that it has
> hardcoded the way to mix TopDocs for the score-based and the Sort-based
> searches). ParallelMultiSearcher inherits these problems and is unable to
> parallelize HitCollector-based searches.
>
> A final problem with the HitCollector-based search is the loss of a limit
> on the results, which the Hits class implements through the getMoreDocs()
> function, along with the lazy loading and caching of documents that it does.
>
>
> To solve those problems, a factory (HitCollectorSource) able to generate
> collectors for single (SingleHitCollector) and multi (MultiHitCollector)
> searches is needed, along with a new search method in the Searchable
> interface that accepts it. To avoid modifications to the Lucene core, the
> latter requirement is NOT IMPLEMENTED in the library I have just created.
> Instead, an ugly solution is provided: a wrapper for those searchers
> (SearcherHCSourceWrapper) and a Filter wrapper (FilterHitCollectorSource) to
> carry the factory-based searches.
>
> Each collector is based on a two-step sequence: one step for collecting hits
> or subsearcher results, and another for generating the final result.
>
> Also, just in case you don't want to add a wrapper to each searcher of
> your project, there is an adapted version of IndexSearcher, MultiSearcher
> and ParallelMultiSearcher (only for version 2.1) modified exactly the same
> way the wrapper class SearcherHCSourceWrapper does. Just put them in your
> class-path (before the Lucene core jar) and you will be using the new
> collector interfaces without modifying your code.
>
> There are some unit tests (copied and adapted from the Lucene 2.1 distribution).
>
> See http://sourceforge.net/projects/lucollector/ for the jar files and the
> code.
>
> If you find it interesting to complement the Lucene project, tell me how
> to put it in the contribution area.
>
> Regards,
> José L. Oramas
>
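
The factory and two-step design described in the quoted message could look roughly like the sketch below. The type names HitCollectorSource, SingleHitCollector and MultiHitCollector come from the message, but every method signature here, the Hit class, the TopHitsSource example and the docBase parameter are assumptions made for illustration, not the actual lu-collector or Lucene API. The point it tries to show is that each sub-searcher collects its hits locally and hands back one result object, so a remote search would need one transfer per sub-searcher instead of one callback per hit.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; none of these signatures are taken from lu-collector.
class CollectorSketch {

    /** A scored hit. Doc ids are local to one sub-index until they are merged. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** Single search: step 1 collects hits locally, step 2 builds the result. */
    interface SingleHitCollector<R> {
        void collect(int doc, float score);
        R result();
    }

    /** Multi search: step 1 collects sub-searcher results, step 2 merges them. */
    interface MultiHitCollector<R> {
        void collect(int docBase, R partial);
        R result();
    }

    /** Factory that produces the collector matching the kind of searcher. */
    interface HitCollectorSource<R> {
        SingleHitCollector<R> newSingle();
        MultiHitCollector<R> newMulti();
    }

    /** Example factory: keep only the n best-scoring hits. */
    static final class TopHitsSource implements HitCollectorSource<List<Hit>> {
        private final int n;
        TopHitsSource(int n) { this.n = n; }

        // Sorts a copy by descending score and keeps at most n entries.
        private List<Hit> best(List<Hit> hits) {
            List<Hit> sorted = new ArrayList<>(hits);
            sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
        }

        public SingleHitCollector<List<Hit>> newSingle() {
            final List<Hit> hits = new ArrayList<>();
            return new SingleHitCollector<List<Hit>>() {
                public void collect(int doc, float score) {
                    if (score > 0f) hits.add(new Hit(doc, score));
                }
                public List<Hit> result() { return best(hits); }
            };
        }

        public MultiHitCollector<List<Hit>> newMulti() {
            final List<Hit> merged = new ArrayList<>();
            return new MultiHitCollector<List<Hit>>() {
                public void collect(int docBase, List<Hit> partial) {
                    // Shift local doc ids into the combined index's id space.
                    for (Hit h : partial) merged.add(new Hit(docBase + h.doc, h.score));
                }
                public List<Hit> result() { return best(merged); }
            };
        }
    }

    public static void main(String[] args) {
        TopHitsSource source = new TopHitsSource(2);
        SingleHitCollector<List<Hit>> first = source.newSingle();
        first.collect(0, 0.5f);
        first.collect(1, 0.9f);
        SingleHitCollector<List<Hit>> second = source.newSingle();
        second.collect(0, 0.7f);

        MultiHitCollector<List<Hit>> multi = source.newMulti();
        multi.collect(0, first.result());   // first sub-index starts at doc 0
        multi.collect(2, second.result());  // second sub-index starts at doc 2
        for (Hit h : multi.result()) System.out.println(h.doc + " " + h.score);
    }
}
```

Because the limit n lives in the factory, both the per-index step and the merge step can apply it, which is one way to recover the result limit that a plain HitCollector lacks.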
