lucene-java-user mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: Limiting Result-Count
Date Thu, 29 Jun 2006 21:32:37 GMT
Otis Gospodnetic wrote:
> Try using HitCollector and break out of it when you collect enough documents.  My guess
> is that if you are not doing anything crazy with Hits (like looping through them all) this
> won't be that much faster than using Hits.

Well, in practice it does help - see the way this is done in Nutch 
(src/java/org/apache/nutch/searcher/LuceneQueryOptimizer$LimitedCollector). 
Performance-wise, with large indexes this makes a big difference.
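
To illustrate the idea only (this is not the Nutch code): a minimal,
self-contained Java sketch of a collector that aborts the scan once it has
enough hits. The names SimpleCollector, LimitReached, LimitedCollector and
LimitedSearchDemo are hypothetical stand-ins, and the loop in search()
simulates the searcher feeding hits instead of calling Lucene.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Lucene-style per-hit callback. */
interface SimpleCollector {
    void collect(int doc, float score);
}

/** Unchecked exception used to break out of the scan early. */
class LimitReached extends RuntimeException {
    LimitReached() {
        super("hit limit reached");
    }
}

/** Collects at most maxHits documents, then aborts by throwing LimitReached. */
class LimitedCollector implements SimpleCollector {
    private final int maxHits;
    private final List<Integer> docs = new ArrayList<>();

    LimitedCollector(int maxHits) {
        if (maxHits <= 0) {
            throw new IllegalArgumentException("maxHits must be positive");
        }
        this.maxHits = maxHits;
    }

    @Override
    public void collect(int doc, float score) {
        docs.add(doc);
        if (docs.size() >= maxHits) {
            throw new LimitReached();
        }
    }

    List<Integer> docs() {
        return docs;
    }
}

class LimitedSearchDemo {
    /** Simulated search: feeds matching doc ids to the collector in index order. */
    static List<Integer> search(int[] matchingDocs, int maxHits) {
        LimitedCollector collector = new LimitedCollector(maxHits);
        try {
            for (int doc : matchingDocs) {
                collector.collect(doc, 1.0f);
            }
        } catch (LimitReached e) {
            // Expected: we stopped early and keep the partial results.
        }
        return collector.docs();
    }
}
```

The caller has to catch the exception around the search call and treat
whatever was collected so far as the (partial) result.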

The problem you need to address, though, is how usable partial results
are, i.e. whether you can be reasonably sure that by collecting only
partial results you are not missing important hits that would have been
found had you let the search collect all results. In Nutch this facility
is used only if posting lists are sorted by decreasing document
importance (see IndexSorter for details), so that we collect the
highest-ranking hits first and skip the low-ranking ones.
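
A toy illustration of why the ordering matters, under simplifying
assumptions: every document matches, and importance is a single static
number per document (real scores also depend on the query). The class
SortedIndexDemo and its methods are hypothetical and do not mirror
IndexSorter's actual code.

```java
import java.util.Arrays;
import java.util.Comparator;

class SortedIndexDemo {
    /** Returns doc ids in decreasing order of importance, i.e. the visiting order of a sorted index. */
    static Integer[] sortByImportance(float[] importance) {
        Integer[] order = new Integer[importance.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Negate the key so that the most important document comes first.
        Arrays.sort(order, Comparator.comparingDouble((Integer d) -> -importance[d]));
        return order;
    }

    /** Takes the first n entries of a visiting order: what an early-terminating collector would see. */
    static Integer[] firstN(Integer[] order, int n) {
        return Arrays.copyOf(order, Math.min(n, order.length));
    }
}
```

With importances {0.1, 0.9, 0.5, 0.7} and a limit of 2, the original
order yields docs 0 and 1 and misses doc 3, while the sorted order
yields docs 1 and 3, the two most important ones.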

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




