lucene-java-user mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: how to get results without getting total number of found documents?
Date Tue, 26 Sep 2006 23:26:28 GMT
Vlad,

Please check published papers on sampling inverted indexes and 
multi-level caching - this is most probably what Google and other major 
search engines use.

You can see a simple implementation of this principle in Nutch: the 
index is sorted in decreasing order of a PageRank-like score (the logic 
for this is in IndexSorter.java). When running a query we then collect 
only the top-N results and extrapolate the total hit count over the 
whole collection, assuming a certain model of term distribution 
(LuceneQueryOptimizer.LimitedCollector).
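
To make the idea concrete, here is a minimal sketch in plain Java. It is 
not the Nutch code: the class LimitedScan, its Result type and the scan() 
method are hypothetical names, and the extrapolation assumes the simplest 
possible model, namely that matches are spread uniformly over doc ids. 
It also assumes doc ids were assigned in decreasing score order, so the 
first hits seen are already the best ones.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of early termination over a score-sorted index. */
final class LimitedScan {

    /** Outcome of a scan: the collected hits plus a (possibly estimated) total. */
    static final class Result {
        final List<Integer> topDocs;   // doc ids in the order they were collected
        final long estimatedTotal;     // exact count, or an extrapolation
        final boolean exact;           // false if the scan stopped early

        Result(List<Integer> topDocs, long estimatedTotal, boolean exact) {
            this.topDocs = topDocs;
            this.estimatedTotal = estimatedTotal;
            this.exact = exact;
        }
    }

    /**
     * @param matchingDocs ascending doc ids that match the query; in a
     *                     score-sorted index ascending id means decreasing score
     * @param maxDoc       total number of documents in the index
     * @param limit        number of hits to collect before giving up
     */
    static Result scan(int[] matchingDocs, int maxDoc, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        List<Integer> top = new ArrayList<>();
        for (int doc : matchingDocs) {
            if (top.size() == limit) {
                // More matches exist than we are willing to collect: stop here
                // and extrapolate. We saw `limit` hits in the first
                // (lastDoc + 1) documents; assume the same density for the rest.
                int lastDoc = top.get(limit - 1);
                long estimate = Math.round((double) limit * maxDoc / (lastDoc + 1));
                return new Result(top, estimate, false);
            }
            top.add(doc);
        }
        // The posting list ran out before the limit was hit: the count is exact.
        return new Result(top, top.size(), true);
    }
}
```

Calling LimitedScan.scan(matches, maxDoc, 1000) returns as soon as it 
sees the 1001st match, so the cost of the query no longer depends on how 
many documents match in total. The uniform-density assumption is the 
weakest part of the sketch; a real model of term distribution would 
replace the single line that computes the estimate.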

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
