lucene-java-user mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: Sorted Index
Date Sat, 27 Oct 2007 09:12:00 GMT
John Patterson wrote:
> 
> 
> Yonik Seeley wrote:
>> On 10/26/07, John Patterson <jdp2000@gmail.com> wrote:
>> Most things in an inverted index are sorted (terms, matching document
>> ids, term positions within a field, etc).  Can you be more specific
>> about what you are trying to accomplish?
>>
> 
> Sorry, I mean sorting the documents in an order other than the order they
> are added.  Then my search could just return docs in index order.  For the
> most common sorting I could collect only the first x docs and then
> short-circuit the search like we previously discussed.

Nutch already addresses both questions: see 
org.apache.nutch.indexer.IndexSorter for re-ordering an existing index, 
and org.apache.nutch.searcher.LuceneQueryOptimizer$LimitedCollector for 
cutting a search short once enough hits have been collected.
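
To illustrate the short-circuit idea only (this is a rough sketch, not the 
Nutch code; every class and method name below is made up for the example, 
and the "search" is just a loop over an array of matching doc ids): if the 
index is already in the desired order, a collector can abort the scan as 
soon as it has the first x hits.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not Nutch code: stop collecting once `limit` hits are seen. */
final class EarlyTermination {

    /** Signals that enough hits were collected; used only to abort the scan. */
    static final class LimitReached extends RuntimeException {
        LimitReached() {
            super(null, null, false, false); // no message, no stack trace needed
        }
    }

    /** Collects doc ids in index order and aborts after `limit` of them. */
    static final class FirstHitsCollector {
        private final int limit;
        private final List<Integer> docs = new ArrayList<>();

        FirstHitsCollector(int limit) {
            this.limit = limit;
        }

        void collect(int docId) {
            if (docs.size() >= limit) {
                throw new LimitReached();
            }
            docs.add(docId);
        }

        List<Integer> docs() {
            return docs;
        }
    }

    /**
     * Stand-in for a search: `matchingDocs` holds the ids that match the
     * query, in index order. If the index was pre-sorted by the common sort
     * key, the first `limit` ids are already the top results.
     */
    static List<Integer> firstHits(int[] matchingDocs, int limit) {
        FirstHitsCollector collector = new FirstHitsCollector(limit);
        try {
            for (int docId : matchingDocs) {
                collector.collect(docId);
            }
        } catch (LimitReached e) {
            // Expected: we have enough hits, so the remaining matches are skipped.
        }
        return collector.docs();
    }
}
```

The catch is that this only gives correct top-x results for the one sort 
order the index was built in; any other ordering still needs a full scan.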

> 
> I was wondering if it is possible to apply a sort at merge time?

One method that I'm familiar with is the following: you can split the 
result set into several large-ish bins, and apply arbitrary sorting 
methods within each bin. Studies show that if you pick the right bin 
size, users will rarely look beyond the first bin, so the task is 
reduced to sorting that first bin, e.g. the 100 top-scoring docs.
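
A rough sketch of the binning, assuming the hits arrive already ordered by 
descending score (the Hit class, its fields and the method name are 
invented for the example; nothing here is a Lucene or Nutch API):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: re-sort one score-ordered bin by an arbitrary key. */
final class BinSorting {

    /** Minimal hit: a doc id, its relevance score, and some other sort key. */
    static final class Hit {
        final int docId;
        final float score;
        final long date;

        Hit(int docId, float score, long date) {
            this.docId = docId;
            this.score = score;
            this.date = date;
        }
    }

    /**
     * Returns bin number `binIndex` (0-based) of `hitsByScore`, re-sorted
     * with `order`. The input list must already be in descending score
     * order, so bin 0 holds the `binSize` best-scoring hits. Only the
     * requested bin is copied and sorted; the input list is not modified.
     */
    static List<Hit> sortedBin(List<Hit> hitsByScore, int binIndex,
                               int binSize, Comparator<Hit> order) {
        int from = Math.min(binIndex * binSize, hitsByScore.size());
        int to = Math.min(from + binSize, hitsByScore.size());
        List<Hit> bin = new ArrayList<>(hitsByScore.subList(from, to));
        bin.sort(order);
        return bin;
    }
}
```

For example, sortedBin(hits, 0, 100, byDateDescending) would show the 100 
best-scoring docs newest first, and later bins are sorted only if the user 
actually pages that far.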

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



