lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrzej Bialecki>
Subject Re: lucene algorithm ?
Date Thu, 26 Apr 2012 14:21:58 GMT
On 26/04/2012 09:49, Uwe Schindler wrote:

> There are possibilities to truncate those lists, but this is not implemented
> in Lucene core. The main problem with Lucene's segmented index structure is,
> that you cannot early exit, because the very last document in the posting
> list could be one with a very high score. Techniques like index pruning or
> index sorting can optimize all this, but need index preprocessing and loose
> updatability of the index while in use.
> The Top-N result collection is optimized in Lucene (item #3), to throw away
> all collected documents, where the score is never be able to get into the
> top-n priority queue (too small score, once the queue is full). But the
> score has to be calculated.

This paper presents yet another option: keep a value that represents 
max. impact of postings in a block and then skip whole blocks if the max 
impact is too low to get docs within the block in the queue:

Implementing this in Lucene would require changing the iterators so that 
they can take a threshold value, i.e. the current lowest score in the 
queue, so that nextDoc() and advance() could skip whole blocks when 
their max impact is lower than the current lowest score.

Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration  Contact: info at sigram dot com

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message