lucene-dev mailing list archives

From Chuck Williams <ch...@manawiz.com>
Subject Re: Recency weightage in Lucene
Date Sun, 18 Jun 2006 17:47:40 GMT


PrasenjitM@aol.com wrote on 06/17/2006 10:52 PM:
> I am thinking of modifying Lucene's current ranking algorithm to include the
> document's recency weight, so that the most recently modified documents get
> preference over earlier ones, which makes sense for news search.
>
> (I believe) to do this I have to tinker with the TermScorer.score() method and
> calculate the document score in its while (doc < end) {..} loop. The
> requirement is that the document's lastModifiedTime is stored in a field of
> the doc, and extracting this value could be quite expensive for every
> iteration over the posting stream. One approach could be to store it in a
> separate file (like the normalization factors) to avoid the field lookup.
>
> Any other ideas or suggestions? Or has anyone already implemented this?

Does recency correlate with the order in which documents are added to
your index?  If so, then perhaps you can use doc-id as a measure of
recency and thereby avoid accessing a stored field.  I'm not certain,
but based on a quick perusal of the relevant code, it appears that both
index opening and segment merging preserve the order of doc-ids.  If you
take this approach, you should verify.
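
To make the doc-id idea concrete, here is a minimal sketch in plain Java of
how a doc-id could be turned into a recency factor and multiplied into a
score. It deliberately uses no Lucene classes: DocIdRecency, recencyFactor,
adjustedScore and the weight parameter are hypothetical names for
illustration, and the linear formula is only one possible choice. It also
assumes that doc-ids really do increase with recency, which should be
verified as noted above.

```java
/**
 * Hypothetical helper (not part of Lucene): treats a document's position in
 * the doc-id space as a proxy for recency. Assumes higher doc-id = newer.
 */
final class DocIdRecency {
    private DocIdRecency() {}

    /**
     * Maps docId in [0, maxDoc) linearly onto [1.0, 1.0 + weight].
     * The oldest document (docId 0) gets 1.0, the newest gets 1.0 + weight.
     */
    static float recencyFactor(int docId, int maxDoc, float weight) {
        if (docId < 0 || docId >= maxDoc) {
            throw new IllegalArgumentException("docId out of range: " + docId);
        }
        if (maxDoc == 1) {
            return 1.0f; // a single document: nothing to rank against
        }
        float position = (float) docId / (float) (maxDoc - 1); // 0.0 .. 1.0
        return 1.0f + weight * position;
    }

    /** Multiplies a relevance score by the doc-id based recency factor. */
    static float adjustedScore(float baseScore, int docId, int maxDoc, float weight) {
        return baseScore * recencyFactor(docId, maxDoc, weight);
    }

    public static void main(String[] args) {
        int maxDoc = 1000;   // assumed index size
        float weight = 0.5f; // assumed: newest doc scores up to 50% higher
        System.out.println(adjustedScore(2.0f, 0, maxDoc, weight));   // oldest
        System.out.println(adjustedScore(2.0f, 999, maxDoc, weight)); // newest
    }
}
```

The only inputs are the doc-id and the total document count, both of which
the scorer already has at hand, so nothing needs to be read from a stored
field inside the scoring loop.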

If you end up needing a stored field, then be sure to use the lazy fields
capability (recently committed) to access it.

Chuck

