lucene-dev mailing list archives

Subject Re: Recency weightage in Lucene
Date Mon, 19 Jun 2006 03:30:47 GMT
Using the doc-id itself as a recency metric is smart thinking. But the weight is actually a
sigmoidal function of the document's age (i.e. currentTime - documentIndexingTime), so I
can't use the doc-id itself.
What is the JIRA bug ID for the lazy field capability? I would like to know more about this feature.
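A minimal sketch of a sigmoidal recency weight of the kind described above, in Java since Lucene is a Java library. The class name SigmoidRecency and the midpoint/scale parameters are hypothetical; the thread does not give the actual curve or its constants.

```java
/**
 * Hypothetical sketch: a sigmoid recency weight driven by document age
 * (currentTime - indexingTime). The midpoint and scale parameters are
 * illustrative assumptions, not values from the thread.
 */
public final class SigmoidRecency {

    private SigmoidRecency() {}

    /**
     * Returns a weight in (0, 1) that is close to 1 for fresh documents,
     * exactly 0.5 at the midpoint age, and decays toward 0 for old ones.
     *
     * @param currentTimeMillis the time the query is evaluated
     * @param indexedAtMillis   when the document was indexed
     * @param midpointMillis    age at which the weight drops to 0.5 (assumed)
     * @param scaleMillis       controls how steep the drop is (assumed, > 0)
     */
    public static double weight(long currentTimeMillis, long indexedAtMillis,
                                double midpointMillis, double scaleMillis) {
        // Age of the document; clamp negative ages (e.g. clock skew) to zero.
        double age = Math.max(0L, currentTimeMillis - indexedAtMillis);
        // Logistic curve mirrored so that the weight decreases as age grows.
        return 1.0 / (1.0 + Math.exp((age - midpointMillis) / scaleMillis));
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        long now = 100 * day;
        // Two-day midpoint, half-day scale (illustrative values only).
        for (int ageDays : new int[] {0, 1, 2, 3, 7}) {
            double w = weight(now, now - ageDays * day, 2.0 * day, 0.5 * day);
            System.out.printf("age=%d days -> weight=%.4f%n", ageDays, w);
        }
    }
}
```

Because the weight depends on the query time as well as the indexing time, it has to be computed at search time rather than baked into the index.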

thanks for the help,
-----Original Message-----
From: Chuck Williams <>
Sent: Sun, 18 Jun 2006 07:47:40 -1000
Subject: Re: Recency weightage in Lucene
[original poster] wrote on 06/17/2006 10:52 PM:
> I am thinking of modifying Lucene's current ranking algorithm to include the
> document's recency weightage, so that recently modified documents get
> preference over older ones, which makes sense for news search.
> (I believe) to do this I have to tinker with the TermScorer.score() method and
> calculate the document score in its while (doc < end) {..} loop. The requirement is
> that the document's lastModifiedTime is stored in one of its fields, and extracting
> this value could be quite expensive on every iteration over the posting stream.
> One approach could be to store it in a separate file (like the normalization
> factors, i.e. norms) to avoid the field lookup.
> Any other ideas/suggestions? Or has anyone already implemented this?
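The quoted idea of keeping lastModifiedTime outside the stored fields, in a norms-like per-document structure, might look roughly like the following standalone sketch. It assumes the timestamps have already been loaded into a long[] indexed by doc id; the class TimestampArrayBoost, the exponential half-life decay (swapped in here for illustration; any curve would do), and the multiplicative combination are all assumptions, and this is not Lucene's TermScorer.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the "separate file like norms" idea: keep one
 * timestamp per document in a plain array indexed by doc id, loaded once
 * per reader, so the scoring loop does an O(1) array read instead of a
 * stored-field lookup. Not Lucene's TermScorer; names are illustrative.
 */
public final class TimestampArrayBoost {

    private final long[] indexedAtByDoc; // one entry per doc id, like a norms array

    public TimestampArrayBoost(long[] indexedAtByDoc) {
        this.indexedAtByDoc = indexedAtByDoc;
    }

    /** Exponential decay with an assumed half-life, read from the array. */
    public double recencyWeight(int doc, long now, double halfLifeMillis) {
        double age = Math.max(0L, now - indexedAtByDoc[doc]);
        return Math.pow(0.5, age / halfLifeMillis);
    }

    /**
     * Mimics a postings loop: for each matching doc, multiply the text
     * relevance score by the recency weight taken from the array.
     */
    public double[] scoreAll(int[] matchingDocs, double[] baseScores,
                             long now, double halfLifeMillis) {
        double[] out = new double[matchingDocs.length];
        for (int i = 0; i < matchingDocs.length; i++) {
            out[i] = baseScores[i] * recencyWeight(matchingDocs[i], now, halfLifeMillis);
        }
        return out;
    }

    public static void main(String[] args) {
        long hour = 60L * 60 * 1000;
        long now = 1000 * hour;
        // Doc 0 is 48 hours old, doc 1 is 24 hours old, doc 2 is brand new.
        long[] indexedAt = {now - 48 * hour, now - 24 * hour, now};
        TimestampArrayBoost boost = new TimestampArrayBoost(indexedAt);
        double[] scores = boost.scoreAll(new int[] {0, 1, 2},
                                         new double[] {1.0, 1.0, 1.0},
                                         now, 24.0 * hour);
        System.out.println(Arrays.toString(scores));
    }
}
```

The point of this layout is that the per-posting cost inside the loop is a single array read plus a little arithmetic; reading every timestamp happens once per index reader rather than once per posting.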

Does recency correlate with the order in which documents are added to
your index?  If so, then perhaps you can use doc-id as a measure of
recency and thereby avoid accessing a stored field.  I'm not certain,
but based on a quick perusal of the relevant code, it appears that both
index opening and segment merging preserve the order of doc-ids.  If you
take this approach, you should verify this.
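A sketch of the doc-id-as-recency suggestion, assuming documents are added in chronological order and that doc-ids keep that order (which, as noted above, should be verified). DocIdRecency, the linear mapping, and the strength parameter are hypothetical.

```java
/**
 * Hypothetical sketch of using the doc id itself as a recency signal.
 * Only valid if documents are indexed in chronological order and doc ids
 * preserve that order. The linear mapping and strength are assumptions.
 */
public final class DocIdRecency {

    private DocIdRecency() {}

    /** Maps doc id 0..maxDoc-1 to a recency fraction in (0, 1]. */
    public static double recencyFraction(int doc, int maxDoc) {
        if (maxDoc <= 0 || doc < 0 || doc >= maxDoc) {
            throw new IllegalArgumentException("doc must be in [0, maxDoc)");
        }
        // Higher doc id = added later = treated as more recent.
        return (doc + 1) / (double) maxDoc;
    }

    /** Blends the text score with recency; strength 0 disables the boost. */
    public static double boostedScore(double baseScore, int doc, int maxDoc, double strength) {
        return baseScore * (1.0 + strength * recencyFraction(doc, maxDoc));
    }

    public static void main(String[] args) {
        int maxDoc = 4;
        for (int doc = 0; doc < maxDoc; doc++) {
            System.out.println(doc + " -> " + boostedScore(1.0, doc, maxDoc, 1.0));
        }
    }
}
```

This needs no stored value at all, but it encodes only relative insertion order, not elapsed time, so on its own it cannot express a sigmoid of actual document age.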

If you end up needing a stored field, then be sure to use the lazy fields
capability (recently committed) to access it.

