lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yannis Pavlidis" <ypavli...@me.dium.com>
Subject boost freshness instead of sorting
Date Thu, 28 Aug 2008 16:12:33 GMT

Hi,

I am trying to boost the freshness of some of our documents in the index using the most efficient
way (i.e. if 2 news stories have the same score based on the content then I want to promote
the one that was created last)

I have tried several techniques which do not seems to be performing that well or do not seem
very efficient. 
- range query 
- custom query
- sorting

While looking at the archives I came across this email: http://www.gossamer-threads.com/lists/lucene/java-user/43457
where "Andrzej Bialecki" proposes the addition of a column related with days (months, etc)
and add a "1" for each day/month that has passed from the epoch. I tried his solution and
it does not seem to performing that well. The reason (unless my math have completely failed
me) is because the boost that this new field provides is always the same.

For example i tried the above idea in an index that contained 4 documents. The days were set
from 1-4 respectively.
The field weight (using the explain from Luke) is calculated as: field weight = tf * idf *
fieldNorm. The idf is the same for all the documents. tf is defined as sqrt (times the term
in the doc) and field Norm = field Boost * 1 / sqrt (terms in field).

In my example (and I cannot find how it would be different in other scenarios) the field weight
is the same for all the documents (and equal to idf) because the sqrt (times the term in doc)
* 1/sqrt (terms in field) = 1.

I would appreciate any feed back on that and also any other way you would consider in addressing
the issue of boost the freshness of documents.

Thank you in advance,

Yannis.

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message