lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Boosting more-recent documents in an index
Date Wed, 14 May 2008 15:43:53 GMT
Don't ask me why this occurred to me, since I'm working on a
completely different project... Mostly, this is intended to have
folks who really understand the scoring algorithms chime in and
tell me it's a silly idea <G>.

We've seen multiple threads asking the question: "How can I
cause more-recent documents to be scored higher?" and
several suggestions have been put forward.

What would happen if you persisted a "date factor", used it as the
*index-time* boost applied to documents, and kept increasing that
factor every time period? Or boosted each document by some factor
based on the relevant date?

For instance, let's say I was indexing e-mails starting today.
All e-mails indexed today would get a boost (for all fields?) of 1.0.
Tomorrow, the boost would be 1.1, and the next day 1.2 etc.
Now, any search would automatically tend to push more-recent documents
toward the top. The operative word here is "tend", since this wouldn't
have the problem of sorting on dates, which ignores scores.

I chose 1.0, 1.1, 1.2 at random, but you get the idea.
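
To make the schedule concrete, here is a minimal sketch in plain Java of how
the per-day factor could be computed. The class name, the linearBoost helper,
the start date, and the 0.1 step are all illustrative assumptions, not
anything from Lucene itself. The resulting value is what you would hand to
the index-time boost (Document.setBoost(float) in the 2.x releases, as far
as I know).

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public class DateBoost {
    // Hypothetical helper: the boost starts at 1.0 on `start` and grows
    // by `step` for each whole day between `start` and `docDate`.
    // Dates before `start` are clamped to a boost of 1.0.
    static double linearBoost(LocalDate start, LocalDate docDate, double step) {
        long days = ChronoUnit.DAYS.between(start, docDate);
        return 1.0 + step * Math.max(0, days);
    }

    public static void main(String[] args) {
        // Assumed start of indexing; any fixed, persisted date would do.
        LocalDate start = LocalDate.of(2008, 5, 14);
        for (int i = 0; i < 3; i++) {
            LocalDate d = start.plusDays(i);
            System.out.printf("%s -> boost %.1f%n", d, linearBoost(start, d, 0.1));
        }
    }
}
```

The only thing that has to be persisted is the start date (or the current
factor); everything else can be recomputed at index time.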

My main concern is that eventually you would have *very* large
factor differences and I don't know if you'd *ever* see really old
documents, but that's a danger no matter what you do. And
since I'm not even working on a Lucene project now, I don't have
the time to try it <G>. Can you recognize a plea for having
others do the hard work when you see it?
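
To put rough numbers on the "very large factor differences" worry, this
small sketch computes the ratio between the boost of a document indexed
today and one indexed on day zero, under the same assumed linear schedule
(1.0 plus a fixed step per day). The class and method names are made up
for illustration.

```java
public class BoostSpread {
    // Ratio of the newest document's boost to the oldest document's boost
    // after `days` days, assuming the oldest boost is 1.0 and the boost
    // grows linearly by `step` per day.
    static double newestToOldestRatio(long days, double step) {
        double oldest = 1.0;
        double newest = 1.0 + step * days;
        return newest / oldest;
    }

    public static void main(String[] args) {
        long[] horizons = {30, 365, 3650};
        for (long days : horizons) {
            System.out.printf("after %d days: newest/oldest = %.1f%n",
                    days, newestToOldestRatio(days, 0.1));
        }
    }
}
```

With a step of 0.1 per day, a document indexed a year later carries a boost
of about 37.5 against 1.0 for the oldest one, so the step size would need
to be chosen with the expected lifetime of the index in mind.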

And who knows, I may just be parroting something already suggested,
which means that it took this long to actually sink in...

Best
Erick
