lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steven A Rowe" <sar...@syr.edu>
Subject RE: boost freshness instead of sorting
Date Thu, 28 Aug 2008 17:06:05 GMT
Hi Yannis,

Hmm, hadn't thought about norms - you could just turn them off, right?:

<http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/document/Field.html#Field(java.lang.String,%20java.lang.String,%20org.apache.lucene.document.Field.Store,%20org.apache.lucene.document.Field.Index)>

with

<http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/document/Field.Index.html#NO_NORMS>

(Be careful of using NO_NORMS, though, since in addition to disabling norms, it also disables
analysis, so you'd have to add a same-named Field for each "1".)

or

<http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/document/Fieldable.html#setOmitNorms(boolean)>

Or I suppose you could know in advance the maximum number of tokens your field would have,
and use some alternate token to pad out the field (no norms is probably the more reasonable
route):

doc 0 has days: "0 0 0 1"
doc 1 has days: "0 0 1 1"
doc 2 has days: "0 1 1 1"
doc 3 has days: "1 1 1 1"

Steve

On 08/28/2008 at 12:44 PM, Yannis Pavlidis wrote:
> 
> Hey Steve,
> 
> Thanks for the quick response. Apologies my email was not
> very clear. I actually did what you and Andrzej propose. So
> in my test (with field boost and doc boost = 1)
> 
> doc 0 has days: "1"        and field weight = tf * idf *
> field Norm = sqrt(1) * idf * 1/sqrt(1) = idf
> doc 1 has days: "1 1"      and field weight = tf * idf *
> field Norm = sqrt(2) * idf * 1/sqrt(2) = idf
> doc 2 has days: "1 1 1"    and field weight = tf * idf *
> field Norm = sqrt(3) * idf * 1/sqrt(3) = idf
> doc 3 has days: "1 1 1 1"  and field weight = tf * idf *
> field Norm = sqrt(4) * idf * 1/sqrt(4) = idf
> 
> I am using the Snowball English analyzer which I believe does
> the right job (I also tried the same example with bbb instead of 1)
> 
> Any clarifications / suggestions would be appreciated.
> 
> Thanks,
> 
> Yannis.
> 
> -----Original Message-----
> From: Steven A Rowe [mailto:sarowe@syr.edu]
> Sent: Thu 8/28/2008 10:27 AM
> To: java-user@lucene.apache.org
> Subject: RE: boost freshness instead of sorting
> 
> Hi Yannis,
> 
> On 08/28/2008 at 12:12 PM, Yannis Pavlidis wrote:
> > I am trying to boost the freshness of some of our documents
> > in the index using the most efficient way (i.e. if 2 news
> > stories have the same score based on the content then I want
> > to promote the one that was created last)
> > 
> > [...]
> > 
> > While looking at the archives I came across this email:
> > http://www.gossamer-threads.com/lists/lucene/java-user/43457 where
> > "Andrzej Bialecki" proposes the addition of a column related with days
> > (months, etc) and add a "1" for each day/month that has passed from the
> > epoch. I tried his solution and it does not seem to performing that
> > well. The reason (unless my math have completely failed me) is because
> > the boost that this new field provides is always the same.
> 
> What Andrzej said was:
> 
> > Add a separate field, say "days", in which you will put as many
> > "1" as many days elapsed since the epoch (not neccessarily since
> > 1 Jan 1970 - pick a date that makes sense for you). Then, if you
> > want to prioritize newer documents, just add "+days:1" to your
> > query. Voila - the final results are a sum of other score factors
> > plus a score factor that is higher for more recent document,
> > containing more 1-s.
> 
> You interpreted this to mean "1", "2", "3", "4", etc. for the
> field value, but what Andrzej meant was something like
> (assuming an analyzer that tokenizes at whitespace and does
> not drop numeric tokens): "1", "1 1", "1 1 1", "1 1 1 1",
> etc.  Note that the choice of the "1" for the token string is
> arbitrary - it could also be "X" or "Gazornumplatz".
> 
> Steve

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message