lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Trieschnigg, R.B. \(Dolf\)" <r.b.trieschn...@ewi.utwente.nl>
Subject RE: BM25 Similarity implementation
Date Fri, 17 Feb 2006 10:24:50 GMT
Sorry, the image wasn't sent:
http://wwwhome.cs.utwente.nl/~trieschn/bm25.PNG 

> -----Original Message-----
> From: Trieschnigg, R.B. (Dolf) 
> [mailto:r.b.trieschnigg@ewi.utwente.nl] 
> Sent: vrijdag 17 februari 2006 10:54
> To: java-user@lucene.apache.org
> Subject: RE: BM25 Similarity implementation
> 
> > > I would like to implement the Okapi BM25 weighting 
> function using my 
> > > own Similarity implementation. Unfortunately BM25 requires the 
> > > document length in the score calculation, which is not 
> provided by 
> > > the Scorer.
> > 
> > How do you want to measure document length?  If the number 
> of tokens 
> > is an acceptable measure, then the norm contains
> > sqrt(numTokens) by default.  You can modify your
> > Similarity.lengthNorm() implementation to not perform the sqrt, or 
> > square the norm.
> 
> I assume the number of tokens will be a good estimate.
> 
> I've included an image with the algorithm (my ASCII art isn't 
> that good).
> Legend of the figure:
> - k1, k3 and b are constants
> - tf is the within document term frequency
> - df is the document frequency
> - N is the collection size
> - r is the number of relevant documents containing a 
> particular term (without relevance information assumed to be 0)
> - R is the number of items known to be relevant to a specific 
> topic (without relevance information assumed to be 0)
> 
> As far is I understand Lucene multiplies the squared weight 
> with the result of Similarity.lengthNorm(), but BM25 requires 
> the document length for the calculation of the document term 
> weighting (as far as I know it's not possible to extract the 
> influence of the normalization as a constant multiplier).
> 
> Am I missing something here?
> 
> Dolf

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message