lucene-dev mailing list archives

From "Shailesh Kochhar" <shailesh.koch...@gmail.com>
Subject Re: Implementing new scoring algorithms in lucene
Date Tue, 21 Feb 2006 04:34:11 GMT
On 2/18/06, Paul Elschot <paul.elschot@xs4all.nl> wrote:
> On Saturday 18 February 2006 02:22, Shailesh Kochhar wrote:
> > Hi,
> >
> > I'm interested in implementing a few new scoring algorithms in Lucene
> > and I was wondering if anyone had attempted this in the past and how
> > successful they had been. If there are any resources that someone
> > could point me to, that would be great; Googling and searching the
> > mailing-list archives didn't turn up anything.
> >
> > After looking over the current implementation of tf-idf scoring, I
> > concluded that the weighting and scoring framework is mostly
> > implemented in the TermQuery and TermScorer classes. I am thinking of
> > extending these classes and replacing a few others to implement the
> > new algorithm. Am I heading in the right direction? Does it make sense
> > to try to extend these classes, or should I build a parallel
> > hierarchy instead?
>
> At the moment I only have time to answer with links:
>
> http://issues.apache.org/jira/browse/LUCENE-293
> http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200410.mbox/<200410172050.24372.paul.elschot%40xs4all.nl>
> http://www.loc.gov/standards/sru/cql/
> http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/surround/

I have a question about the sumOfSquaredWeights method. As I
understand it, it computes the square of a term's idf, which is then
used to normalize the weights of the individual terms in the query.
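To make the question concrete, here is a minimal standalone sketch of the default normalization path, with no Lucene on the classpath. It assumes the Lucene 1.x DefaultSimilarity formulas as I read them (idf = log(numDocs / (docFreq + 1)) + 1 and queryNorm = 1 / sqrt(sumOfSquaredWeights)) and a per-term contribution of (idf * boost)^2, which for a boost of 1.0 is exactly the squared idf. The class name, helper methods and sample document frequencies are hypothetical and only mirror that arithmetic; they are not Lucene's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical standalone sketch (not Lucene code) of how default tf-idf
 * query normalization combines per-term weights, assuming the Lucene 1.x
 * DefaultSimilarity formulas.
 */
public class QueryNormSketch {

    // Assumed default-style idf: log(numDocs / (docFreq + 1)) + 1
    public static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Assumed per-term contribution: (idf * boost)^2, i.e. what a term
    // weight's sumOfSquaredWeights() would report to its parent
    public static double squaredWeight(double idf, double boost) {
        double queryWeight = idf * boost;
        return queryWeight * queryWeight;
    }

    // Assumed default-style query norm: 1 / sqrt(sum of squared weights)
    public static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    public static void main(String[] args) {
        int numDocs = 1000;
        // Hypothetical term -> docFreq table; every boost is 1.0 here
        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("lucene", 9);
        docFreqs.put("scoring", 99);

        // Sum the squared weights over all query terms
        double sum = 0.0;
        for (int df : docFreqs.values()) {
            sum += squaredWeight(idf(df, numDocs), 1.0);
        }
        double norm = queryNorm(sum);

        // Each normalized query weight is idf * boost * queryNorm, so the
        // vector of normalized weights has unit Euclidean length
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            double termIdf = idf(e.getValue(), numDocs);
            System.out.printf("%s: idf=%.4f normalizedWeight=%.4f%n",
                    e.getKey(), termIdf, termIdf * norm);
        }
    }
}
```

The point of the sketch is that the sum of squares only makes sense for this cosine-style (Euclidean) normalization; an algorithm that normalizes the query some other way has no natural use for it.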

In the different scoring algorithm I am implementing, the query
normalization works differently and the sumOfSquaredWeights method
isn't needed. However, it is called from a number of different places,
which makes it hard to remove. I could easily compute the query
normalization factor in this method instead, but its name would then
be very misleading.

Is there something I'm missing about this method, or is it a good
candidate for renaming to something broader? I feel that many
components of the scoring framework are too tightly coupled, which
makes swapping in a new algorithm quite difficult. Ideally, one should
only have to extend the Similarity, Query and Scorer classes.
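One way to picture the renaming idea is a normalization hook with a neutral name that each scoring algorithm supplies. The sketch below is purely hypothetical: QueryNormalizer, normalizationFactor and both implementations are illustrative names that do not exist in Lucene, and the L1 variant is just an example of a non-sum-of-squares normalization, not the algorithm discussed above.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: the query-normalization hook gets a neutral name, so
 * an algorithm that does not normalize by a sum of squares need not hide its
 * computation behind a method called "sumOfSquaredWeights".
 * None of these types exist in Lucene.
 */
public class PluggableNormSketch {

    /** Hypothetical strategy supplied by the scoring algorithm. */
    interface QueryNormalizer {
        // Combine the raw per-term query weights into one normalization factor
        double normalizationFactor(List<Double> rawTermWeights);
    }

    /** Cosine-style normalization, as in default tf-idf: 1 / sqrt(sum of squares). */
    static final class SumOfSquaresNormalizer implements QueryNormalizer {
        public double normalizationFactor(List<Double> w) {
            double sum = 0.0;
            for (double x : w) sum += x * x;
            // Guard against an empty or all-zero query
            return sum == 0.0 ? 1.0 : 1.0 / Math.sqrt(sum);
        }
    }

    /** Example alternative: L1 normalization, 1 / sum of absolute weights. */
    static final class SumNormalizer implements QueryNormalizer {
        public double normalizationFactor(List<Double> w) {
            double sum = 0.0;
            for (double x : w) sum += Math.abs(x);
            return sum == 0.0 ? 1.0 : 1.0 / sum;
        }
    }

    public static void main(String[] args) {
        List<Double> weights = Arrays.asList(3.0, 4.0);
        System.out.println(new SumOfSquaresNormalizer().normalizationFactor(weights)); // 0.2
        System.out.println(new SumNormalizer().normalizationFactor(weights));
    }
}
```

With this split, the Query and Scorer code would only ask for "a normalization factor", and the choice between the two behaviours would live entirely in the pluggable piece, which is roughly the kind of decoupling described above.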

Thoughts and comments?

  - Shailesh