lucene-java-user mailing list archives

From "Terry Steichen" <te...@net-frame.com>
Subject Re: Computing Relevancy Differently
Date Sat, 08 Feb 2003 23:36:16 GMT
Doug,

Can you give me an idea of what to replace the lengthNorm() method with in
order to, for example, remove the extra weight given to shorter matching
documents? I could work it out by trial and error, but it would help to
understand the logic first.

For example, from DefaultSimilarity, here's the lengthNorm() method:

  public float lengthNorm(String fieldName, int numTerms) {
    return (float)(1.0 / Math.sqrt(numTerms));
  }

To eliminate any size bias, should I override it to always return 1?
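
A minimal sketch of the "always return 1" override asked about above. It is
self-contained so that it compiles without Lucene: the nested Base class is
only a stand-in that copies the lengthNorm() body quoted from
DefaultSimilarity, and the names FlatNormSketch, Base, and FlatNorm are
illustrative, not Lucene classes. In a real application the subclass would
presumably extend DefaultSimilarity instead.

```java
// Illustrative container; none of these names come from Lucene.
class FlatNormSketch {

    // Stand-in for DefaultSimilarity, reduced to the method quoted above.
    static class Base {
        public float lengthNorm(String fieldName, int numTerms) {
            return (float) (1.0 / Math.sqrt(numTerms));
        }
    }

    // Returns the same norm for every field and every length, so a field
    // with 10 terms and a field with 10,000 terms are weighted equally.
    static class FlatNorm extends Base {
        @Override
        public float lengthNorm(String fieldName, int numTerms) {
            return 1.0f;
        }
    }
}
```

With the stand-in, new FlatNormSketch.Base().lengthNorm("body", 4) gives 0.5,
while new FlatNormSketch.FlatNorm().lengthNorm("body", 4) gives 1.0.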

How would I boost the headline field here? Is that what the (presently
unused) fieldName parameter is for? If so, I assume that to get the effect
I'm after I would return a factor greater than 1 for the 'headline' field
and 1 for all other fields?
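
A rough sketch of that idea, again written so that it compiles without
Lucene. The nested Base class is a stand-in that copies the lengthNorm()
body quoted from DefaultSimilarity; HeadlineBoostSketch, Base, HeadlineBoost,
and the HEADLINE_BOOST value of 2.0 are all assumptions made for
illustration, and the field name "headline" is taken from the question.

```java
// Illustrative container; none of these names come from Lucene.
class HeadlineBoostSketch {

    // Stand-in for DefaultSimilarity, reduced to the method quoted above.
    static class Base {
        public float lengthNorm(String fieldName, int numTerms) {
            return (float) (1.0 / Math.sqrt(numTerms));
        }
    }

    // Ignores field length entirely and uses fieldName to weight matches
    // in the headline field above matches in any other field.
    static class HeadlineBoost extends Base {
        // Hypothetical factor; the right value would need experimentation.
        private static final float HEADLINE_BOOST = 2.0f;

        @Override
        public float lengthNorm(String fieldName, int numTerms) {
            return "headline".equals(fieldName) ? HEADLINE_BOOST : 1.0f;
        }
    }
}
```

A variant that keeps the default length normalization and only adds the
headline weighting would multiply super.lengthNorm(fieldName, numTerms) by
the boost instead of returning constants.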


Regards,

Terry

----- Original Message -----
From: "Doug Cutting" <cutting@lucene.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Friday, February 07, 2003 2:37 PM
Subject: Re: Computing Relevancy Differently


> Terry Steichen wrote:
> > I read all the relevant references I could find in the Users (not
> > Developers) list, and I still don't exactly know what to do.
> >
> > What I'd like to do is get a relevancy-based order in which (a) longer
> > documents tend to get more weight than shorter ones, (b) a document body
> > with 'X' instances of a query term gets a higher ranking than one with
> > fewer than 'X' instances, and (c) a term found in the headline (usually
> > in addition to finding the same term in the body) is more highly ranked
> > than one with the term only in the body.
>
> In the latest sources this can all be done by defining your own
> Similarity implementation.  You can make longer documents score higher
> by overriding the lengthNorm() method.  You can boost headlines there,
> or with Field.setBoost(), or at query time with Query.setBoost().
>
> Doug
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>