lucene-java-user mailing list archives

From "Andy Liu" <andyliu1...@gmail.com>
Subject Re: Lopsided scores for each term in BooleanQuery
Date Tue, 19 Sep 2006 01:47:10 GMT
In our application we search multiple fields.  So the query "fast car"
becomes:

+(field1:fast field2:fast field3:fast) +(field1:car field2:car field3:car)
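
As a rough illustration of the expansion above, here is a minimal sketch in
plain Java that builds the same query text. The class and method names are
made up for this example, it does not use the Lucene API, and it assumes the
terms contain no characters that need escaping in query syntax.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper, not part of Lucene: builds the query text only.
final class MultiFieldQueryText {

    // For every term, emit one required group "+(f1:term f2:term ...)"
    // so that each term must match in at least one of the fields.
    static String expand(List<String> fields, List<String> terms) {
        StringJoiner clauses = new StringJoiner(" ");
        for (String term : terms) {
            StringJoiner group = new StringJoiner(" ", "+(", ")");
            for (String field : fields) {
                group.add(field + ":" + term);
            }
            clauses.add(group.toString());
        }
        return clauses.toString();
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("field1", "field2", "field3");
        List<String> terms = Arrays.asList("fast", "car");
        System.out.println(expand(fields, terms));
    }
}
```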

I understand that the default sqrt implementation of tf() mitigates the
"lopsided score" phenomenon for searches within a single field.  But when
searching in multiple fields, this effect is obscured, since each matching
field adds to the score of its clause.  Is there a way to "peek" at the
score of each clause and adjust the final score based on how divergent the
clause scores are?  Or is there an easier way to do this that I'm just not
seeing?
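
To make the idea concrete, here is a minimal sketch of one possible
adjustment, assuming the per-clause scores are already available as plain
numbers (for instance read out of IndexSearcher.explain(), or by scoring each
clause as its own query; both are assumptions, not something confirmed in
this thread). The class, the method names, and the choice of the
geometric-to-arithmetic-mean ratio as the balance measure are all
hypothetical and not part of Lucene.

```java
// Hypothetical post-processing step, not part of Lucene.
final class ClauseBalance {

    // Ratio of the geometric mean to the arithmetic mean of the clause
    // scores. It is 1.0 (up to rounding) when all scores are equal and
    // shrinks toward 0 as they diverge. An empty array, or any clause
    // score <= 0, yields 0.0.
    static double balance(double[] clauseScores) {
        if (clauseScores.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        double logSum = 0.0;
        for (double s : clauseScores) {
            if (s <= 0.0) {
                return 0.0;
            }
            sum += s;
            logSum += Math.log(s);
        }
        double n = clauseScores.length;
        double arithmeticMean = sum / n;
        double geometricMean = Math.exp(logSum / n);
        return geometricMean / arithmeticMean;
    }

    // Sum of the clause scores, scaled down by how lopsided they are.
    static double adjustedScore(double[] clauseScores) {
        double total = 0.0;
        for (double s : clauseScores) {
            total += s;
        }
        return total * balance(clauseScores);
    }

    public static void main(String[] args) {
        // Illustrative numbers only: sqrt-style clause scores for an even
        // document (50 + 50 occurrences) and a skewed one (100 + 1).
        double[] even = { Math.sqrt(50), Math.sqrt(50) };
        double[] skewed = { Math.sqrt(100), Math.sqrt(1) };
        System.out.println("even:   " + adjustedScore(even));
        System.out.println("skewed: " + adjustedScore(skewed));
    }
}
```

With these illustrative inputs the even document ends up with the higher
adjusted score. Whether such a factor is best applied inside a custom scorer
or as a re-ranking pass over the top hits is left open here.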

Andy

On 9/18/06, Paul Elschot <paul.elschot@xs4all.nl> wrote:
>
> On Monday 18 September 2006 23:08, Andy Liu wrote:
> > For multi-word queries, I would like to reward documents that contain a
> > more even distribution of each word and penalize documents that have a
> > skewed distribution.  For example, if my search query is:
> >
> > +content:fast +content:car
> >
> > I would prefer a document that contains each word an equal number of
> > times over a document that contains the word "fast" 100 times and the
> > word "car" 1 time.  In other words, I would like to compare the scores
> > of each BooleanQuery term and adjust the score according to the
> > distribution.
> >
> > Can somebody point me in the right direction as to how I would implement
> > this?
>
> It's already there in DefaultSimilarity.tf(), which is the square root of
> the term frequency:
>
> (sqrt(1) + sqrt(1)) > (sqrt(0) + sqrt(2))
>
>
> Regards,
> Paul Elschot
