lucene-java-user mailing list archives

From "Jesse Prabawa" <j.prab...@gmail.com>
Subject Re: Position of matches to affect scoring
Date Wed, 20 Jun 2007 07:17:22 GMT
Hi Steve,

Thanks for the advice and the detailed explanation. I have another question,
though: I understand that Lucene normalizes scores based on field length.
Is there a way for me to avoid this, or to have better control over how the
scores are normalized?

Best regards,

Jes

On 6/19/07, Steven Rowe <sarowe@syr.edu> wrote:
>
> Hi Jes,
>
> Jesse Prabawa wrote:
> > The Lucene FAQ at http://wiki.apache.org/lucene-java/LuceneFAQ
> > mentions that the position of the matches in the text does not affect
> > scoring. So is there any way that I can make the position of the
> > matches affect scoring? For example, I want matches that occur at the
> > beginning to weigh more than those that occur elsewhere in the text.
> > I have just started using Lucene so any help/advice is greatly
> > appreciated :)
>
> One quick way to get (something like) what you want is to place "the
> beginning" in a separate field from the rest of the document contents,
> then query both the "beginning" and "remainder" fields with the same
> query, boosting (i.e. weighting) the "beginning" field higher than the
> "remainder" field.
>
> E.g. (assumes SimpleAnalyzer, and default "OR" QueryParser operator):
>
>   doc1: "This is the inception.  Here is the rest."
>         "beginning" field: "this", "is", "the", "inception"
>         "remainder" field: "here", "is", "the", "rest"
>
>   doc2: "Something else here.  After the inception."
>         "beginning" field: "something", "else", "here"
>         "remainder" field: "after", "the", "inception"
>
> query: "What does inception mean?"
> -> "beginning:(what does inception mean)^5  remainder:(what does
> inception mean)^1"
>
> The transformed query shown above is how it would look in QueryParser
> syntax[1] to query both fields with the same query, while boosting the
> "beginning" field higher (boost:5) than the "remainder" field (boost:1).
>
> You have to build this transformed query yourself - there is no facility
> in Lucene (that I'm aware of) for building multi-field queries with
> differently boosted fields.
>
> Both docs will match, but doc1 will score higher than doc2, since
> "inception" is in doc1's higher-weighted "beginning" field.
>
>
> Steve
>
> [1] http://lucene.apache.org/java/docs/queryparsersyntax.html
>
> --
> Steve Rowe
> Center for Natural Language Processing
> http://www.cnlp.org/tech/lucene.asp

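Since the quoted advice says the transformed query has to be built by hand, here is a minimal sketch of one way to assemble that string. It uses only plain Java string handling and does not call Lucene. The class BoostedFieldQuery and its methods normalize and build are hypothetical names made up for this illustration. The normalization step (lowercase, keep only runs of letters) is an assumption that roughly imitates SimpleAnalyzer, and it also drops QueryParser special characters such as "?".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Lucene. */
final class BoostedFieldQuery {

    /**
     * Lowercases the text and keeps only runs of letters. This is an
     * assumption that roughly imitates SimpleAnalyzer; it also removes
     * QueryParser special characters such as '?'.
     */
    static String normalize(String userQuery) {
        List<String> terms = new ArrayList<>();
        String lower = userQuery.toLowerCase(Locale.ROOT);
        for (String token : lower.split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {   // split can yield a leading empty token
                terms.add(token);
            }
        }
        return String.join(" ", terms);
    }

    /**
     * Repeats the same terms once per field, each clause carrying that
     * field's boost, in QueryParser syntax: field:(terms)^boost
     */
    static String build(String userQuery, Map<String, Integer> fieldBoosts) {
        String terms = normalize(userQuery);
        if (terms.isEmpty()) {
            throw new IllegalArgumentException("query contains no terms");
        }
        List<String> clauses = new ArrayList<>();
        for (Map.Entry<String, Integer> e : fieldBoosts.entrySet()) {
            clauses.add(e.getKey() + ":(" + terms + ")^" + e.getValue());
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, Integer> boosts = new LinkedHashMap<>();
        boosts.put("beginning", 5);
        boosts.put("remainder", 1);
        System.out.println(build("What does inception mean?", boosts));
        // prints: beginning:(what does inception mean)^5 remainder:(what does inception mean)^1
    }
}
```

The resulting string could then be handed to QueryParser. The field names and boost values are the ones from the quoted example; how large the boost for the "beginning" field should be is something to tune against your own documents.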