lucene-java-user mailing list archives

From Steven Rowe <sar...@syr.edu>
Subject Re: Position of matches to affect scoring
Date Tue, 19 Jun 2007 14:49:57 GMT
Hi Jes,

Jesse Prabawa wrote:
> The Lucene FAQ at http://wiki.apache.org/lucene-java/LuceneFAQ
> mentions that the position of the matches in the text does not affect
> scoring. So is there any way that I can make the position of the
> matches affect scoring? For example, I want matches that occur at the
> beginning to weigh more than those that occur elsewhere in the text.
> I have just started using Lucene so any help/advice is greatly
> appreciated :)

One quick way to get (something like) what you want is to place "the
beginning" in a separate field from the rest of the document contents,
then query both the "beginning" and "remainder" fields with the same
query, boosting (i.e. weighting) the "beginning" field higher than the
"remainder" field.

E.g. (assumes SimpleAnalyzer, and default "OR" QueryParser operator):

  doc1: "This is the inception.  Here is the rest."
        "beginning" field: "this", "is", "the", "inception"
        "remainder" field: "here", "is", "the", "rest"

  doc2: "Something else here.  After the inception."
        "beginning" field: "something", "else", "here"
        "remainder" field: "after", "the", "inception"

query: "What does inception mean?"
-> "beginning:(what does inception mean)^5  remainder:(what does
inception mean)^1"

The transformed query shown above is how it would look in QueryParser
syntax[1] to query both fields with the same query, while boosting the
"beginning" field higher (boost:5) than the "remainder" field (boost:1).

You have to build this transformed query yourself - there is no facility
in Lucene (that I'm aware of) for building multi-field queries with
differently boosted fields.
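
A minimal sketch of building that transformed query string by hand, in
plain Java with no Lucene dependency. The class name BoostedFieldQuery
and its methods are made up for illustration; the tokenizing step only
roughly mimics SimpleAnalyzer (split on non-letters, lowercase), which
also has the effect of dropping QueryParser special characters such as
the trailing "?". The resulting string would then be handed to
QueryParser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
final class BoostedFieldQuery {

    // Rough stand-in for SimpleAnalyzer: lowercase, split on non-letters.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Print whole-number boosts as "5" rather than "5.0".
    private static String formatBoost(double boost) {
        return boost == Math.rint(boost)
                ? String.valueOf((long) boost)
                : String.valueOf(boost);
    }

    // Repeats the same terms once per field, each group with its own boost.
    // Fields appear in the iteration order of the map.
    static String build(String userQuery, Map<String, Double> fieldBoosts) {
        List<String> terms = tokenize(userQuery);
        if (terms.isEmpty()) {
            throw new IllegalArgumentException("query contains no terms");
        }
        String joined = String.join(" ", terms);
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Double> e : fieldBoosts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey())
              .append(":(").append(joined).append(")^")
              .append(formatBoost(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> boosts = new LinkedHashMap<>();
        boosts.put("beginning", 5.0);
        boosts.put("remainder", 1.0);
        System.out.println(build("What does inception mean?", boosts));
    }
}
```

A LinkedHashMap is used so the fields come out in insertion order; the
order does not matter to the query itself, only to readability.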

Both docs will match, but doc1 will score higher than doc2, since
"inception" is in doc1's higher-weighted "beginning" field.
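
On the indexing side, the text has to be cut into the two parts before
each part is added to its own field. Here is one possible sketch in
plain Java. It assumes that "the beginning" means the first sentence,
as in the doc1/doc2 example above; that rule, and the class name
BeginningSplitter, are assumptions for illustration only. A fixed
number of leading words or characters would work just as well.

```java
// Hypothetical helper, not part of Lucene.
final class BeginningSplitter {

    // Returns { beginning, remainder }. The beginning runs up to and
    // including the first '.', '!' or '?'; if none is found, the whole
    // text is treated as the beginning and the remainder is empty.
    static String[] split(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '.' || c == '!' || c == '?') {
                return new String[] {
                    text.substring(0, i + 1).trim(),
                    text.substring(i + 1).trim()
                };
            }
        }
        return new String[] { text.trim(), "" };
    }

    public static void main(String[] args) {
        String[] parts = split("This is the inception.  Here is the rest.");
        // parts[0] would go into the "beginning" field,
        // parts[1] into the "remainder" field.
        System.out.println("beginning: " + parts[0]);
        System.out.println("remainder: " + parts[1]);
    }
}
```

This naive rule will split too early on abbreviations and decimal
numbers, so real text would likely need a proper sentence detector.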


Steve

[1] http://lucene.apache.org/java/docs/queryparsersyntax.html

-- 
Steve Rowe
Center for Natural Language Processing
http://www.cnlp.org/tech/lucene.asp
