lucene-java-user mailing list archives

From "Jesse Prabawa" <j.prab...@gmail.com>
Subject Re: Position of matches to affect scoring
Date Wed, 20 Jun 2007 07:32:21 GMT
Oh I think I have found some clues at:
[1] http://www.gossamer-threads.com/lists/lucene/java-user/38967#38967
[2] http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/package-summary.html#changingSimilarity

Thanks!

Jes
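
On the field-length normalization asked about in the quoted message below: in the Lucene releases current at the time of this thread, DefaultSimilarity.lengthNorm(fieldName, numTerms) returns 1/sqrt(numTerms), so a match in a short field counts for more than the same match in a long one. The sketch below is plain Java with no Lucene dependency. The class and method names are made up for illustration; it only mirrors that formula next to a constant alternative, which is the kind of value a custom Similarity subclass overriding lengthNorm could return to switch the effect off.

```java
public class LengthNormSketch {

    // Mirrors the formula of Lucene's DefaultSimilarity.lengthNorm:
    // 1 / sqrt(number of terms in the field). Not a call into Lucene.
    public static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Hypothetical replacement: every field gets the same norm,
    // so field length no longer influences the score.
    public static float flatLengthNorm(int numTerms) {
        return 1.0f;
    }

    public static void main(String[] args) {
        // Compare both variants for a short, a medium and a long field.
        for (int numTerms : new int[] {4, 100, 10000}) {
            System.out.println(numTerms + " terms: default="
                    + defaultLengthNorm(numTerms)
                    + " flat=" + flatLengthNorm(numTerms));
        }
    }
}
```

Norms are computed when a document is indexed, so a changed lengthNorm would only take effect for documents indexed after the change.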

On 6/20/07, Jesse Prabawa <j.prabawa@gmail.com> wrote:
>
> Hi Steve,
>
> Thanks for the advice and your detailed explanation. I have another
> question, though: I understand that Lucene normalizes scores based on
> field length. Is there a way for me to avoid this, or at least to have
> better control over how the scores are normalized?
>
> Best regards,
>
> Jes
>
> On 6/19/07, Steven Rowe <sarowe@syr.edu> wrote:
> >
> > Hi Jes,
> >
> > Jesse Prabawa wrote:
> > > The Lucene FAQ at http://wiki.apache.org/lucene-java/LuceneFAQ
> > > mentions that the position of the matches in the text does not affect
> > > scoring. So is there any way that I can make the position of the
> > > matches affect scoring? For example, I want matches that occur at the
> > > beginning to weigh more than those that occur elsewhere in the text.
> > > I have just started using Lucene so any help/advice is greatly
> > > appreciated :)
> >
> > One quick way to get (something like) what you want is to place "the
> > beginning" in a separate field from the rest of the document contents,
> > then query both the "beginning" and "remainder" fields with the same
> > query, boosting (i.e. weighting) the "beginning" field higher than the
> > "remainder" field.
> >
> > E.g. (assumes SimpleAnalyzer, and default "OR" QueryParser operator):
> >
> >   doc1: "This is the inception.  Here is the rest."
> >         "beginning" field: "this", "is", "the", "inception"
> >         "remainder" field: "here", "is", "the", "rest"
> >
> >   doc2: "Something else here.  After the inception."
> >         "beginning" field: "something", "else", "here"
> >         "remainder" field: "after", "the", "inception"
> >
> > query: "What does inception mean?"
> > -> "beginning:(what does inception mean)^5  remainder:(what does
> > inception mean)^1"
> >
> > The transformed query shown above is how it would look in QueryParser
> > syntax[1] to query both fields with the same query, while boosting the
> > "beginning" field higher (boost:5) than the "remainder" field (boost:1).
> >
> > You have to build this transformed query yourself - there is no facility
> > in Lucene (that I'm aware of) for building multi-field queries with
> > differently boosted fields.
> >
> > Both docs will match, but doc1 will score higher than doc2, since
> > "inception" is in doc1's higher-weighted "beginning" field.
> >
> >
> > Steve
> >
> > [1] http://lucene.apache.org/java/docs/queryparsersyntax.html
> >
> > --
> > Steve Rowe
> > Center for Natural Language Processing
> > http://www.cnlp.org/tech/lucene.asp
> >
>
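
The two-field split Steve describes can be sketched in plain Java, without any Lucene dependency. Everything here is an assumption made for illustration: the class and method names are hypothetical, "the beginning" is taken to be the text up to the first full stop, and tokenize() only approximates SimpleAnalyzer (split on non-letters, lowercase). In a real index the two strings would go into the "beginning" and "remainder" fields of each document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FieldSplitSketch {

    // Hypothetical rule: "the beginning" is everything up to and including
    // the first '.', the remainder is whatever follows.
    public static String[] splitAtFirstSentence(String text) {
        int end = text.indexOf('.');
        if (end < 0) {
            return new String[] {text, ""};
        }
        return new String[] {
            text.substring(0, end + 1),
            text.substring(end + 1).trim()
        };
    }

    // Rough stand-in for SimpleAnalyzer: lowercase, split at non-letters.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] parts = splitAtFirstSentence(
                "This is the inception.  Here is the rest.");
        System.out.println("beginning: " + tokenize(parts[0]));
        System.out.println("remainder: " + tokenize(parts[1]));
    }
}
```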

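Since the transformed query has to be built by hand, a small helper along these lines may be enough. This is a sketch in plain Java, not Lucene API: buildBoostedQuery and its parameters are hypothetical names, and the clean-up step (lowercase, replace every run of non-letters with a space) is an assumption chosen so that characters such as '?', which QueryParser treats as a wildcard, do not end up in the query string. The result is meant to be handed to QueryParser.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class BoostedQuerySketch {

    // Builds "field:(terms)^boost" clauses, one per field, separated by spaces.
    // With QueryParser's default OR operator a match in any field suffices.
    public static String buildBoostedQuery(String userQuery,
                                           Map<String, Integer> fieldBoosts) {
        // Assumed clean-up: keep letters only, collapse everything else to spaces.
        String cleaned = userQuery.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}]+", " ")
                .trim();
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : fieldBoosts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey())
              .append(":(").append(cleaned).append(")^")
              .append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, Integer> boosts = new LinkedHashMap<>();
        boosts.put("beginning", 5);
        boosts.put("remainder", 1);
        System.out.println(
                buildBoostedQuery("What does inception mean?", boosts));
    }
}
```

For the example query in the thread this produces the same string Steve wrote out by hand, with the "beginning" clause boosted by 5 and the "remainder" clause by 1.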