lucene-java-user mailing list archives

From "T. H. Lin" <easy....@gmail.com>
Subject Re: can I set Boost to the term while indexing?
Date Thu, 20 Nov 2008 19:15:48 GMT
hi,

thanks for your suggestions.

Actually, my original idea is that the same term may have a different "weight"
in different docs.
Of course, TF/IDF already captures some kind of term relevance to a doc,
but I would like to explicitly set a different "weight" for the same term in
different docs.

For instance,

the query is "T1 T2"

Both Doc1 and Doc2 have T1 and T2. They may also have exactly the same term
frequency!
But I want to bring in some "semantic enhancement".
I want T1 to have a higher weight in Doc1 than in Doc2, and T2 to have a
higher weight in Doc2 than in Doc1.

I don't think setBoost on a whole doc, on a field, or on a term in the query
can achieve this.
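
To make the limitation concrete, here is a minimal plain-Java sketch. It is not Lucene code: the class, the simplified score formula (docBoost * sum of tf * weight), and the weight values are all made up for illustration. It only shows that a single doc-level factor scales T1 and T2 together, while per-doc term weights let two docs with identical term frequencies score differently for each term.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT how Lucene computes scores.
final class PerDocTermWeightSketch {

    // Both docs share the same term frequencies for T1 and T2.
    static Map<String, Integer> sameTf() {
        Map<String, Integer> tf = new HashMap<>();
        tf.put("T1", 2);
        tf.put("T2", 2);
        return tf;
    }

    // Per-doc weights for T1 and T2 (the "semantic enhancement").
    static Map<String, Double> weights(double t1, double t2) {
        Map<String, Double> w = new HashMap<>();
        w.put("T1", t1);
        w.put("T2", t2);
        return w;
    }

    // Simplified score: docBoost * sum over query terms of (tf * per-doc term weight).
    static double score(Map<String, Integer> tf, Map<String, Double> termWeight,
                        double docBoost, String... queryTerms) {
        double sum = 0.0;
        for (String t : queryTerms) {
            sum += tf.getOrDefault(t, 0) * termWeight.getOrDefault(t, 1.0);
        }
        return docBoost * sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> tf = sameTf();
        Map<String, Double> doc1 = weights(2.0, 1.0); // T1 matters more in Doc1
        Map<String, Double> doc2 = weights(1.0, 2.0); // T2 matters more in Doc2

        // A doc-level boost alone cannot tell T1 from T2 inside one doc.
        System.out.println(score(tf, weights(1.0, 1.0), 3.0, "T1")
                == score(tf, weights(1.0, 1.0), 3.0, "T2"));

        // Per-doc term weights separate the docs term by term.
        System.out.println("T1: Doc1=" + score(tf, doc1, 1.0, "T1")
                + " Doc2=" + score(tf, doc2, 1.0, "T1"));
        System.out.println("T2: Doc1=" + score(tf, doc1, 1.0, "T2")
                + " Doc2=" + score(tf, doc2, 1.0, "T2"));
    }
}
```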

Maybe payloads are a solution; I will take a look!
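
As a rough sketch of what a payload-based weight could look like, the following plain Java packs a float weight into 4 bytes and multiplies it back in at scoring time. This is not the Lucene API: the class and method names are hypothetical, and the encoding (a big-endian float via ByteBuffer) is just one assumption about how such a weight might be stored as payload bytes on a token.

```java
import java.nio.ByteBuffer;

// Hypothetical helper; names and encoding are assumptions, not Lucene classes.
final class PayloadBytesSketch {

    // Index time: turn a per-doc term weight into the bytes a payload would carry.
    static byte[] encodeWeight(float weight) {
        return ByteBuffer.allocate(4).putFloat(weight).array();
    }

    // Search time: read the weight back from the payload bytes.
    static float decodeWeight(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    // Simplified per-term contribution: term frequency scaled by the stored weight.
    static float scoreTerm(int termFreq, byte[] payload) {
        return termFreq * decodeWeight(payload);
    }

    public static void main(String[] args) {
        byte[] t1InDoc1 = encodeWeight(2.0f); // T1 weighted up in Doc1
        byte[] t1InDoc2 = encodeWeight(0.5f); // T1 weighted down in Doc2

        // Same term frequency, different stored weight per doc.
        System.out.println("Doc1 T1 contribution: " + scoreTerm(3, t1InDoc1));
        System.out.println("Doc2 T1 contribution: " + scoreTerm(3, t1InDoc2));
    }
}
```

In a real index the bytes would be attached to each token during analysis and read back by the query at scoring time, as Grant describes below.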


Lin

2008/11/20 Grant Ingersoll <gsingers@apache.org>

> You can do this.  It's called adding a Payload.  You can add payloads
> during Analysis (Token.setPayload()), which means your code below will need
> to be changed so that you use the Field constructor that takes in a
> TokenStream and wraps your input tokens.  This TokenStream will also need to
> add your payloads.
>
> Then, during search, you can use a BoostingTermQuery to have the payload
> values factor in during scoring.
>
> -Grant
>
>
>

2008/11/20 Anshum <anshumg@gmail.com>

> Hi Lin,
>
> I guess you are looking at document boosting; if I'm right, you could
> conditionally do this:
> doc.setBoost(boostFactor);
> where boostFactor is a float > 1.0 that boosts the doc with the boost
> factor.
> You could also use
> field.setBoost(boostValue) to boost a particular field in a document by a
> particular boost factor.
> By default all boosts are set to 1.0 in Lucene. field.setBoost multiplies
> the score contribution of matches in that field by this factor when
> relevance is calculated.
>
> Hope this solves your issue.
>
> --
> Anshum Gupta
> Naukri Labs!
> http://ai-cafe.blogspot.com
>
> The facts expressed here belong to everybody, the opinions to me. The
> distinction is yours to draw............
>
