lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Adding custom weights to individual terms
Date Thu, 13 Feb 2014 11:36:00 GMT
You could stuff your custom weights into a payload, and index that,
but a payload is stored per term, per document, per position, while it
sounds like you just want one float for each term, regardless of the
documents/positions in which that term occurred?
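
To make the payload option concrete: a payload is just a small byte
array attached to one term occurrence, so a float weight such as the
0.9 on "powerful" in the example below has to be packed into bytes at
index time and unpacked at search time. The following is a minimal
plain-Java sketch of that round trip, not Lucene code. The class and
method names are hypothetical; Lucene has its own helper for this
(PayloadHelper.encodeFloat / decodeFloat, as far as I recall), which
you would normally use instead.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: packs one float weight into the 4 bytes that
// would be stored as a payload on a single term position.
final class TermWeightPayload {

    // Encode the weight as 4 big-endian bytes (ByteBuffer's default order).
    static byte[] encode(float weight) {
        return ByteBuffer.allocate(4).putFloat(weight).array();
    }

    // Decode the 4 payload bytes back into the original float.
    static float decode(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }
}
```

Note that this costs 4 bytes for every position of every term in every
document, which is the overhead mentioned above.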

Doing your own custom attribute would be a challenge: not only must
you create & set this attribute during indexing, but you then must
change the indexing process (custom chain, custom codec) to get the
new attribute into the index, and then make a custom query that can
pull this attribute at search time.

What are these term weights?  Are you sure you can't compute these
weights at search time with a custom similarity using the stats that
are already stored (docFreq, totalTermFreq, maxDoc, etc.)?
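
As an illustration of deriving a per-term weight purely from stored
stats, here is a minimal plain-Java sketch. It is not a Lucene
Similarity subclass; the class and method names are hypothetical, and
the formula is the classic inverse-document-frequency shape
(1 + ln(maxDoc / (docFreq + 1))), chosen only as an example of a weight
that needs nothing beyond docFreq and maxDoc.

```java
// Hypothetical helper: computes a term weight at search time from
// index statistics alone, with nothing extra stored per term.
final class StatsBasedTermWeight {

    // docFreq: number of documents containing the term.
    // maxDoc:  total number of documents in the index.
    static double idf(long docFreq, long maxDoc) {
        return 1.0 + Math.log(maxDoc / (double) (docFreq + 1));
    }
}
```

Rare terms get a larger weight than common ones, and a custom
similarity could plug a function like this into scoring in place of
weights stored at index time.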

Mike McCandless

http://blog.mikemccandless.com


On Thu, Feb 13, 2014 at 2:40 AM, Rune Stilling <subs@rdfined.dk> wrote:
> Hi list
>
> I'm trying to figure out how customizable scoring and weighting are in the Lucene API.
> I have read about the APIs but still can't figure out whether the following is possible.
>
> I would like to do normal document text indexing, but I would like to control the weight
> added to tokens myself. I would also like to control the weighting of query tokens and
> how things are added together.
>
> When indexing a word I would like to attach my own weights to the word, and use these weights
> when querying for documents. For example:
>
> Doc 1
> Lucene(0.7) is(0) a(0) powerful(0.9) indexing(0.62) and(0) search(0.99) API(0.3)
>
> Doc 2
> Lucene(0.5) is(0) used by(0) a(0) lot of(0) smart(0) people(0.1)
>
> The floats in parentheses are values I would like to add in the indexing process, not something
> coming from Lucene (tf-idf, for example).
>
> When querying I would like to repeat this and also create the weights for each term "myself"
> and control how the final doc score is calculated.
>
> I have read that it's possible to attach your own custom attributes to tokens. Is this
> the way to go? I.e., should I add my custom weights as attributes to tokens, and then access
> these attributes when calculating the document score in the search process (described here:
> https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/analysis/package-summary.html
> under "adding a custom attribute")?
>
> The reason why I'm asking is that I can't find any examples of this being done anywhere.
> But I found someone stating "With Lucene, it is impossible to increase or decrease the weight
> of individual terms in a document".
>
> With regards
> Rune
