lucene-java-user mailing list archives

From Rune Stilling <s...@rdfined.dk>
Subject Adding custom weights to individual terms
Date Thu, 13 Feb 2014 07:40:33 GMT
Hi list

I’m trying to figure out how customizable scoring and weighting are in the Lucene API. I
have read about the APIs but still can’t figure out whether the following is possible.

I would like to do normal document text indexing, but I would like to control the weight assigned
to tokens myself. I would also like to control the weighting of query tokens and how
the scores are combined.

When indexing a word, I would like to attach my own weight to it and use these weights
when querying for documents. For example:

Doc 1
Lucene(0.7) is(0) a(0) powerful(0.9) indexing(0.62) and(0) search(0.99) API(0.3)

Doc 2
Lucene(0.5) is(0) used by(0) a(0) lot of(0) smart(0) people(0.1)

The floats in parentheses are values I would like to add during indexing, not something
coming from Lucene (tf/idf, for example).

When querying, I would like to do the same, i.e. create the weights for each query term “myself”,
and control how the final document score is calculated.
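To make the intended model concrete, here is a minimal plain-Java sketch with no Lucene dependency. It assumes terms are lowercased, that weights come in the `term(weight)` notation used in the examples above, and, as a purely hypothetical combination rule, that a document's score is the sum of docWeight × queryWeight over the terms it shares with the query. The class and method names (`CustomWeightScoring`, `parse`, `score`) are illustrative only and are not part of Lucene.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CustomWeightScoring {
    // Matches "term(weight)" pairs such as "Lucene(0.7)"; words without a weight are skipped.
    private static final Pattern TERM_WEIGHT = Pattern.compile("(\\S+)\\(([0-9.]+)\\)");

    // Parses annotated text into a term -> weight map (terms lowercased).
    static Map<String, Float> parse(String annotated) {
        Map<String, Float> weights = new LinkedHashMap<>();
        Matcher m = TERM_WEIGHT.matcher(annotated);
        while (m.find()) {
            weights.put(m.group(1).toLowerCase(), Float.parseFloat(m.group(2)));
        }
        return weights;
    }

    // Hypothetical combination rule: sum of docWeight * queryWeight over shared terms.
    static float score(Map<String, Float> doc, Map<String, Float> query) {
        float total = 0f;
        for (Map.Entry<String, Float> e : query.entrySet()) {
            Float docWeight = doc.get(e.getKey());
            if (docWeight != null) {
                total += docWeight * e.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Float> doc1 = parse("Lucene(0.7) is(0) a(0) powerful(0.9) indexing(0.62) and(0) search(0.99) API(0.3)");
        Map<String, Float> doc2 = parse("Lucene(0.5) is(0) used by(0) a(0) lot of(0) smart(0) people(0.1)");
        Map<String, Float> query = parse("lucene(1.0) search(0.5)");
        System.out.println("doc1: " + score(doc1, query));
        System.out.println("doc2: " + score(doc2, query));
    }
}
```

With this rule, Doc 1 outranks Doc 2 for the query `lucene(1.0) search(0.5)` because it carries a weight for both terms. Swapping in a different `score` function is exactly the "control how things are added together" part of the question.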

I have read that it’s possible to attach your own custom attributes to tokens. Is this the
way to go? I.e., should I add my custom weight as an attribute to each token, and then access these
attributes when calculating the document score during search (as described under “adding a custom attribute”
here: https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/analysis/package-summary.html)?
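One caveat worth checking: custom attributes only live during analysis, and as far as I know the per-position value Lucene 4.x actually writes to the index is the payload. A commonly cited route is to write tokens as `term|weight`, analyze them with a `WhitespaceTokenizer` followed by `DelimitedPayloadTokenFilter` using a `FloatEncoder`, and read the payload back at search time (for example via `PayloadTermQuery` with an overridden `Similarity.scorePayload`). Treat those class names as pointers to verify against the 4.x Javadocs. The plain-Java sketch below needs no Lucene dependency: it only prepares the `term|weight` text and shows a 4-byte big-endian IEEE 754 float encoding, which I believe matches what `PayloadHelper.encodeFloat` produces. `PayloadPrep`, `toDelimited`, `encodeWeight` and `decodeWeight` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PayloadPrep {
    // Matches "term(weight)" pairs such as "Lucene(0.7)".
    private static final Pattern TERM_WEIGHT = Pattern.compile("(\\S+)\\(([0-9.]+)\\)");

    // Rewrites "Lucene(0.7) is(0)" as "Lucene|0.7 is|0.0" for a '|'-delimited payload filter.
    static String toDelimited(String annotated) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TERM_WEIGHT.matcher(annotated);
        while (m.find()) {
            tokens.add(m.group(1) + "|" + Float.parseFloat(m.group(2)));
        }
        return String.join(" ", tokens);
    }

    // Encodes a weight as 4 bytes: the float's IEEE 754 bits in big-endian order
    // (ByteBuffer's default byte order).
    static byte[] encodeWeight(float weight) {
        return ByteBuffer.allocate(4).putFloat(weight).array();
    }

    // Decodes the 4-byte big-endian representation back into a float.
    static float decodeWeight(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getFloat();
    }

    public static void main(String[] args) {
        String text = toDelimited("Lucene(0.7) is(0) powerful(0.9)");
        System.out.println(text);
        System.out.println(decodeWeight(encodeWeight(0.62f)));
    }
}
```

The round trip through `encodeWeight`/`decodeWeight` is lossless, so a scoring function that decodes the payload sees exactly the weight assigned at indexing time.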

The reason I’m asking is that I can’t find any examples of this being done anywhere.
I did, however, find someone stating “With Lucene, it is impossible to increase or decrease the weight
of individual terms in a document”.

With regards
Rune 