lucene-java-user mailing list archives

From Rune Stilling <s...@rdfined.dk>
Subject Re: Adding custom weights to individual terms
Date Thu, 13 Feb 2014 18:49:16 GMT
I’m not sure how I would do that, since Lucene is supposed to use my custom weights when it
calculates document scores for a search query.

Doc 1
Lucene(0.7) is(0) a(0) powerful(0.9) indexing(0.62) and(0) search(0.99) API(0.3)

Doc 2
Lucene(0.5) is(0) used by(0) a(0) lot of(0) smart(0) people(0.1)

Query
Lucene

0.7 and 0.5 are my custom weights and should be used to return Doc 1 with weight 0.7 and Doc
2 with weight 0.5 as the answer to my query.
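
To make the expected behaviour concrete, here is a minimal sketch in plain Java with no
Lucene dependency. It is only a toy model of the scoring described above, not a Lucene
implementation; the class and method names (CustomWeightDemo, Hit, add, search) are made up
for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy model: a document's score is simply the custom weight of the query term in it. */
final class CustomWeightDemo {

    /** A (document, score) pair; a hypothetical helper type. */
    static final class Hit {
        final String doc;
        final double score;

        Hit(String doc, double score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // document name -> (lower-cased term -> custom weight)
    private final Map<String, Map<String, Double>> index = new LinkedHashMap<>();

    /** Records the custom weight of one term in one document. */
    void add(String doc, String term, double weight) {
        index.computeIfAbsent(doc, d -> new LinkedHashMap<>())
             .put(term.toLowerCase(Locale.ROOT), weight);
    }

    /** Returns the documents containing the term, ordered by descending custom weight. */
    List<Hit> search(String term) {
        String key = term.toLowerCase(Locale.ROOT);
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, Double>> e : index.entrySet()) {
            Double weight = e.getValue().get(key);
            if (weight != null) {
                hits.add(new Hit(e.getKey(), weight));
            }
        }
        hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return hits;
    }

    public static void main(String[] args) {
        CustomWeightDemo demo = new CustomWeightDemo();
        demo.add("Doc 1", "Lucene", 0.7);
        demo.add("Doc 1", "powerful", 0.9);
        demo.add("Doc 2", "Lucene", 0.5);
        demo.add("Doc 2", "people", 0.1);
        for (Hit h : demo.search("Lucene")) {
            System.out.println(h.doc + " -> " + h.score);
        }
    }
}
```

The query "Lucene" matches both documents, and Doc 1 is ranked ahead of Doc 2 because its
weight for that term is higher.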

/Rune

On 13/02/2014 at 13:27, Shai Erera <serera@gmail.com> wrote:

> I often prefer to manage such weights outside the index. Usually managing
> them inside the index leads to problems in the future when e.g. the weights
> change. If they are encoded in the index, it means re-indexing. Also, if
> a weight changes, then in some segments it will be different from
> others. I think that if you manage the weights e.g. in a simple FST (which
> is very compact), it will give you the best flexibility and it's very easy
> to use.
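
A rough sketch of keeping the weights outside the index, in plain Java. It uses a sorted
map as a stand-in for the FST suggested above (a deliberate simplification, not Lucene's
FST API), and the composite "docId|term" key is an assumption made for illustration. The
point it shows is that a weight can change without touching any index.

```java
import java.util.Map;
import java.util.TreeMap;

/** Stand-in for a weight table kept outside the index (a sorted map, not a Lucene FST). */
final class ExternalWeights {

    private final Map<String, Float> weights = new TreeMap<>();

    /** Hypothetical composite key; assumes document ids never contain '|'. */
    private static String key(String docId, String term) {
        return docId + "|" + term;
    }

    /** Sets or overwrites a weight. No re-indexing is involved. */
    void set(String docId, String term, float weight) {
        weights.put(key(docId, term), weight);
    }

    /** Looks up a weight at search time; unknown pairs weigh 0. */
    float get(String docId, String term) {
        return weights.getOrDefault(key(docId, term), 0f);
    }

    public static void main(String[] args) {
        ExternalWeights table = new ExternalWeights();
        table.set("doc1", "lucene", 0.7f);
        table.set("doc2", "lucene", 0.5f);
        System.out.println(table.get("doc1", "lucene"));

        // The weight changes later; only the external table is updated.
        table.set("doc1", "lucene", 0.4f);
        System.out.println(table.get("doc1", "lucene"));
    }
}
```

At search time the matching documents would still come from the index, and a lookup like
get(docId, term) would supply the score for each hit.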
> 
> Shai
> 
> 
> On Thu, Feb 13, 2014 at 1:36 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
> 
>> You could stuff your custom weights into a payload, and index that,
>> but this is per term per document per position, while it sounds like
>> you just want one float for each term regardless of the
>> documents/positions where that term occurred?
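
For the payload route, the weight has to travel as bytes. The sketch below shows one way
to do that with the JDK alone: a 4-byte big-endian float, plus a hypothetical "term|weight"
token notation such as "Lucene|0.7". As far as I know Lucene ships helpers for this
(PayloadHelper.encodeFloat/decodeFloat and DelimitedPayloadTokenFilter), but none of them
are used here, and the class and method names are made up.

```java
import java.nio.ByteBuffer;

/** Sketch: turning a per-occurrence weight into payload bytes and back. */
final class PayloadWeightSketch {

    /** Encodes a weight as 4 bytes (big-endian IEEE 754). */
    static byte[] encodeWeight(float weight) {
        return ByteBuffer.allocate(4).putFloat(weight).array();
    }

    /** Decodes a weight written by encodeWeight. */
    static float decodeWeight(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    /** Term part of a hypothetical "term|weight" token. */
    static String termOf(String token) {
        int bar = token.lastIndexOf('|');
        return bar < 0 ? token : token.substring(0, bar);
    }

    /** Weight part of a hypothetical "term|weight" token; 0 when absent. */
    static float weightOf(String token) {
        int bar = token.lastIndexOf('|');
        return bar < 0 ? 0f : Float.parseFloat(token.substring(bar + 1));
    }

    public static void main(String[] args) {
        String token = "Lucene|0.7";
        byte[] payload = encodeWeight(weightOf(token));
        System.out.println(termOf(token) + " carries " + payload.length
                + " payload bytes, decoded weight " + decodeWeight(payload));
    }
}
```

Because a payload is stored at every position, the same 4 bytes are repeated for each
occurrence of the term, which is the per-position cost mentioned above.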
>> 
>> Doing your own custom attribute would be a challenge: not only must
>> you create & set this attribute during indexing, but you then must
>> change the indexing process (custom chain, custom codec) to get the
>> new attribute into the index, and then make a custom query that can
>> pull this attribute at search time.
>> 
>> What are these term weights?  Are you sure you can't compute these
>> weights at search time with a custom similarity using the stats that
>> are already stored (docFreq, totalTermFreq, maxDoc, etc.)?
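
As an illustration of computing a weight at search time from stored statistics, here is a
small plain-Java sketch of the classic inverse document frequency, 1 + ln(maxDoc / (docFreq + 1)).
I believe this matches the idf used by Lucene 4.x's default tf-idf similarity, but treat
that as an assumption; the class name is made up and no Lucene classes are involved. It only
helps when the desired weights can be derived from such statistics, which hand-assigned
weights like 0.7 and 0.5 cannot.

```java
/** Sketch: derive a term weight at search time from index-wide statistics. */
final class StatsWeightSketch {

    /**
     * Classic inverse document frequency: 1 + ln(maxDoc / (docFreq + 1)).
     * Terms that occur in few documents get a larger weight than common ones.
     */
    static double idf(long docFreq, long maxDoc) {
        return 1.0 + Math.log((double) maxDoc / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        long maxDoc = 1000;
        System.out.println("rare term:   " + idf(3, maxDoc));
        System.out.println("common term: " + idf(900, maxDoc));
    }
}
```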
>> 
>> Mike McCandless
>> 
>> http://blog.mikemccandless.com
>> 
>> 
>> On Thu, Feb 13, 2014 at 2:40 AM, Rune Stilling <subs@rdfined.dk> wrote:
>>> Hi list
>>> 
>>> I'm trying to figure out how customizable scoring and weighting are in
>>> the Lucene API. I have read about the APIs but still can't figure out
>>> whether the following is possible.
>>> 
>>> I would like to do normal document text indexing, but I would like to
>>> control the weight added to tokens myself. I would also like to control
>>> the weighting of query tokens and how things are added together.
>>> 
>>> When indexing a word I would like to attach my own weights to the word,
>>> and use these weights when querying for documents. For example:
>>> 
>>> Doc 1
>>> Lucene(0.7) is(0) a(0) powerful(0.9) indexing(0.62) and(0) search(0.99) API(0.3)
>>> 
>>> Doc 2
>>> Lucene(0.5) is(0) used by(0) a(0) lot of(0) smart(0) people(0.1)
>>> 
>>> The floats in parentheses are values I would like to add in the indexing
>>> process, not something coming from Lucene, such as tf-idf.
>>> 
>>> When querying I would like to repeat this: create the weights for
>>> each term "myself" and control how the final doc score is calculated.
>>> 
>>> I have read that it's possible to attach your own custom attributes to
>>> tokens. Is this the way to go? I.e., should I add my custom weights as
>>> attributes to tokens, and then access these attributes when calculating
>>> the document score in the search process (described here
>>> https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/analysis/package-summary.html
>>> under "adding a custom attribute")?
>>> 
>>> The reason why I'm asking is that I can't find any examples of this
>>> being done anywhere. But I found someone stating "With Lucene, it is
>>> impossible to increase or decrease the weight of individual terms in a
>>> document".
>>> 
>>> With regards
>>> Rune
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> 
>> 



