lucene-java-user mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: Adding custom weights to individual terms
Date Thu, 13 Feb 2014 12:27:42 GMT
I often prefer to manage such weights outside the index. Managing them
inside the index usually leads to problems later, e.g. when the weights
change. If they are encoded in the index, changing them means re-indexing.
Also, if a weight changes, some segments will hold a different weight than
others. I think that if you manage the weights e.g. in a simple FST (which
is very compact), it will give you the best flexibility and it's very easy
to use.
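
A minimal sketch of the "weights outside the index" idea, in plain Java with
no Lucene dependency. It is only an illustration under stated assumptions: a
sorted map stands in for the FST (both map a term to a single value), the
class ExternalTermWeights and its methods are hypothetical names, and the
scoring rule (sum the weights of the query terms found in the document) is
invented for the example, not something Lucene does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Term weights kept outside the index (hypothetical helper, not a Lucene class). */
class ExternalTermWeights {
    // Assumption: a sorted map stands in for the FST mentioned above.
    private final Map<String, Float> weights = new TreeMap<>();
    private final float defaultWeight;

    ExternalTermWeights(float defaultWeight) {
        this.defaultWeight = defaultWeight;
    }

    /** Sets or replaces a weight; no indexed data has to be rewritten. */
    void put(String term, float weight) {
        weights.put(term, weight);
    }

    /** Returns the stored weight, or the default for unknown terms. */
    float weightOf(String term) {
        return weights.getOrDefault(term, defaultWeight);
    }

    /** Illustrative rule only: sum the weights of query terms present in the document. */
    float score(List<String> queryTerms, List<String> docTerms) {
        float sum = 0f;
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                sum += weightOf(term);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        ExternalTermWeights w = new ExternalTermWeights(0f);
        w.put("lucene", 0.7f);
        w.put("search", 0.99f);

        List<String> doc = Arrays.asList("lucene", "is", "a", "search", "api");
        List<String> query = Arrays.asList("lucene", "search");
        System.out.println("score before update: " + w.score(query, doc));

        // Changing a weight is a map update; the document data is untouched.
        w.put("lucene", 0.5f);
        System.out.println("score after update:  " + w.score(query, doc));
    }
}
```

Because the weights live in one lookup structure, every document sees the
same value for a term as soon as it changes, which avoids the
mixed-segment situation described above.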

Shai


On Thu, Feb 13, 2014 at 1:36 PM, Michael McCandless <lucene@mikemccandless.com> wrote:

> You could stuff your custom weights into a payload, and index that,
> but this is per term per document per position, while it sounds like
> you just want one float for each term, regardless of the
> documents/positions where that term occurred?
>
> Doing your own custom attribute would be a challenge: not only must
> you create & set this attribute during indexing, but you then must
> change the indexing process (custom chain, custom codec) to get the
> new attribute into the index, and then make a custom query that can
> pull this attribute at search time.
>
> What are these term weights?  Are you sure you can't compute these
> weights at search time with a custom similarity using the stats that
> are already stored (docFreq, totalTermFreq, maxDoc, etc.)?
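
A small sketch of what "compute the weight at search time from stored
stats" can look like, in plain Java with no Lucene dependency. The class
SearchTimeWeight and its method are hypothetical names, and the formula is
just the familiar inverse-document-frequency shape built from docFreq and
maxDoc; it is an illustration, not the Lucene Similarity API.

```java
/** Hypothetical helper: derives a term weight from index statistics only. */
class SearchTimeWeight {

    /**
     * Inverse-document-frequency style weight: 1 + ln(maxDoc / (docFreq + 1)).
     * Rare terms (small docFreq) get a larger weight than common ones.
     */
    static double idf(long docFreq, long maxDoc) {
        return 1.0 + Math.log((double) maxDoc / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        long maxDoc = 1000; // assumed collection size for the example
        System.out.println("rare term:   " + idf(3, maxDoc));
        System.out.println("common term: " + idf(800, maxDoc));
    }
}
```

Nothing extra is stored per term here, so such a weight follows the index
statistics automatically as documents are added or removed.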
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Thu, Feb 13, 2014 at 2:40 AM, Rune Stilling <subs@rdfined.dk> wrote:
> > Hi list
> >
> > I'm trying to figure out how customizable scoring and weighting is in
> > the Lucene API. I read about the APIs but still can't figure out if the
> > following is possible.
> >
> > I would like to do normal document text indexing, but I would like to
> > control the weight added to tokens myself. I would also like to control
> > the weighting of query tokens and how things are added together.
> >
> > When indexing a word I would like to attach my own weights to the word,
> > and use these weights when querying for documents. For example:
> >
> > Doc 1
> > Lucene(0.7) is(0) a(0) powerful(0.9) indexing(0.62) and(0) search(0.99)
> > API(0.3)
> >
> > Doc 2
> > Lucene(0.5) is(0) used by(0) a(0) lot of(0) smart(0) people(0.1)
> >
> > The floats in parentheses are values I would like to add in the indexing
> > process, not something coming from Lucene such as tf-idf.
> >
> > When querying I would like to repeat this: create the weights for
> > each term myself and control how the final doc score is calculated.
> >
> > I have read that it's possible to attach your own custom attributes to
> > tokens. Is this the way to go? I.e., should I add my custom weights as
> > attributes to tokens, and then access these attributes when calculating
> > the document score in the search process (described here
> > https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/analysis/package-summary.html
> > under "adding a custom attribute")?
> >
> > The reason I'm asking is that I can't find any examples of this
> > being done anywhere. But I found someone stating "With Lucene, it is
> > impossible to increase or decrease the weight of individual terms in a
> > document".
> >
> > With regards
> > Rune
>
