lucene-java-user mailing list archives

From Ralf Bierig <>
Subject Re: Indexing Weighted Tags per Document
Date Tue, 28 Oct 2014 09:07:08 GMT
The second solution sounds great and a lot more natural than payloads.

I know how to override the Similarity class, but it would only be
called at search time, when it would use the term frequency that is
already in the index. Looking up the probabilities every time a search
is performed would probably also perform poorly. So I suspect I need
to store the term frequency directly in the index at the time I am
indexing documents. Is that correct?

Do you have a code snippet that would illustrate that part of your 
elegant solution?
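
For reference, approach (2) from the quoted reply below could be sketched
roughly as follows. This is only a minimal illustration under stated
assumptions, not the code used in the setup described there: the class
name WeightedTagField, its methods, and the scale factor are all
hypothetical. It only builds the field text in which each tag is repeated
round(probability * scale) times. That text would then be indexed into the
"tag" field with a whitespace-tokenizing analyzer, so that the repeat count
becomes the stored term frequency at indexing time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical helper (not part of Lucene): converts tag probabilities
 * into a field value where each tag is repeated in proportion to its weight.
 */
final class WeightedTagField {

    /** Repeat count for one tag: the probability scaled and rounded. */
    static int repetitions(double probability, int scale) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0,1]: " + probability);
        }
        return (int) Math.round(probability * scale);
    }

    /**
     * Builds a whitespace-separated string of repeated tags.
     * Assumes each tag is a single token without whitespace.
     * Tags whose scaled weight rounds to 0 are left out entirely.
     */
    static String buildFieldValue(Map<String, Double> tagProbabilities, int scale) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Map.Entry<String, Double> entry : tagProbabilities.entrySet()) {
            int count = repetitions(entry.getValue(), scale);
            for (int i = 0; i < count; i++) {
                joiner.add(entry.getKey());
            }
        }
        return joiner.toString();
    }

    // Assumed wiring into Lucene (not compiled here): pass the result to
    // something like new TextField("tag", value, Field.Store.NO) and index
    // it with an analyzer that splits on whitespace only.
}
```

With a scale of 10, a tag "tree" with probability 0.8 is written eight
times. Note that Lucene's DefaultSimilarity uses the square root of the
term frequency, so the score contribution grows with the square root of
the repeat count rather than linearly; a larger scale gives finer
resolution at the cost of a larger index.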

Thanks in advance,

On 28.10.2014 09:31, Ramkumar R. Aiyengar wrote:
> There are a few approaches possible here, we had a similar use case and
> went for the second one below. I primarily deal with Solr, so I don't know
> of Lucene-only examples, but hopefully you can dig this up..
> (1) You can attach payloads to each occurrence of the tag, and modify the
> scoring to use the payload..
> (2) Use term frequency as a proxy. You could scale the probability by a
> factor and introduce the term as many times as the scaled value
> (essentially making it the term frequency). Scoring will now account for
> this. The advantage is that you can also achieve score normalisation with
> keywords and amongst tags, and you can also filter results by probability.
> (3) There potentially is also a solution using child documents and block
> join, but I may be mistaken, haven't explored this a lot..
>   On 27 Oct 2014 16:10, "Ralf Bierig" <> wrote:
>> I want to index documents together with a list of tags (usually between
>> 10-30) that represent meta information about this document. Normally, I
>> would create an extra field "tag", store every tag, by its name, inside that
>> field, create my 10-30 fields that way, and add them to the document before
>> adding the document to the index and writing the index.
>> However, I have the following extra requirements:
>> a) I need to have a weight in the range of [0,1] being associated with the
>> tag that represents the probability of this tag being true.
>> b) These tags must be associated with the document and not with the terms
>> of the document.
>> c) I must be able to associate many tags to a document instance.
>> d) I must be able to use the weight in the weighting process of the search
>> engine.
>> e) The weight must be for the document instance, as the weight represents
>> the probability for that tag for that particular document. E.g.
>> fieldname: tag
>> fieldvalue: tree
>> fieldweight: 0.8
>> meaning that this particular document is with a probability of 0.8 about
>> trees.
>> What is the best way to do that?
>> Can somebody point me to an example or something quite similar that
>> captures such a problem?
>> Best,
>> Ralf
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
