lucene-general mailing list archives

From Paul Jungwirth ...@illuminatedcomputing.com>
Subject Re: Search Documents by Scored Tags?
Date Tue, 30 Oct 2012 21:19:02 GMT
As an alternative approach, I've been looking into attaching a
float/int Payload to each Term, and then providing a custom Similarity
class so that Terms with a higher Payload score do better in the
query. There are several tutorials online about this, and I think I've
got it worked out. But I have one question: the docs say that if I
override Similarity, I must provide it to IndexWriter, not just the
Searcher. This article does the same:

http://edwarddrapkin.com/2011/04/14/an-introduction-to-lucene-payloads/

But if my custom Similarity just extends DefaultSimilarity and only
overrides scorePayload, I don't see why IndexWriter needs it, because
nothing from that method is written into the index. Or am I mistaken
about that?
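A minimal standalone sketch of the payload idea, with no Lucene dependency. It assumes each tag's score is stored as a 4-byte big-endian float (I believe this matches the layout of Lucene's `PayloadHelper.encodeFloat`/`decodeFloat`, but check your version). `PayloadSketch`, `encodeFloat` and `scorePayload` here are hypothetical stand-ins, not Lucene's API. As far as I know, in Lucene 3.x the Similarity handed to IndexWriter is consulted at index time for norms (`computeNorm`), not for `scorePayload`, so an override that changes only `scorePayload` writes nothing different into the index; using the same Similarity on both sides is still the safer default.

```java
import java.nio.ByteBuffer;

// Hypothetical standalone simulation, not Lucene code.
final class PayloadSketch {

    // Index time: store the tag score as 4 bytes
    // (ByteBuffer uses big-endian order by default).
    static byte[] encodeFloat(float score) {
        return ByteBuffer.allocate(4).putFloat(score).array();
    }

    // Query time: the analogue of a scorePayload override. It only reads
    // bytes that were written earlier, so it adds nothing to the index.
    static float scorePayload(byte[] payload, int offset, int length) {
        if (payload == null || length < 4) {
            return 1.0f; // neutral factor when a term carries no payload
        }
        return ByteBuffer.wrap(payload, offset, length).getFloat();
    }

    public static void main(String[] args) {
        byte[] football = encodeFloat(10f);
        byte[] tv = encodeFloat(4f);
        System.out.println(scorePayload(football, 0, football.length)); // 10.0
        System.out.println(scorePayload(tv, 0, tv.length));             // 4.0
        System.out.println(scorePayload(null, 0, 0));                   // 1.0
    }
}
```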

Thanks,
Paul


On Thu, Oct 25, 2012 at 5:18 PM, Paul Jungwirth
<pj@illuminatedcomputing.com> wrote:
> Okay, thanks! This is my first time using Lucene, and what I want to
> do with it seems just slightly off the beaten path, so I'm glad to get
> some confirmation from an expert.
>
> Yours,
> Paul
>
>
> On Thu, Oct 25, 2012 at 11:51 AM, Upayavira <uv@odoko.co.uk> wrote:
>> This would seem a pretty reasonable way to go. It would just require
>> that you know the boost for each category at indexing time, and would
>> likely require some experimentation to identify the best boosts for each
>> of your categories.
>>
>> Other than that, it seems perfectly reasonable to me.
>>
>> Upayavira
>>
>> On Thu, Oct 25, 2012, at 07:18 PM, Paul Jungwirth wrote:
>>> Thank you for your help! Just to be clear: I wasn't asking for the
>>> syntax; I was wondering whether, in your judgment, this approach is
>>> appropriate. Will it give sensible results? Are there drawbacks in
>>> performance, flexibility, etc.? Is there a better way to do it?
>>>
>>> Thanks,
>>> Paul
>>>
>>>
>>> On Thu, Oct 25, 2012 at 11:13 AM, Upayavira <uv@odoko.co.uk> wrote:
>>> > In Solr syntax:
>>> > <doc><field name="category" boost="20">sports</field>
>>> >   <field name="category" boost="5">entertainment</field>
>>> >   <field name="category" boost="10">football</field></doc>
>>> > <doc><field name="category" boost="8">entertainment</field>
>>> >   <field name="category" boost="4">tv</field></doc>
>>> >
>>> > That way, category:(football tv) would do what you need, and would boost
>>> > football above TV.
>>> >
>>> > That is - use index time boosts on your fields when you add them.
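One caveat worth checking before relying on per-value boosts: as I read the Lucene 3.x/4.x javadoc for `setBoost`, when a document has several fields with the same name, their boosts are multiplied together into a single norm for that field, so the index does not record which tag carried which boost. The sketch below is a hypothetical standalone simulation of that folding (not Lucene code), using the tag scores from the original question; it ignores length normalization and the lossy one-byte norm encoding real Lucene applies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical simulation of how same-named field boosts collapse into one norm.
final class FieldBoostSketch {

    // Every value's boost is folded into one per-document, per-field factor.
    static float combinedBoost(Map<String, Float> valueBoosts) {
        float product = 1.0f;
        for (float boost : valueBoosts.values()) {
            product *= boost;
        }
        return product;
    }

    public static void main(String[] args) {
        Map<String, Float> doc1 = new LinkedHashMap<>();
        doc1.put("sports", 20f);
        doc1.put("entertainment", 5f);
        doc1.put("football", 10f);

        Map<String, Float> doc2 = new LinkedHashMap<>();
        doc2.put("entertainment", 8f);
        doc2.put("tv", 4f);

        // Whichever tag matches, the document gets the same factor.
        System.out.println(combinedBoost(doc1)); // 1000.0
        System.out.println(combinedBoost(doc2)); // 32.0
    }
}
```

In this simplified model doc1 still outranks doc2 for `football tv`, but a query for `entertainment` would also favor doc1 (factor 1000 vs. 32) even though doc2 gave entertainment the higher score (8 vs. 5). Per-term payloads are one way to avoid that.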
>>> >
>>> > Upayavira
>>> >
>>> > On Thu, Oct 25, 2012, at 06:16 PM, Paul Jungwirth wrote:
>>> >> Hello,
>>> >>
>>> >> I have documents with various tags, and each tag has a numeric score,
>>> >> so one document might be tagged "sports:20, entertainment:5,
>>> >> football:10", and another "entertainment:8, tv:4". I'd like to let
>>> >> people search by one or more tags, e.g. "football tv", and have the
>>> >> results sorted with higher-scored tags first. I thought I could do
>>> >> this by adding a separate Field for each tag (all named "tag" or
>>> >> whatever), and then boosting the fields according to their score. Does
>>> >> that seem like a good approach, or is there some cleaner way? I've
>>> >> been reading the Lucene in Action book and looking through the online
>>> >> docs, but I haven't found this usage scenario anywhere.
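For reference, the ranking described in the question can be stated without Lucene: for each document, sum the scores of the tags that match the query, then sort in descending order. The brute-force sketch below (hypothetical names; it assumes tag scores are positive, and a real index would not scan every document) can serve as a baseline for checking whichever Lucene approach is chosen.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical baseline for "search by scored tags", not Lucene code.
final class TagRankSketch {

    // Sum of the scores of the query tags present on the document.
    static int score(Map<String, Integer> tags, List<String> queryTags) {
        int total = 0;
        for (String tag : queryTags) {
            total += tags.getOrDefault(tag, 0);
        }
        return total;
    }

    // Ids of documents matching at least one query tag, best first.
    static List<String> rank(Map<String, Map<String, Integer>> docs, List<String> queryTags) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> e : docs.entrySet()) {
            if (score(e.getValue(), queryTags) > 0) { // assumes positive tag scores
                hits.add(e.getKey());
            }
        }
        hits.sort(Comparator.comparingInt((String id) -> score(docs.get(id), queryTags)).reversed());
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> docs = Map.of(
            "doc1", Map.of("sports", 20, "entertainment", 5, "football", 10),
            "doc2", Map.of("entertainment", 8, "tv", 4));
        System.out.println(rank(docs, List.of("football", "tv"))); // [doc1, doc2]
    }
}
```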
>>> >>
>>> >> Thanks,
>>> >> Paul
>>> >>
>>> >> --
>>> >> _________________________________
>>> >> Pulchritudo splendor veritatis.
>



