lucene-dev mailing list archives

From Tanguy Moal <tanguy.m...@gmail.com>
Subject Re: Diversifying Search Results - Custom Collector
Date Mon, 20 Aug 2012 16:01:53 GMT
Hello,

I don't know if this helps, but if I understood your issue correctly, you
have a lot of documents with the same or very close scores. Moreover, I
think you get your matches in merchant order (more or less) because they
were probably indexed in that same order, and Solr returns documents with
equal scores in insertion order (although there is no contract guaranteeing
this).

You could work around that issue by:
1/ Turning off tf/idf, because you're searching documents with little text
where only the match counts, so term frequencies obviously aren't helping.
2/ Adding a random number to each document at index time and boosting on
that random value at query time. This will shuffle your results; it's
probably the simplest thing to do.

Hope this helps,

Tanguy

2012/8/20 Karthick Duraisamy Soundararaj <d.s.karthick@gmail.com>

> Hello Mikhail,
> Thank you for the reply. In terms of user experience, I want to spread
> products from the same brand farther apart from each other, *at least* in
> the first 50-100 results we display. I am considering two different
> approaches as a solution.
>
>  1. For the first few results, display only the top-scoring product of
>     each manufacturer (for a given field, display the top-scoring result
>     for each unique field value within the first N matches). This N could
>     be either a percentage of the total matches or a configurable absolute
>     value.
>  2. Apply a penalty to the score of results that have duplicate field
>     values. The penalty can be applied in such a way that results with
>     higher scores are not affected, as opposed to the ones with lower
>     scores.
>
> Both of the solutions can be implemented while sorting the documents with
> TopFieldCollector / TopScoreDocCollector.
>
> Does this answer your question?  Please let me know if you have any more
> questions.
>
> Thanks,
> Karthick
>
> On Mon, Aug 20, 2012 at 3:26 AM, Mikhail Khludnev <
> mkhludnev@griddynamics.com> wrote:
>
>> Hello,
>>
>> I've got the problem description below. Can you explain the expected user
>> experience, and/or solution approach before diving into the algorithm
>> design?
>>
>> Thanks
>>
>>
>> On Sat, Aug 18, 2012 at 2:50 AM, Karthick Duraisamy Soundararaj <
>> karthick.soundararaj@gmail.com> wrote:
>>
>>> My problem is that when there are a lot of documents representing
>>> products, products from the same manufacturer tend to appear close to
>>> each other in the results, so the results lack brand diversity. When you
>>> search for sofas, sofas from manufacturer A dominate the first page while
>>> sofas from manufacturer B dominate the second page, etc. The issue is
>>> that a manufacturer tends to describe the different sofas it produces in
>>> the same way, so there is very little difference between the documents
>>> representing two sofas.
>>>
>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> Tech Lead
>> Grid Dynamics
>>
>> <http://www.griddynamics.com>
>>  <mkhludnev@griddynamics.com>
>>
>>
>
>
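
Approach 1 in Karthick's quoted message (one top-scoring product per manufacturer in the first N slots) can be sketched as a re-ranking pass over hits that are already sorted by score. This is an assumption-laden illustration, not a Lucene `Collector`: `Hit` and `diversifyHead` are hypothetical names, and the input stands in for whatever top hits the collector gathered.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Post-processing sketch, not a Lucene Collector: Hit and diversifyHead are
// hypothetical names; the input is assumed to be sorted by descending score.
public class BrandDiversifier {

    record Hit(String id, String brand, float score) {}

    // Fill the first n slots with the best-scoring hit of each distinct brand;
    // every other hit is deferred, keeping its relative score order.
    static List<Hit> diversifyHead(List<Hit> sortedByScore, int n) {
        List<Hit> head = new ArrayList<>();
        List<Hit> tail = new ArrayList<>();
        Set<String> seenBrands = new HashSet<>();
        for (Hit h : sortedByScore) {
            // seenBrands.add(...) is true only for the first hit of a brand;
            // it is not evaluated at all once the head is full.
            if (head.size() < n && seenBrands.add(h.brand())) {
                head.add(h);
            } else {
                tail.add(h);
            }
        }
        head.addAll(tail);
        return head;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("A1", "A", 0.9f), new Hit("A2", "A", 0.8f),
                new Hit("A3", "A", 0.7f), new Hit("B1", "B", 0.6f),
                new Hit("B2", "B", 0.5f), new Hit("C1", "C", 0.4f));
        for (Hit h : diversifyHead(hits, 3)) {
            System.out.print(h.id() + " ");
        }
        System.out.println();
    }
}
```

With n = 3 this prints `A1 B1 C1 A2 A3 B2`: each brand's best product is promoted into the head, and the remaining duplicates follow in their original score order. If fewer than n distinct brands exist, the head is simply shorter.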

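Approach 2 (a score penalty for duplicate field values that leaves the higher-scoring results alone) could take many forms; one possible interpretation, assumed here rather than taken from the thread, is a geometric decay: the k-th repeat of a brand (k = 0 for its best hit) has its score multiplied by `decay^k`, so every brand's top hit keeps its original score. `Hit`, `penalize` and `decay` are hypothetical names, and the sketch re-ranks an already score-sorted list instead of running inside `TopScoreDocCollector`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Post-processing sketch of a duplicate-brand penalty (hypothetical names):
// the k-th repeat of a brand (k = 0 for its best hit) has its score
// multiplied by decay^k, so each brand's top hit keeps its original score.
public class DuplicatePenalty {

    record Hit(String id, String brand, float score) {}

    // Input must be sorted by descending score; decay is assumed to be in (0, 1].
    static List<Hit> penalize(List<Hit> sortedByScore, double decay) {
        Map<String, Integer> seen = new HashMap<>();
        List<Hit> rescored = new ArrayList<>();
        for (Hit h : sortedByScore) {
            // merge() returns the updated count, so k = hits of this brand seen before h
            int k = seen.merge(h.brand(), 1, Integer::sum) - 1;
            float penalized = (float) (h.score() * Math.pow(decay, k));
            rescored.add(new Hit(h.id(), h.brand(), penalized));
        }
        // Stable sort: hits with equal penalized scores keep their previous order
        rescored.sort(Comparator.comparingDouble((Hit h) -> h.score()).reversed());
        return rescored;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("A1", "A", 0.9f), new Hit("A2", "A", 0.85f),
                new Hit("A3", "A", 0.8f), new Hit("B1", "B", 0.6f),
                new Hit("B2", "B", 0.55f));
        for (Hit h : penalize(hits, 0.5)) {
            System.out.printf("%s %.3f%n", h.id(), h.score());
        }
    }
}
```

With `decay = 0.5` the order becomes A1, B1, A2, B2, A3, and A1 and B1 keep their original scores. A `decay` of 1.0 disables the penalty entirely; smaller values push repeated brands down more aggressively.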