lucene-dev mailing list archives

From Karthick Duraisamy Soundararaj <d.s.karth...@gmail.com>
Subject Re: Diversifying Search Results - Custom Collector
Date Mon, 20 Aug 2012 13:44:07 GMT
Hello Mikhail,
Thank you for the reply. In terms of user experience, I want to spread
products from the same brand farther apart from each other, *at least* in
the first 50-100 results we display. I am considering two different
approaches as a solution.

   1. For the first few results, display only the top-scoring product of each
      manufacturer (that is, for a given field, display the top-scoring result
      for each unique field value within the first N matches). N could be
      either a percentage of the total matches or a configurable absolute
      value.
   2. Apply a score penalty to results that have duplicate field values. The
      penalty can be applied in such a way that results with higher scores are
      not affected, whereas those with lower scores are.

Both solutions can be implemented at the point where the documents are
sorted by TopFieldCollector / TopScoreDocCollector.
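Approach 2 can be sketched as a re-scoring pass followed by a re-sort, which corresponds to the sorting step that TopFieldCollector / TopScoreDocCollector would otherwise perform. This is plain Java, not a Lucene Collector. The specific penalty, multiplying the score of a brand's k-th hit (k = 0 for its best hit) by decay^k, is an assumed choice for illustration: it leaves each brand's best hit untouched and pushes lower-scoring duplicates down progressively. It assumes non-negative scores, hits already sorted by descending score, and known brand values; `Hit` and `penalizeDuplicates` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical scored hit: document id, relevance score, diversity field (brand).
record Hit(int docId, float score, String brand) {}

public class DuplicatePenalty {

    /**
     * Multiplies the score of the k-th hit of each brand (k = 0 for the brand's
     * best hit) by decay^k, then re-sorts by the adjusted score, highest first.
     * Assumes sortedHits is ordered by descending score and scores are >= 0.
     */
    static List<Hit> penalizeDuplicates(List<Hit> sortedHits, float decay) {
        Map<String, Integer> seenPerBrand = new HashMap<>();
        List<Hit> adjusted = new ArrayList<>(sortedHits.size());
        for (Hit hit : sortedHits) {
            // merge() returns the updated count, so k is 0 on a brand's first hit.
            int k = seenPerBrand.merge(hit.brand(), 1, Integer::sum) - 1;
            float penalized = (float) (hit.score() * Math.pow(decay, k));
            adjusted.add(new Hit(hit.docId(), penalized, hit.brand()));
        }
        // List.sort is stable, so equal adjusted scores keep their original order.
        adjusted.sort(Comparator.comparingDouble(Hit::score).reversed());
        return adjusted;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit(1, 9.0f, "A"), new Hit(2, 8.5f, "A"), new Hit(3, 8.0f, "B"),
                new Hit(4, 7.5f, "A"), new Hit(5, 7.0f, "C"), new Hit(6, 6.0f, "B"));
        List<Integer> ids = penalizeDuplicates(hits, 0.5f).stream().map(Hit::docId).toList();
        System.out.println(ids); // prints [1, 3, 5, 2, 6, 4]
    }
}
```

A decay close to 1 keeps the ranking close to pure relevance; a smaller decay spreads brands out more aggressively.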

Does this answer your question? Please let me know if you have any more
questions.

Thanks,
Karthick

On Mon, Aug 20, 2012 at 3:26 AM, Mikhail Khludnev <mkhludnev@griddynamics.com> wrote:

> Hello,
>
> I've got the problem description below. Can you explain the expected user
> experience and/or the solution approach before diving into the algorithm
> design?
>
> Thanks
>
>
> On Sat, Aug 18, 2012 at 2:50 AM, Karthick Duraisamy Soundararaj <karthick.soundararaj@gmail.com> wrote:
>
>> My problem is that when there are many documents representing products,
>> products from the same manufacturer tend to appear close together in the
>> results, so the results lack brand diversity. When you search for sofas,
>> sofas from manufacturer A dominate the first page, sofas from manufacturer
>> B dominate the second page, and so on. The issue is that a manufacturer
>> tends to describe the different sofas it produces in the same way, so there
>> is very little difference between the documents representing two of its
>> sofas.
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Tech Lead
> Grid Dynamics
>
> <http://www.griddynamics.com>
>  <mkhludnev@griddynamics.com>
>
>
