lucene-dev mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Diversifying Search Results - Custom Collector
Date Mon, 20 Aug 2012 21:05:57 GMT
If you do the same search twice in a row, the second search takes < 3
ms. Try finding your base result set and then augmenting it with a
second search within the first result set.

You can sort by the value of a function. Sorting is multi-level, so you can
make one of the levels random.
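
As a rough illustration of the multi-level idea (plain Python rather than
Solr or Lucene API calls), the sketch below sorts by a coarse score first and
breaks ties with a seeded pseudo-random second level. The function name, the
"score" key, and the rounding to one decimal place are all assumptions made
for the example, not anything from the thread.

```python
import random


def sort_with_random_level(docs, seed):
    """Sort by rounded score (descending), then by a seeded random key.

    Rounding the score is an assumption: it groups near-identical scores
    into one level so that the random second level can reorder them.
    """
    rng = random.Random(seed)
    # Attach (primary level, random secondary level) to every document.
    keyed = [(-round(d["score"], 1), rng.random(), d) for d in docs]
    # Sort on the two levels only; the documents themselves are never compared.
    keyed.sort(key=lambda t: (t[0], t[1]))
    return [d for _, _, d in keyed]
```

Documents whose scores round to the same value end up in a random order
relative to each other, while clearly better-scoring documents still come
first.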

Does this app have to support paging through the search results? If so, do
you plan to do a second search for the next 5 results? Complex result
shuffling can make this hard. Also, I don't know exactly how random
works, or whether it generates the same random order twice. If it does
not, paging would be impossible.
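
To make the paging concern concrete, here is a small plain-Python sketch of a
shuffle that is repeatable across requests: each document's sort key is
derived from a seed plus the document id, so page 2 continues the same order
that page 1 started. The helper names and the idea of a per-session seed are
assumptions for illustration. As far as I recall, Solr's random sort field
behaves in a similar seed-driven way as long as the index does not change,
but that should be checked against the Solr documentation.

```python
import hashlib


def shuffle_key(seed, doc_id):
    """Derive a repeatable pseudo-random sort key from a seed and a doc id."""
    return hashlib.sha256(f"{seed}:{doc_id}".encode("utf-8")).hexdigest()


def page(doc_ids, seed, start, rows):
    """Return one page of the seeded shuffle (hypothetical helper)."""
    ordered = sorted(doc_ids, key=lambda doc_id: shuffle_key(seed, doc_id))
    return ordered[start:start + rows]
```

With the same seed, page(ids, seed, 0, 5) and page(ids, seed, 5, 5) never
overlap, because both slices come from the same ordering. A new seed gives a
new shuffle.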

On Mon, Aug 20, 2012 at 1:52 PM, Karthick Duraisamy Soundararaj
<d.s.karthick@gmail.com> wrote:
> Hi Mikhail,
>
> You are correct. "[+] show 6 result.." will work, but it wouldn't suit my
> requirements. This is a question of user experience, right?
>
> Imagine the product manager comes to you and says: "I don't want to see
> '[+] show 6 result..'; I want the results to be diverse, but they should be
> shown like any other search results."
>
> I think grouping does this with a two-pass collection. In the first pass it
> figures out all the groups, and in the second pass it collects the results
> into these groups.
>
>
> Thanks,
> Karthick
>
> On Mon, Aug 20, 2012 at 3:24 PM, Mikhail Khludnev
> <mkhludnev@griddynamics.com> wrote:
>>
>> Hello,
>>
>> I don't believe your task can be solved by playing with scoring/collector
>> or shuffling.
>> To me it's clearly a Grouping use case (although I don't really know this
>> feature well).
>>
>> > Grouping cannot solve the problem because I don't want to limit the
>> > number of results shown based on the grouping field.
>>
>> I'm not really getting it. Why can't you set the limit to 11 and just show
>> labels like "[+] show 6 result.." or, if you have 11, "[+] show more than
>> 10 .."?
>>
>> If you have a problem constructing the search result page, I suggest
>> submitting a search request with rows=0&facet.field=BRAND; your algorithm
>> can then choose the number of items needed for every brand and submit
>> rows=X&fq=BRAND:Y. This gives you arbitrarily sized "groups".
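
One way to read the facet-then-filter suggestion above, sketched in plain
Python: take the per-brand counts from the facet response, hand out page
slots to brands in turn, and build one follow-up request per brand. The
function names, the round-robin allocation policy, and the phrase-quoting of
the brand value are assumptions for illustration; only rows, fq and the BRAND
field come from the message. Brand values containing double quotes would
need extra escaping.

```python
from urllib.parse import urlencode


def allocate_rows(facet_counts, page_size):
    """Give out page slots one at a time, cycling over brands.

    facet_counts maps brand -> number of matching documents. No brand
    receives more slots than it has documents.
    """
    quota = {brand: 0 for brand in facet_counts}
    remaining = page_size
    while remaining > 0:
        progressed = False
        for brand, count in facet_counts.items():
            if remaining == 0:
                break
            if quota[brand] < count:
                quota[brand] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            break  # every brand is exhausted before the page is full
    return {brand: n for brand, n in quota.items() if n > 0}


def follow_up_params(quota, field="BRAND"):
    """Build one rows=X&fq=FIELD:"Y" query string per brand."""
    return [urlencode({"rows": n, "fq": f'{field}:"{brand}"'})
            for brand, n in quota.items()]
```

For example, counts of 9 and 1 for two brands with a page size of 5 yield
quotas of 4 and 1, so the dominant brand cannot crowd out the other one
entirely.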
>>
>> Will this work for you?
>>
>>
>> On Mon, Aug 20, 2012 at 8:28 PM, Karthick Duraisamy Soundararaj
>> <d.s.karthick@gmail.com> wrote:
>>>
>>> Tanguy,
>>>
>>> Your idea is perfect for cases where there are many documents and 80-90%
>>> of them have the same value for a particular field. As an example, your
>>> idea is ideal when we have, let's say, 10 documents in total like this:
>>>
>>>  doc1 : <merchantName> Kellog's </merchantName>
>>>  doc2 : <merchantName> Kellog's </merchantName>
>>>  doc3 : <merchantName> Kellog's </merchantName>
>>>  doc4 : <merchantName> Kellog's </merchantName>
>>>  doc5 : <merchantName> Kellog's </merchantName>
>>>  doc6 : <merchantName> Kellog's </merchantName>
>>>  doc7 : <merchantName> Kellog's </merchantName>
>>>  doc8 : <merchantName> Nestle </merchantName>
>>>  doc9 : <merchantName> Kellog's </merchantName>
>>>  doc10 : <merchantName> Kellog's </merchantName>
>>>
>>> But I have
>>>  doc1 : <merchantName> Maggi </merchantName>
>>>  doc2 : <merchantName> Maggi  </merchantName>
>>>  doc3 : <merchantName> M&M's </merchantName>
>>>  doc4 : <merchantName> M&M's </merchantName>
>>>  doc5 : <merchantName> Hershey's </merchantName>
>>>  doc6 : <merchantName> Hershey's </merchantName>
>>>  doc7 : <merchantName> Nestle </merchantName>
>>>  doc8 : <merchantName> Nestle </merchantName>
>>>  doc9 : <merchantName> Kellog's </merchantName>
>>>  doc10 : <merchantName> Kellog's </merchantName>
>>>
>>>
>>> Thanks,
>>> Karthick
>>>
>>> On Mon, Aug 20, 2012 at 12:01 PM, Tanguy Moal <tanguy.moal@gmail.com>
>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I don't know if this will help, but if I understood your issue, you
>>>> have a lot of documents with the same or very close scores. Moreover, I
>>>> think you get your matches in merchant order (more or less) because they
>>>> must have been indexed in that same order, so Solr returns documents with
>>>> the same score in insertion order (although there is no contract
>>>> specifying this).
>>>>
>>>> You could work around that issue by:
>>>> 1/ Turning off tf/idf, because you're searching in documents with little
>>>> text where only the match counts and frequencies obviously aren't helping.
>>>> 2/ Adding a random number to each document at index time and boosting on
>>>> that random value at query time. This will shuffle your results and is
>>>> probably the simplest thing to do.
>>>>
>>>> Hope this helps,
>>>>
>>>> Tanguy
>>>>
>>>> 2012/8/20 Karthick Duraisamy Soundararaj <d.s.karthick@gmail.com>
>>>>>
>>>>> Hello Mikhail,
>>>>>
>>>>> Thank you for the reply. In terms of user experience, I want to spread
>>>>> products from the same brand farther from each other, at least in the
>>>>> first 50-100 results we display. I am thinking about two different
>>>>> approaches as a solution.
>>>>>
>>>>> 1. For the first few results, display one top-scoring product per
>>>>> manufacturer (for a given field, display the top-scoring result for each
>>>>> unique field value within the first N matches). This N could be either
>>>>> a percentage of the total matches or a configurable absolute value.
>>>>> 2. Enforce a penalty on the score of results that have duplicate field
>>>>> values. The penalty can be enforced in such a way that results with
>>>>> higher scores are not affected, unlike the ones with lower scores.
>>>>>
>>>>> Both of the solutions can be implemented while sorting the documents
>>>>> with TopFieldCollector / TopScoreDocCollector.
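
The two approaches above can be sketched in plain Python, outside of any
Lucene collector, to show the intended orderings. Everything here is an
assumption made for illustration: the function names, the "score" key, the
in-memory list of hits, and the geometric decay used as the penalty. A real
implementation inside TopFieldCollector / TopScoreDocCollector would look
different.

```python
def diversify_top(docs, field, n):
    """Approach 1: in the first n slots, show only the best doc per field value."""
    ranked = sorted(docs, key=lambda d: d["score"], reverse=True)
    head, tail, seen = [], [], set()
    for d in ranked:
        if len(head) < n and d[field] not in seen:
            seen.add(d[field])
            head.append(d)
        else:
            tail.append(d)  # deferred docs keep their score order
    return head + tail


def penalize_duplicates(docs, field, decay=0.5):
    """Approach 2: scale each score by decay**k, where k is the number of
    better-scoring docs that share the same field value."""
    ranked = sorted(docs, key=lambda d: d["score"], reverse=True)
    seen = {}
    rescored = []
    for d in ranked:
        k = seen.get(d[field], 0)
        seen[d[field]] = k + 1
        rescored.append((d["score"] * decay ** k, d))
    rescored.sort(key=lambda t: t[0], reverse=True)
    return [d for _, d in rescored]
```

In the second function the best document of each brand has k = 0 and keeps
its original score, which matches the requirement that high-scoring results
stay unaffected.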
>>>>>
>>>>> Does this answer your question?  Please let me know if you have any
>>>>> more questions.
>>>>>
>>>>> Thanks,
>>>>> Karthick
>>>>>
>>>>> On Mon, Aug 20, 2012 at 3:26 AM, Mikhail Khludnev
>>>>> <mkhludnev@griddynamics.com> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I've got the problem description below. Can you explain the expected
>>>>>> user experience and/or solution approach before diving into the
>>>>>> algorithm design?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>> On Sat, Aug 18, 2012 at 2:50 AM, Karthick Duraisamy Soundararaj
>>>>>> <karthick.soundararaj@gmail.com> wrote:
>>>>>>>
>>>>>>> My problem is that when there are a lot of documents representing
>>>>>>> products, products from the same manufacturer tend to appear in close
>>>>>>> proximity in the results, so the results don't provide brand
>>>>>>> diversity. When you search for sofas, you get sofas from
>>>>>>> manufacturer A dominating the first page, sofas from manufacturer B
>>>>>>> dominating the second page, etc. The issue here is that a
>>>>>>> manufacturer tends to describe the different sofas it produces the
>>>>>>> same way, and therefore there is very little difference between the
>>>>>>> documents representing two sofas.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sincerely yours
>>>>>> Mikhail Khludnev
>>>>>> Tech Lead
>>>>>> Grid Dynamics
>>>>>>

-- 
Lance Norskog
goksron@gmail.com
