lucene-solr-user mailing list archives

From Ryan McKinley <ryan...@gmail.com>
Subject Re: Optimizing & Improving results based on user feedback
Date Fri, 30 Jan 2009 16:25:08 GMT
yes, applying a boost would be a good addition.

patches are always welcome ;)


On Jan 30, 2009, at 10:56 AM, Matthew Runo wrote:

> I've thought about patching the QueryElevationComponent to apply
> boosts rather than a specific sort. Then the file might look like...
>
> <query text="AAA">
>   <doc id="A" boost="5" />
>   <doc id="B" boost="4" />
> </query>
>
> And I could write a script that looks at click data once a day to
> fill out this file.
> Thanks for your time!
>
> Matthew Runo
> Software Engineer, Zappos.com
> mruno@zappos.com - 702-943-7833
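A rough sketch of the once-a-day job described above, purely illustrative: it assumes a hypothetical tab-separated click log with lines of the form query<TAB>docId<TAB>clicks, and it writes the proposed boost-style file (the boost attribute is the suggested extension, not something the stock QueryElevationComponent understands today).

import java.io.*;
import java.nio.file.*;
import java.util.*;

/**
 * Illustrative only: aggregate a click log into the proposed boosted
 * elevation file. Query strings and doc ids should be XML-escaped in
 * real use; that is skipped here for brevity.
 */
public class ClickBoostFileWriter {
    public static void main(String[] args) throws IOException {
        // query -> (docId -> total clicks)
        Map<String, Map<String, Long>> clicksPerQuery = new TreeMap<>();
        for (String line : Files.readAllLines(Paths.get("clicks.tsv"))) {
            String[] f = line.split("\t");
            clicksPerQuery.computeIfAbsent(f[0], q -> new TreeMap<>())
                          .merge(f[1], Long.parseLong(f[2]), Long::sum);
        }
        try (PrintWriter out = new PrintWriter("elevate-boosts.xml", "UTF-8")) {
            out.println("<elevate>");
            for (Map.Entry<String, Map<String, Long>> q : clicksPerQuery.entrySet()) {
                out.printf("  <query text=\"%s\">%n", q.getKey());
                for (Map.Entry<String, Long> doc : q.getValue().entrySet()) {
                    // Dampen raw counts so a handful of clicks cannot
                    // produce an enormous boost.
                    double boost = 1.0 + Math.log1p(doc.getValue());
                    out.printf("    <doc id=\"%s\" boost=\"%.2f\" />%n",
                               doc.getKey(), boost);
                }
                out.println("  </query>");
            }
            out.println("</elevate>");
        }
    }
}

The log1p dampening is only one choice; any slowly growing function of the click count would serve the same purpose.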
>
> On Jan 30, 2009, at 6:37 AM, Ryan McKinley wrote:
>
>> It may not be as fine-grained as you want, but also check the  
>> QueryElevationComponent.  This takes a preconfigured list of what  
>> the top results should be for a given query and makes those
>> documents the top results.
>>
>> Presumably, you could use click logs to determine what the top  
>> result should be.
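For reference, the stock component is wired up roughly like the snippets below (adapted from memory of the Solr example configuration, so check it against your release). Note that elevate.xml pins or excludes documents for an exact query string rather than boosting them, which is exactly the limitation discussed in this thread; the ids are placeholders.

<!-- solrconfig.xml -->
<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>

<requestHandler name="/elevate" class="solr.SearchHandler">
  <arr name="last-components">
    <str>elevator</str>
  </arr>
</requestHandler>

<!-- elevate.xml -->
<elevate>
  <query text="rain boots">
    <doc id="SKU123" />
    <doc id="SKU456" exclude="true" />
  </query>
</elevate>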
>>
>>
>> On Jan 29, 2009, at 7:45 PM, Walter Underwood wrote:
>>
>>> "A Decision Theoretic Framework for Ranking using Implicit Feedback"
>>> uses clicks, but the best part of that paper is all the side  
>>> comments
>>> about difficulties in evaluation. For example, if someone clicks on
>>> three results, is that three times as good or two failures and a
>>> success? We have to know the information need to decide. That paper
>>> is in the LR4IR 2008 proceedings.
>>>
>>> Both Radlinski and Joachims seem to be focusing on click data.
>>>
>>> I'm thinking of something much simpler, like taking the first
>>> N hits and reordering those before returning. Brute force, but
>>> would get most of the benefit. Usually, you only have reliable
>>> click data for a small number of documents on each query, so
>>> it is a waste of time to rerank the whole list. Besides, if you
>>> need to move something up 100 places on the list, you should
>>> probably be tuning your regular scoring rather than patching
>>> it with click data.
>>>
>>> wunder
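A bare-bones illustration of that brute-force reordering, assuming the application already has the top N hits and a per-query map of click-derived boosts; the class and field names are made up for the example, not an existing Solr API.

import java.util.*;

/** Reorder only the first N hits by a query-specific, click-derived boost. */
public class TopNReranker {

    public static class Hit {
        public final String docId;
        public final float score;
        public Hit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** clickBoosts maps docId to a boost for the current query; 1.0 means "no evidence". */
    public static List<Hit> rerank(List<Hit> topN, Map<String, Float> clickBoosts) {
        List<Hit> reranked = new ArrayList<>(topN.size());
        for (Hit h : topN) {
            float boost = clickBoosts.getOrDefault(h.docId, 1.0f);
            reranked.add(new Hit(h.docId, h.score * boost));
        }
        // Highest adjusted score first.
        reranked.sort((a, b) -> Float.compare(b.score, a.score));
        return reranked;
    }
}

Reordering only the first N keeps the cost bounded, and as noted above, reliable click data rarely exists for documents beyond that window anyway.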
>>>
>>> On 1/29/09 3:43 PM, "Matthew Runo" <mruno@zappos.com> wrote:
>>>
>>>> Agreed, it seems that a lot of the algorithms in these papers would
>>>> almost be a whole new RequestHandler ala Dismax. Luckily a lot of  
>>>> them
>>>> seem to be built on Lucene (at least the ones that I looked at that
>>>> had code samples).
>>>>
>>>> Which papers did you see that actually talked about using clicks? I
>>>> don't see those, beyond "Addressing Malicious Noise in Clickthrough
>>>> Data" by Filip Radlinski and also his "Query Chains: Learning to  
>>>> Rank
>>>> from Implicit Feedback" - but neither is really on topic.
>>>>
>>>> Thanks for your time!
>>>>
>>>> Matthew Runo
>>>> Software Engineer, Zappos.com
>>>> mruno@zappos.com - 702-943-7833
>>>>
>>>> On Jan 29, 2009, at 11:36 AM, Walter Underwood wrote:
>>>>
>>>>> Thanks, I didn't know there was so much research in this area.
>>>>> Most of the papers at those workshops are about tuning the
>>>>> entire ranking algorithm with machine learning techniques.
>>>>>
>>>>> I am interested in adding one more feature, click data, to an
>>>>> existing ranking algorithm. In my case, I have enough data to
>>>>> use query-specific boosts instead of global document boosts.
>>>>> We get about 2M search clicks per day from logged in users
>>>>> (little or no click spam).
>>>>>
>>>>> I'm checking out some papers from Thorsten Joachims and from
>>>>> Microsoft Research that are specifically about clickthrough
>>>>> feedback.
>>>>>
>>>>> wunder
>>>>>
>>>>> On 1/27/09 11:15 PM, "Neal Richter" <nrichter@gmail.com> wrote:
>>>>>
>>>>>> OK, I've implemented this before and written academic papers and
>>>>>> patents related to this task.
>>>>>>
>>>>>> Here are some hints:
>>>>>> - you're on the right track with the editorial boosting elevators
>>>>>> - http://wiki.apache.org/solr/UserTagDesign
>>>>>> - be darn careful about assuming that one click is enough evidence
>>>>>>   to boost a long 'distance'
>>>>>> - first-page effects in search will skew the learning badly if you
>>>>>>   don't compensate. 95% of users never go past the first page of
>>>>>>   results, and only 1% go past the second page, so perfectly good
>>>>>>   results on the second page get permanently locked out
>>>>>> - consider forgetting what you learn under some condition
>>>>>>
>>>>>> In fact this whole area is called 'learning to rank' and is a hot
>>>>>> research topic in IR.
>>>>>> http://web.mit.edu/shivani/www/Ranking-NIPS-05/
>>>>>> http://research.microsoft.com/en-us/um/people/lr4ir-2007/
>>>>>> https://research.microsoft.com/en-us/um/people/lr4ir-2008/
>>>>>>
>>>>>> - Neal Richter
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 27, 2009 at 2:06 PM, Matthew Runo <mruno@zappos.com>
>>>>>> wrote:
>>>>>>> Hello folks!
>>>>>>>
>>>>>>> We've been thinking about ways to improve organic search results
>>>>>>> for a while (really, who hasn't?) and I'd like to get some ideas
>>>>>>> on ways to implement a feedback system that uses user behavior as
>>>>>>> input. Basically, it'd work on the premise that what the user
>>>>>>> actually clicked on is probably a really good match for their
>>>>>>> search, and should be boosted up in the results for that search.
>>>>>>>
>>>>>>> For example, if I search for "rain boots", and really love the
>>>>>>> 10th result down (and show it by clicking on it), then we'd like
>>>>>>> to capture this and use the data to boost up that result //for
>>>>>>> that search//. We've thought about using index-time boosts for the
>>>>>>> documents, but that'd boost it regardless of the search terms,
>>>>>>> which isn't what we want. We've thought about using the Elevator
>>>>>>> handler, but we don't really want to force a product to the top -
>>>>>>> we'd prefer it slowly rises over time as more and more people
>>>>>>> click it from the same search terms. Another way might be to stuff
>>>>>>> the keyword into the document, so that the more times it appears
>>>>>>> the higher it'd score - but there's gotta be a better way than
>>>>>>> that.
>>>>>>>
>>>>>>> Obviously this can't be done 100% in Solr - but if anyone had
>>>>>>> some clever ideas about how this might be possible, it'd be
>>>>>>> interesting to hear them.
>>>>>>>
>>>>>>> Thanks for your time!
>>>>>>>
>>>>>>> Matthew Runo
>>>>>>> Software Engineer, Zappos.com
>>>>>>> mruno@zappos.com - 702-943-7833
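One middle ground that is not proposed anywhere in this thread, offered only as a rough idea: keep the per-query click weights outside the index and translate them into a boost query at request time, for example via the DisMax bq parameter, so a product rises gradually as its accumulated boost grows rather than being pinned to the top. The field names and id below are placeholders.

http://localhost:8983/solr/select?q=rain+boots&defType=dismax&qf=name+description&bq=id:SKU12345^1.4

Because the boost query adds to the normal relevance score instead of replacing the sort order, a lightly-clicked product moves up a little and a heavily-clicked one moves up more, which matches the "slowly rises over time" behavior described above.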
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>

