lucene-solr-user mailing list archives

From Ryan McKinley <ryan...@gmail.com>
Subject Re: Optimizing & Improving results based on user feedback
Date Fri, 30 Jan 2009 14:37:21 GMT
It may not be as fine-grained as you want, but also check the
QueryElevationComponent. This takes a preconfigured list of what the
top results should be for a given query and makes those documents the
top results.

Presumably, you could use click logs to determine what the top result  
should be.
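
For reference, the elevation file is just a per-query list of document
ids, something like this (the ids here are made up):

    <!-- elevate.xml -->
    <elevate>
      <query text="rain boots">
        <doc id="SKU-1234" />
        <doc id="SKU-5678" />
      </query>
    </elevate>

You could regenerate that file from the click logs on whatever schedule
makes sense, though note that the component pins the listed docs to the
top rather than nudging them up.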


On Jan 29, 2009, at 7:45 PM, Walter Underwood wrote:

> "A Decision Theoretic Framework for Ranking using Implicit Feedback"
> uses clicks, but the best part of that paper is all the side comments
> about difficulties in evaluation. For example, if someone clicks on
> three results, is that three times as good or two failures and a
> success? We have to know the information need to decide. That paper
> is in the LR4IR 2008 proceedings.
>
> Both Radlinski and Joachims seem to be focusing on click data.
>
> I'm thinking of something much simpler, like taking the first
> N hits and reordering those before returning. Brute force, but
> would get most of the benefit. Usually, you only have reliable
> click data for a small number of documents on each query, so
> it is a waste of time to rerank the whole list. Besides, if you
> need to move something up 100 places on the list, you should
> probably be tuning your regular scoring rather than patching
> it with click data.
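>
> A minimal sketch of that reranking step, assuming we keep a
> click-derived score per (query, docId) pair from the logs (all names
> here are illustrative, not from any existing API):
>
>     import java.util.ArrayList;
>     import java.util.Comparator;
>     import java.util.List;
>     import java.util.Map;
>
>     /** Reorders only the first n hits; the tail is left untouched. */
>     class ClickReranker {
>         private final Map<String, Float> clickScore; // "query|docId" -> score
>
>         ClickReranker(Map<String, Float> clickScore) {
>             this.clickScore = clickScore;
>         }
>
>         List<Hit> rerank(String query, List<Hit> hits, int n) {
>             int cut = Math.min(n, hits.size());
>             List<Hit> head = new ArrayList<>(hits.subList(0, cut));
>             // Combine the engine's score with the click score and re-sort.
>             head.sort(Comparator.comparingDouble(
>                 (Hit h) -> h.score
>                     + clickScore.getOrDefault(query + "|" + h.docId, 0f)
>             ).reversed());
>             head.addAll(hits.subList(cut, hits.size()));
>             return head;
>         }
>     }
>
>     class Hit {
>         final String docId;
>         final float score;
>         Hit(String docId, float score) { this.docId = docId; this.score = score; }
>     }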
>
> wunder
>
> On 1/29/09 3:43 PM, "Matthew Runo" <mruno@zappos.com> wrote:
>
>> Agreed, it seems that a lot of the algorithms in these papers would
>> almost be a whole new RequestHandler a la Dismax. Luckily, a lot of
>> them seem to be built on Lucene (at least the ones that I looked at
>> that had code samples).
>>
>> Which papers did you see that actually talked about using clicks? I
>> don't see those, beyond "Addressing Malicious Noise in Clickthrough
>> Data" by Filip Radlinski and also his "Query Chains: Learning to Rank
>> from Implicit Feedback" - but neither is really on topic.
>>
>> Thanks for your time!
>>
>> Matthew Runo
>> Software Engineer, Zappos.com
>> mruno@zappos.com - 702-943-7833
>>
>> On Jan 29, 2009, at 11:36 AM, Walter Underwood wrote:
>>
>>> Thanks, I didn't know there was so much research in this area.
>>> Most of the papers at those workshops are about tuning the
>>> entire ranking algorithm with machine learning techniques.
>>>
>>> I am interested in adding one more feature, click data, to an
>>> existing ranking algorithm. In my case, I have enough data to
>>> use query-specific boosts instead of global document boosts.
>>> We get about 2M search clicks per day from logged in users
>>> (little or no click spam).
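>>>
>>> One way to feed those query-specific boosts into stock Solr is to
>>> look up the clicked docs for the incoming query and pass them along
>>> as dismax bq clauses; a rough sketch, assuming a per-query map of
>>> docId -> click count built offline from the logs (the "id" field
>>> and the log damping are my own choices, not anything standard):
>>>
>>>     import java.util.Map;
>>>
>>>     class ClickBoosts {
>>>         /** Builds e.g. "id:SKU123^2.3 id:SKU456^1.4" for a bq param. */
>>>         static String boostQuery(Map<String, Integer> clicksForQuery) {
>>>             StringBuilder bq = new StringBuilder();
>>>             for (Map.Entry<String, Integer> e : clicksForQuery.entrySet()) {
>>>                 if (bq.length() > 0) bq.append(' ');
>>>                 // log damping so popular docs don't swamp text relevance
>>>                 float boost = (float) Math.log1p(e.getValue());
>>>                 bq.append("id:").append(e.getKey()).append('^').append(boost);
>>>             }
>>>             return bq.toString();
>>>         }
>>>     }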
>>>
>>> I'm checking out some papers from Thorsten Joachims and from
>>> Microsoft Research that are specifically about clickthrough
>>> feedback.
>>>
>>> wunder
>>>
>>> On 1/27/09 11:15 PM, "Neal Richter" <nrichter@gmail.com> wrote:
>>>
>>>> OK, I've implemented this before and have written academic papers
>>>> and patents related to this task.
>>>>
>>>> Here are some hints:
>>>>  - you're on the right track with the editorial boosting elevators
>>>>  - http://wiki.apache.org/solr/UserTagDesign
>>>>  - be darn careful about assuming that one click is enough evidence
>>>>    to boost a long 'distance'
>>>>  - first-page effects in search will skew the learning badly if you
>>>>    don't compensate: 95% of users never go past the first page of
>>>>    results, and only 1% go past the second page, so perfectly good
>>>>    results on the second page get permanently locked out
>>>>  - consider forgetting what you learn under some conditions (see the
>>>>    sketch after this list)
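>>>>
>>>> For instance, the last two points can be combined by weighting each
>>>> click by how unlikely its position was to be examined at all, and
>>>> by decaying old evidence; a rough sketch (the position prior and
>>>> the 30-day half-life are invented numbers, not measurements):
>>>>
>>>>     /** Weight of one click as learning evidence. */
>>>>     class ClickWeight {
>>>>         static double weight(int position, double ageDays) {
>>>>             // Position bias: a click far down the list is stronger
>>>>             // evidence, since few users ever examine those results.
>>>>             double positionWeight = Math.log(2 + position); // 1-based
>>>>             // Forgetting: halve the evidence every 30 days.
>>>>             double decay = Math.pow(0.5, ageDays / 30.0);
>>>>             return positionWeight * decay;
>>>>         }
>>>>     }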
>>>>
>>>> In fact this whole area is called 'learning to rank' and is a hot
>>>> research topic in IR.
>>>> http://web.mit.edu/shivani/www/Ranking-NIPS-05/
>>>> http://research.microsoft.com/en-us/um/people/lr4ir-2007/
>>>> https://research.microsoft.com/en-us/um/people/lr4ir-2008/
>>>>
>>>> - Neal Richter
>>>>
>>>>
>>>> On Tue, Jan 27, 2009 at 2:06 PM, Matthew Runo <mruno@zappos.com>
>>>> wrote:
>>>>> Hello folks!
>>>>>
>>>>> We've been thinking about ways to improve organic search results
>>>>> for a while (really, who hasn't?) and I'd like to get some ideas on
>>>>> ways to implement a feedback system that uses user behavior as
>>>>> input. Basically, it'd work on the premise that what the user
>>>>> actually clicked on is probably a really good match for their
>>>>> search, and should be boosted up in the results for that search.
>>>>>
>>>>> For example, if I search for "rain boots" and really love the 10th
>>>>> result down (and show it by clicking on it), then we'd like to
>>>>> capture this and use the data to boost up that result //for that
>>>>> search//. We've thought about using index-time boosts for the
>>>>> documents, but that'd boost them regardless of the search terms,
>>>>> which isn't what we want. We've thought about using the Elevator
>>>>> handler, but we don't really want to force a product to the top -
>>>>> we'd prefer it slowly rises over time as more and more people click
>>>>> it from the same search terms. Another way might be to stuff the
>>>>> keyword into the document; the more times it's in the document, the
>>>>> higher it'd score - but there's gotta be a better way than that.
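>>>>>
>>>>> For what it's worth, the keyword-stuffing variant is usually done
>>>>> with a separate multivalued field fed from the click logs rather
>>>>> than by editing the real content, so the effect stays tunable;
>>>>> roughly (the field name and weights are invented for illustration):
>>>>>
>>>>>     <!-- schema.xml: one value per query that led to a click on
>>>>>          this doc; repeating a query strengthens it via tf -->
>>>>>     <field name="clicked_queries" type="text" indexed="true"
>>>>>            stored="false" multiValued="true"/>
>>>>>
>>>>> and then give it a modest weight in the dismax qf, e.g.
>>>>> qf=name^2.0 description^1.0 clicked_queries^0.5, so clicks nudge
>>>>> results up instead of forcing them.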
>>>>>
>>>>> Obviously this can't be done 100% in solr - but if anyone had some
>>>>> clever ideas about how this might be possible, it'd be interesting
>>>>> to hear them.
>>>>>
>>>>> Thanks for your time!
>>>>>
>>>>> Matthew Runo
>>>>> Software Engineer, Zappos.com
>>>>> mruno@zappos.com - 702-943-7833
>>>>>
>>>>>
>>>
>>
>

