lucene-solr-user mailing list archives

From Matthew Runo <>
Subject Re: Optimizing & Improving results based on user feedback
Date Fri, 30 Jan 2009 15:56:54 GMT
I've thought about patching the QueryElevationComponent to apply  
boosts rather than a specific sort. Then the file might look like..

<query text="AAA"> <doc id="A" boost="5" /> <doc id="B" boost="4" /> </query>

And I could write a script that looks at click data once a day to fill  
out this file.
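A first cut at that daily script might look like the following. This is only a sketch: it assumes a tab-separated click log of query/doc-id pairs, and the boost attribute is the hypothetical one from the patched component described above, not something the stock QueryElevationComponent reads.

```python
# Minimal sketch: aggregate a day's click log into an elevate-style file
# with per-document boosts. Assumes a tab-separated log of "query<TAB>doc_id"
# lines; the boost="" attribute is hypothetical (it would need the
# QueryElevationComponent patch described above).
from collections import Counter, defaultdict
from xml.sax.saxutils import quoteattr

def build_elevate_xml(click_lines, max_docs_per_query=5):
    clicks = defaultdict(Counter)
    for line in click_lines:
        query, doc_id = line.rstrip("\n").split("\t")
        clicks[query][doc_id] += 1
    parts = ["<elevate>"]
    for query, counter in sorted(clicks.items()):
        parts.append("  <query text=%s>" % quoteattr(query))
        # Most-clicked documents first, each boosted by its click count.
        for doc_id, count in counter.most_common(max_docs_per_query):
            parts.append("    <doc id=%s boost=%s />"
                         % (quoteattr(doc_id), quoteattr(str(count))))
        parts.append("  </query>")
    parts.append("</elevate>")
    return "\n".join(parts)
```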
Thanks for your time!

Matthew Runo
Software Engineer, - 702-943-7833

On Jan 30, 2009, at 6:37 AM, Ryan McKinley wrote:

> It may not be as fine-grained as you want, but also check the  
> QueryElevationComponent.  This takes a preconfigured list of what  
> the top results should be for a given query and makes those  
> documents the top results.
> Presumably, you could use click logs to determine what the top  
> result should be.
> On Jan 29, 2009, at 7:45 PM, Walter Underwood wrote:
>> "A Decision Theoretic Framework for Ranking using Implicit Feedback"
>> uses clicks, but the best part of that paper is all the side comments
>> about difficulties in evaluation. For example, if someone clicks on
>> three results, is that three times as good or two failures and a
>> success? We have to know the information need to decide. That paper
>> is in the LR4IR 2008 proceedings.
>> Both Radlinski and Joachims seem to be focusing on click data.
>> I'm thinking of something much simpler, like taking the first
>> N hits and reordering those before returning. Brute force, but
>> would get most of the benefit. Usually, you only have reliable
>> click data for a small number of documents on each query, so
>> it is a waste of time to rerank the whole list. Besides, if you
>> need to move something up 100 places on the list, you should
>> probably be tuning your regular scoring rather than patching
>> it with click data.
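That brute-force reorder of the first N hits could be sketched roughly like this; the hit list and click-count mapping are invented for the example, and a real version would probably blend clicks with the original scores rather than sort on clicks alone:

```python
# Sketch of reranking only the first N hits by click count, leaving the
# tail of the result list untouched. "hits" is an ordered list of doc ids
# as returned by the original scoring; "click_counts" maps doc id to
# clicks seen for this query. Both are assumptions for illustration.
def rerank_top_n(hits, click_counts, n=10):
    head, tail = hits[:n], hits[n:]
    # Stable sort: among docs with equal clicks, original order is kept,
    # so documents with no click data stay where scoring put them.
    head.sort(key=lambda doc_id: click_counts.get(doc_id, 0), reverse=True)
    return head + tail
```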
>> wunder
>> On 1/29/09 3:43 PM, "Matthew Runo" <> wrote:
>>> Agreed, it seems that a lot of the algorithms in these papers would
>>> almost be a whole new RequestHandler ala Dismax. Luckily a lot of  
>>> them
>>> seem to be built on Lucene (at least the ones that I looked at that
>>> had code samples).
>>> Which papers did you see that actually talked about using clicks? I
>>> don't see those, beyond "Addressing Malicious Noise in Clickthrough
>>> Data" by Filip Radlinski and also his "Query Chains: Learning to  
>>> Rank
>>> from Implicit Feedback" - but neither is really on topic.
>>> Thanks for your time!
>>> Matthew Runo
>>> Software Engineer,
>>> - 702-943-7833
>>> On Jan 29, 2009, at 11:36 AM, Walter Underwood wrote:
>>>> Thanks, I didn't know there was so much research in this area.
>>>> Most of the papers at those workshops are about tuning the
>>>> entire ranking algorithm with machine learning techniques.
>>>> I am interested in adding one more feature, click data, to an
>>>> existing ranking algorithm. In my case, I have enough data to
>>>> use query-specific boosts instead of global document boosts.
>>>> We get about 2M search clicks per day from logged in users
>>>> (little or no click spam).
>>>> I'm checking out some papers from Thorsten Joachims and from
>>>> Microsoft Research that are specifically about clickthrough
>>>> feedback.
>>>> wunder
>>>> On 1/27/09 11:15 PM, "Neal Richter" <> wrote:
>>>>> OK I've implemented this before, written academic papers and  
>>>>> patents
>>>>> related to this task.
>>>>> Here are some hints:
>>>>> - you're on the right track with the editorial boosting elevators
>>>>> - be darn careful about assuming that one click is enough evidence
>>>>>   to boost a long 'distance'
>>>>> - first page effects in search will skew the learning badly if you
>>>>>   don't compensate. 95% of users never go past the first page of
>>>>>   results, 1% go past the second page. So perfectly good results
>>>>>   on the second page get permanently locked out
>>>>> - consider forgetting what you learn under some condition
>>>>> In fact this whole area is called 'learning to rank' and is a hot
>>>>> research topic in IR.
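The two compensations in that list, position bias and forgetting, could be sketched roughly as follows; the examine-probability priors and the half-life are illustrative assumptions, not measured values:

```python
# Sketch of the two compensations above: divide clicks by an assumed
# position-bias prior (how often users even look at that rank), and
# decay old evidence so the system can "forget". All numbers here are
# illustrative assumptions, not measured values.

# Assumed probability that a user examines the result at each rank (1-based).
EXAMINE_PRIOR = {1: 1.0, 2: 0.8, 3: 0.6, 4: 0.4, 5: 0.3}

def adjusted_clicks(raw_clicks, rank, age_days, half_life_days=30.0):
    prior = EXAMINE_PRIOR.get(rank, 0.1)   # deep ranks are rarely seen at all
    decay = 0.5 ** (age_days / half_life_days)
    # A click at a rarely-seen rank is stronger evidence; old clicks fade.
    return (raw_clicks / prior) * decay
```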
>>>>> - Neal Richter
>>>>> On Tue, Jan 27, 2009 at 2:06 PM, Matthew Runo <>
>>>>> wrote:
>>>>>> Hello folks!
>>>>>> We've been thinking about ways to improve organic search results
>>>>>> for a while
>>>>>> (really, who hasn't?) and I'd like to get some ideas on ways to
>>>>>> implement a
>>>>>> feedback system that uses user behavior as input. Basically, it'd
>>>>>> work on
>>>>>> the premise that what the user actually clicked on is probably a
>>>>>> really good
>>>>>> match for their search, and should be boosted up in the results
>>>>>> for that
>>>>>> search.
>>>>>> For example, if I search for "rain boots", and really love the
>>>>>> 10th result
>>>>>> down (and show it by clicking on it), then we'd like to capture
>>>>>> this and use
>>>>>> the data to boost up that result //for that search//. We've
>>>>>> thought about
>>>>>> using index time boosts for the documents, but that'd boost it
>>>>>> regardless of
>>>>>> the search terms, which isn't what we want. We've thought about
>>>>>> using the
>>>>>> Elevator handler, but we don't really want to force a product to
>>>>>> the top -
>>>>>> we'd prefer it slowly rises over time as more and more people
>>>>>> click it from
>>>>>> the same search terms. Another way might be to stuff the keyword
>>>>>> into the
>>>>>> document, the more times it's in the document the higher it'd
>>>>>> score - but
>>>>>> there's gotta be a better way than that.
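One way to get that slow, query-specific rise without forcing results to the top is to translate the click counts for a given search into a boost query (e.g. dismax's bq parameter) at query time. This is only a sketch: the unique "id" field name and the click data are assumptions, and the log scaling is just one choice that keeps a single click from shoving a result to the top.

```python
# Sketch: turn per-query click counts into a dismax-style bq (boost query)
# string, so a document rises for *that* search only. The "id" field name
# and the click counts are assumptions for the example; log scaling keeps
# one click from dominating while letting popular docs rise over time.
import math

def click_boost_query(click_counts, max_docs=5):
    top = sorted(click_counts.items(), key=lambda kv: kv[1], reverse=True)
    clauses = []
    for doc_id, clicks in top[:max_docs]:
        boost = round(math.log1p(clicks), 2)
        clauses.append("id:%s^%s" % (doc_id, boost))
    return " ".join(clauses)
```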
>>>>>> Obviously this can't be done 100% in solr - but if anyone had  
>>>>>> some
>>>>>> clever
>>>>>> ideas about how this might be possible it'd be interesting to  
>>>>>> hear
>>>>>> them.
>>>>>> Thanks for your time!
>>>>>> Matthew Runo
>>>>>> Software Engineer,
>>>>>> - 702-943-7833
