Mailing-List: solr-user@lucene.apache.org
Date: Thu, 29 Jan 2009 11:36:04 -0800
Subject: Re: Optimizing & Improving results based on user feedback
From: Walter Underwood <wunderwood@netflix.com>
To: solr-user@lucene.apache.org

Thanks, I didn't know there was so much research in this area. Most of
the papers at those workshops are about tuning the entire ranking
algorithm with machine learning techniques. I am interested in adding
one more feature, click data, to an existing ranking algorithm.

In my case, I have enough data to use query-specific boosts instead of
global document boosts. We get about 2M search clicks per day from
logged-in users (little or no click spam).

I'm checking out some papers from Thorsten Joachims and from Microsoft
Research that are specifically about clickthrough feedback.

wunder

On 1/27/09 11:15 PM, "Neal Richter" wrote:

> OK I've implemented this before, written academic papers and patents
> related to this task.
>
> Here are some hints:
> - you're on the right track with the editorial boosting elevators
>   http://wiki.apache.org/solr/UserTagDesign
> - be darn careful about assuming that one click is enough evidence
>   to boost a long 'distance'
> - first-page effects in search will skew the learning badly if you
>   don't compensate. 95% of users never go past the first page of
>   results, 1% go past the second page.
>   So perfectly good results on the second page get permanently
>   locked out.
> - consider forgetting what you learn under some condition
>
> In fact this whole area is called 'learning to rank' and is a hot
> research topic in IR.
> http://web.mit.edu/shivani/www/Ranking-NIPS-05/
> http://research.microsoft.com/en-us/um/people/lr4ir-2007/
> https://research.microsoft.com/en-us/um/people/lr4ir-2008/
>
> - Neal Richter
>
>
> On Tue, Jan 27, 2009 at 2:06 PM, Matthew Runo wrote:
>> Hello folks!
>>
>> We've been thinking about ways to improve organic search results for
>> a while (really, who hasn't?) and I'd like to get some ideas on ways
>> to implement a feedback system that uses user behavior as input.
>> Basically, it'd work on the premise that what the user actually
>> clicked on is probably a really good match for their search, and
>> should be boosted up in the results for that search.
>>
>> For example, if I search for "rain boots", and really love the 10th
>> result down (and show it by clicking on it), then we'd like to
>> capture this and use the data to boost up that result //for that
>> search//. We've thought about using index-time boosts for the
>> documents, but that'd boost it regardless of the search terms, which
>> isn't what we want. We've thought about using the elevator handler,
>> but we don't really want to force a product to the top - we'd prefer
>> that it slowly rise over time as more and more people click it from
>> the same search terms. Another way might be to stuff the keyword into
>> the document - the more times it's in the document, the higher it'd
>> score - but there's gotta be a better way than that.
>>
>> Obviously this can't be done 100% in Solr - but if anyone has some
>> clever ideas about how this might be possible, it'd be interesting to
>> hear them.
>>
>> Thanks for your time!
>>
>> Matthew Runo
>> Software Engineer, Zappos.com
>> mruno@zappos.com - 702-943-7833
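The aggregation the thread is circling around - query-specific boosts from
click logs, with compensation for first-page bias and a decay so old clicks
are "forgotten" - could be sketched roughly like this. This is an
illustrative Python sketch only: the examination-probability model, the
decay constant, and all function and field names are assumptions, not
anything proposed in the thread.

```python
from collections import defaultdict
from math import log


def examine_prob(rank):
    """Assumed position-bias model: probability a user even looks at
    the result at this 1-based rank. Reflects Neal's point that most
    users never go past the first page."""
    return 1.0 / (1.0 + log(rank))


def click_boosts(click_log, decay=0.9):
    """Turn a click log into per-(query, doc) boost scores.

    click_log: iterable of (query, doc_id, rank, days_ago) tuples.
    A click from deep in the results counts for more (compensating for
    first-page bias, so second-page results are not locked out), and
    old clicks fade (the 'forgetting' Neal suggests).
    """
    scores = defaultdict(float)
    for query, doc_id, rank, days_ago in click_log:
        bias_correction = 1.0 / examine_prob(rank)  # deep clicks count more
        age_weight = decay ** days_ago              # old clicks fade out
        scores[(query, doc_id)] += bias_correction * age_weight
    return dict(scores)


log_data = [
    ("rain boots", "doc10", 10, 0),  # fresh click on the 10th result
    ("rain boots", "doc1", 1, 0),    # fresh click on the top result
    ("rain boots", "doc1", 1, 30),   # month-old click on the top result
]
boosts = click_boosts(log_data)
# One deep click on doc10 outweighs the fresh-plus-stale first-position
# clicks on doc1, so doc10 would slowly rise for this query.
assert boosts[("rain boots", "doc10")] > boosts[("rain boots", "doc1")]
```

The resulting scores could then feed a query-specific boost at search
time (e.g. as a boost query built per query string) rather than an
index-time document boost, which is exactly the distinction Matthew and
Walter both draw.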