From: Grant Ingersoll
To: java-user@lucene.apache.org
Subject: Re: Relevancy Practices
Date: Wed, 5 May 2010 07:10:08 -0700
Message-Id: <5CD2BE19-C279-41C9-A353-F25C5F5138AE@apache.org>

Thanks, Peter.

Can you share what kind of evaluations you did to determine that the end
user believed the results were equally relevant? How formal was that
process?

-Grant

On May 3, 2010, at 11:08 AM, Peter Keegan wrote:

> We discovered very soon after going to production that Lucene's scores were
> often 'too precise'. For example, a page of 25 results may have several
> different score values, and all within 15% of each other, but to the end
> user all 25 results were equally relevant. Thus we wanted the secondary sort
> field to determine the order, instead. This required writing a custom score
> comparator to 'round' the scores. The same thing occurred for distance
> sorting. We also limit the effect of term frequency to help prevent
> spamming. In comparison to Avi, we use 'AND' as the default operator for
> keyword queries and if no docs are found, the query is automatically retried
> with 'OR'. This improves precision a bit and only occurs if the user
> provides no operators.
>
> Lucene's Explanation class has been invaluable in helping me to explain a
> particular sort order in many, many situations.
> Most of our relevance tuning has occurred after deployment to production.
>
> Peter
>
> On Thu, Apr 29, 2010 at 10:14 AM, Grant Ingersoll wrote:
>
>> I'm putting on a talk at Lucene Eurocon (
>> http://lucene-eurocon.org/sessions-track1-day2.html#1) on "Practical
>> Relevance" and I'm curious as to what people put in practice for testing and
>> improving relevance. I have my own inclinations, but I don't want to muddy
>> the water just yet. So, if you have a few moments, I'd love to hear
>> responses to the following questions.
>>
>> What worked?
>> What didn't work?
>> What didn't you understand about it?
>> What tools did you use?
>> What tools did you wish you had either for debugging relevance or "fixing"
>> it?
>> How much time did you spend on it?
>> How did you avoid over/under tuning?
>> What stage of development/testing/production did you decide to do relevance
>> tuning? Was that timing planned or not?
>>
>>
>> Thanks,
>> Grant
>>
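
For readers of the archive, the score 'rounding' Peter describes might look
roughly like the sketch below. It is only an illustration under stated
assumptions, not Peter's actual comparator: it is plain Java with no Lucene
classes, the names (RoundedScoreSort, Hit, bucket, secondaryKey) are
hypothetical, and the fixed bucket width is a guess at what 'rounding' meant.
Scores that fall into the same bucket compare as equal, so the secondary key
decides their order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class RoundedScoreSort {

    /** Hypothetical search hit: a relevance score plus a secondary sort key. */
    public static final class Hit {
        public final String id;
        public final float score;
        public final long secondaryKey; // e.g. a timestamp; larger sorts first here

        public Hit(String id, float score, long secondaryKey) {
            this.id = id;
            this.score = score;
            this.secondaryKey = secondaryKey;
        }
    }

    /** Maps a score to a coarse bucket so that near-equal scores tie. */
    public static int bucket(float score, float bucketWidth) {
        return (int) Math.floor(score / bucketWidth);
    }

    /** Orders by score bucket (descending), then by secondary key (descending). */
    public static Comparator<Hit> comparator(float bucketWidth) {
        return Comparator
                .comparingInt((Hit h) -> bucket(h.score, bucketWidth))
                .reversed()
                .thenComparing(
                        Comparator.comparingLong((Hit h) -> h.secondaryKey).reversed());
    }

    /** Sorts a copy of the hits and returns their ids in result order. */
    public static List<String> sortedIds(List<Hit> hits, float bucketWidth) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(comparator(bucketWidth));
        List<String> ids = new ArrayList<>();
        for (Hit h : copy) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("a", 0.95f, 10L),
                new Hit("b", 0.80f, 30L),
                new Hit("c", 0.90f, 20L),
                new Hit("d", 0.60f, 99L));
        // With a bucket width of 0.25 (an assumed value), a, b and c tie on
        // score, so the secondary key orders them; d stays last.
        System.out.println(sortedIds(hits, 0.25f)); // prints [b, c, a, d]
    }
}
```

One caveat with fixed buckets: two scores that are very close can still land
on opposite sides of a bucket boundary and be ordered by score alone, so the
bucket width would need tuning against real result pages.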