From: "Daniel Freudenberger"
Reply-To: java-user@lucene.apache.org
Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Subject: RE: boosting relevance of certain documents
Date: Sat, 26 Apr 2008 17:28:28 +0200
Message-ID: <000e01c8a7b2$2d3c1a30$87b44e90$@freudenberger@trade-a-game.de>
References: <-1858080938835305288@unknownmsgid>
    <6dd2bfeb0804250959k51d0286cj9b954be42c06c7d2@mail.gmail.com>
    <3760940526082580826@unknownmsgid>
    <6dd2bfeb0804251110ybab00cakbd2661cd2362ea4c@mail.gmail.com>
    <000301c8a70d$a9717e60$fc547b20$@freudenberger@trade-a-game.de>
MIME-Version: 1.0
Content-Type: text/plain;
    charset="us-ascii"

Hello,

thanks for your detailed response. I didn't know there was a method
called setBoost for adjusting the relevance of a certain document. Now I
simply calculate the boosting factor for the document, based on its
newness, the sales rank and some other values.

Thank you very much.

Best regards,
Daniel

-----Original Message-----
From: Grant Ingersoll [mailto:gsingers@apache.org]
Sent: Saturday, April 26, 2008 12:42 AM
To: java-user@lucene.apache.org
Subject: Re: boosting relevance of certain documents

It really depends. Hand tuning scoring algs for a specific query is very
prone to local maxima problems. In other words, you fix one query and
break 50 others. Sometimes, a good old "configurable" hard code is the
way to go. If you want a specific doc to be #1, make it number one. You
will pull your hair out otherwise. In Solr, this is handled via the
Query Elevation Component, but it isn't all that difficult to implement
yourself.

Likewise, if you have a priori knowledge that a particular document is
more important, then give it a relatively large boost during indexing,
being aware that Lucene does not offer much granularity in terms of
boosts. In other words, boost it something like 5 or 10 times, instead
of 1.1 vs. 1.2.

On the other hand, if you are truly seeing broad problems, then you need
to build up a set of queries and judgments (à la TREC) or use the
contrib/benchmark Quality packages. You might also look at Lucene's
Similarity class.
Lucene's length normalization is often less than optimal for certain
types of documents (see the IBM Haifa's assessment for the "Million
Query" track of TREC on the Lucene Wiki).

Cheers,
Grant

On Apr 25, 2008, at 3:50 PM, Daniel Freudenberger wrote:

> Thanks for your response. I already knew that the relevance is based on the
> term frequency but in some cases it's just not what the user expects.
> As I already mentioned, "fifa 2003 fifa 03" vs. "fifa 08" is such a case -
> searching for "fifa" would return the "fifa 2003 fifa 03" document first but
> the "fifa 08" document is more important (from the user's point of view).
>
> Any suggestions?
>
> Best regards,
> Daniel
>
> -----Original Message-----
> From: Jonathan Ariel [mailto:ionathan@gmail.com]
> Sent: Friday, April 25, 2008 8:11 PM
> To: java-user@lucene.apache.org
> Subject: Re: boosting relevance of certain documents
>
> Ok. So I'm not an expert of the scoring algorithm, but based on tf*idf you
> can tell that the returned document is more relevant because it has more
> term frequency.
>
> Using the explain you can see the following:
>
> Doc 1
> 0.643841 = (MATCH) fieldWeight(searchable:fifa in 0), product of:
>   1.0 = tf(termFreq(searchable:fifa)=1)
>   1.287682 = idf(docFreq=2)
>   0.5 = fieldNorm(field=searchable, doc=0)
>
> Doc 2
> 0.68289655 = (MATCH) fieldWeight(searchable:fifa in 1), product of:
>   1.4142135 = tf(termFreq(searchable:fifa)=2)
>   1.287682 = idf(docFreq=2)
>   0.375 = fieldNorm(field=searchable, doc=1)
>
> On Fri, Apr 25, 2008 at 2:30 PM, Daniel Freudenberger <
> d.freudenberger@trade-a-game.de> wrote:
>
>> I'm using the StandardAnalyzer - hope this answers your question (I'm quite
>> new to the lucene thing)
>>
>> -----Original Message-----
>> From: Jonathan Ariel [mailto:ionathan@gmail.com]
>> Sent: Friday, April 25, 2008 6:59 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: boosting relevance of certain documents
>>
>> How are you analyzing the searchable field?
>>
>> On Fri, Apr 25, 2008 at 12:49 PM, Daniel Freudenberger <
>> d.freudenberger@trade-a-game.de> wrote:
>>
>>> Hello,
>>>
>>> I'm using lucene within a new project and I'm not sure about how to solve
>>> the following problem: My index consists of the two attributes "id" and
>>> "searchable". "id" is the id of a product and "searchable" is a combination
>>> of the product name and its category name.
>>>
>>> example:
>>>
>>> id  searchable
>>> 1   fifa 08 - playstation 3
>>> 2   fifa 2003 fifa 03 - playstation 3
>>> 3   playstation 60gb hdd - playstation 3
>>> 4   playstation i like you - playstation 3
>>>
>>> When searching for "fifa", lucene returns the product with id 2 at first,
>>> whereas id 1 ("fifa 08") would be the much more relevant result (from the
>>> user side of view). the same problem arises when searching for "playstation"
>>> - the customer expects products having "playstation" in their names at
>>> first, ideally the console itself. in reality however, he gets all possible
>>> products which are in the "playstation" category as well.
>>>
>>> my idea was to introduce another attribute relevance, which may increase the
>>> relevance of an entry. the actual relevance shouldn't be suppressed
>>> completely though, but should only be taken into account with products that
>>> are similarly relevant for a specific search term.
>>>
>>> Does anybody have an idea on how to solve this problem?
>>>
>>> Thank you in advance,
>>>
>>> Daniel

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
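
The explain output quoted in this thread can be reproduced by hand, which
also shows why a coarse index-time boost changes the ranking. The following
is only a minimal sketch, not Lucene code: it assumes the classic tf-idf
formulas of Lucene's default similarity of that era (tf = sqrt(freq),
idf = 1 + ln(numDocs / (docFreq + 1))) and an index containing just the four
example products. The class and method names are hypothetical, the fieldNorm
values are copied from the explain output, and the boosted case assumes the
boost is simply multiplied into the field norm, ignoring the lossy one-byte
norm encoding.

```java
// Hypothetical helper class; it does not call any Lucene API.
class ExplainSketch {

    // Assumed classic tf: square root of the term frequency.
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // Assumed classic idf: 1 + ln(numDocs / (docFreq + 1)).
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // fieldWeight as shown by explain: tf * idf * fieldNorm.
    static float fieldWeight(int freq, int docFreq, int numDocs, float fieldNorm) {
        return tf(freq) * idf(docFreq, numDocs) * fieldNorm;
    }

    public static void main(String[] args) {
        int numDocs = 4;  // assumption: only the four example products are indexed
        int docFreq = 2;  // "fifa" occurs in two of them

        // fieldNorm values taken from the explain output in the thread.
        float fifa08 = fieldWeight(1, docFreq, numDocs, 0.5f);
        float fifa2003 = fieldWeight(2, docFreq, numDocs, 0.375f);
        System.out.println("fifa 08:           " + fifa08);
        System.out.println("fifa 2003 fifa 03: " + fifa2003);

        // Assumption: a document boost of 5 is multiplied into the norm,
        // with the one-byte norm encoding ignored for simplicity.
        float boosted = fieldWeight(1, docFreq, numDocs, 5.0f * 0.5f);
        System.out.println("fifa 08 boosted:   " + boosted);
        System.out.println("boosted doc ranks first: " + (boosted > fifa2003));
    }
}
```

Under these assumptions the unboosted scores come out at about 0.644 and
0.683, matching the explain output, so the longer document wins only by a
small margin. Because norms are stored with very little precision, a boost
of 1.1 may be lost entirely, while a boost of 5 or 10 survives, which is the
granularity point made in Grant's reply.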