Date: Tue, 22 Sep 2009 00:17:53 +0200
Message-ID: <76c1202b0909211517g77b3e53ck6ffcaa6a21018d2@mail.gmail.com>
Subject: Filtering query results based on relevance/accuracy
From: Alex
To: java-user@lucene.apache.org

Hi,

I'm a total newbie with Lucene and am trying to understand how to achieve my (complicated) goals. What I'm doing is still entirely experimental for me, but it is probably trivial for the experts on this list :)

I use Lucene and Hibernate Search to index locations by their name, type, etc. LocationType is an object whose "name" field is indexed both tokenized and untokenized.

The following LocationType names are indexed:

"Restaurant"
"Mexican Restaurant"
"Chinese Restaurant"
"Greek Restaurant"
etc.

For the query "Mexican Restaurant", I always get all of the entries back, most likely because the keyword "Restaurant" is present in all of them.

I'm trying to get a finer-grained result set. Obviously, for "Mexican Restaurant" I want the "Mexican Restaurant" entry as a result, but NOT "Chinese Restaurant" or "Greek Restaurant", as they are irrelevant. Maybe "Restaurant" itself should be returned with a lower weight/score, or maybe it shouldn't; I'm not sure about this one.

1) How can I do that?

Here is the code I use for querying:

    String[] typeFields = {"name", "tokenized_name"};

    // Boost matches on "name" over matches on "tokenized_name"
    Map<String, Float> boostPerField = new HashMap<String, Float>(2);
    boostPerField.put("name", 4f);
    boostPerField.put("tokenized_name", 2f);

    QueryParser parser = new MultiFieldQueryParser(
            typeFields, new StandardAnalyzer(), boostPerField);

    // Fully qualified to avoid clashing with other Query types
    org.apache.lucene.search.Query luceneQuery;
    try {
        luceneQuery = parser.parse(queryString);
    } catch (ParseException e) {
        throw new RuntimeException("Unable to parse query: " + queryString, e);
    }

I guess there is a way to filter out results whose score falls below a given threshold, or to filter results based on the gap between scores, or something similar, but I have no idea how to do this.

What is the best way to achieve what I want?

Thank you for your help!

Cheers,

Alex
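
P.S. To make the threshold / score-gap idea concrete, here is a rough sketch of the kind of post-filtering I have in mind. It is plain Java and does not use any Lucene or Hibernate Search API: the Hit class, the method names, the ratios and the scores are all made up for illustration, and it assumes the hits arrive sorted by descending score. I don't know whether Lucene scores are meant to be compared this way, so please treat it only as a description of the behaviour I'm after.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: nothing here is a Lucene or Hibernate Search class.
class ScoreFilterSketch {

    // Hypothetical stand-in for one search result (name + relevance score).
    static final class Hit {
        final String name;
        final float score;

        Hit(String name, float score) {
            this.name = name;
            this.score = score;
        }
    }

    // Keeps hits scoring at least minRatio times the best score.
    // Assumes hits are sorted by descending score.
    static List<Hit> filterByRelativeScore(List<Hit> hits, float minRatio) {
        List<Hit> kept = new ArrayList<Hit>();
        if (hits.isEmpty()) {
            return kept;
        }
        float top = hits.get(0).score;
        for (Hit hit : hits) {
            if (hit.score >= top * minRatio) {
                kept.add(hit);
            }
        }
        return kept;
    }

    // Cuts the list at the first hit whose score drops by more than
    // maxDropRatio relative to the previous hit.
    static List<Hit> cutAtScoreGap(List<Hit> hits, float maxDropRatio) {
        List<Hit> kept = new ArrayList<Hit>();
        for (int i = 0; i < hits.size(); i++) {
            if (i > 0) {
                float prev = hits.get(i - 1).score;
                if (prev > 0f && (prev - hits.get(i).score) / prev > maxDropRatio) {
                    break;
                }
            }
            kept.add(hits.get(i));
        }
        return kept;
    }

    public static void main(String[] args) {
        // Invented scores, not real Lucene output.
        List<Hit> hits = new ArrayList<Hit>();
        hits.add(new Hit("Mexican Restaurant", 2.0f));
        hits.add(new Hit("Restaurant", 0.8f));
        hits.add(new Hit("Chinese Restaurant", 0.5f));
        hits.add(new Hit("Greek Restaurant", 0.5f));

        for (Hit hit : filterByRelativeScore(hits, 0.5f)) {
            System.out.println("relative threshold keeps: " + hit.name);
        }
        for (Hit hit : cutAtScoreGap(hits, 0.5f)) {
            System.out.println("gap cut keeps: " + hit.name);
        }
    }
}
```

With these invented numbers, both variants would keep only "Mexican Restaurant" at a ratio of 0.5. A looser setting would let "Restaurant" through as well, which is the part I'm undecided about.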