Date: Wed, 27 Oct 2010 07:35:14 -0700 (PDT)
From: "Balaji.A"
To: general@lucene.apache.org
Message-ID: <1288190114695-1780471.post@n3.nabble.com>
Subject: Scoring Pattern for partial and exact match search results

Hi,

I have 6 fields in a document, with their field types given below.

field name        data type
---------------------------
content           text
title             text
description       text
content_em        text_ws
title_em          text_ws
description_em    text_ws

My requirement is to prioritize search results based on exact and partial match conditions. Documents that have an exact match should score higher than documents with only a partial match. To achieve this I have added 3 fields (content_em, title_em, description_em), which contain the same content as content, title and description respectively.

My dismax query is similar to this:

  mm=1&qf=content^100+description^200+title^300&pf=content_em^500000+description_em^700000+title_em^900000&fl=id&start=0&q=London&qt=dismax

I have 2 problems with this approach.

Problem 1:
For instance, if doc1 has the text "London" appearing once in each of the description, content and title fields, and doc2 has the same text appearing once in only the description and content fields, doc2 gets a higher score than doc1. Can anyone explain why this happens? Since I give the largest boost to the title field, I expect a term matching in that field to contribute more to the score.

Problem 2:
Another scenario is with the search term "Ryder Cup".

Doc 1 has the text "Cup" appearing 20 or more times in the content field.
Doc 2 has the text "Ryder Cup" appearing once in the title field.

On search I expect Doc 2 to be on top, since I want exact-match documents to be prioritized over partial-match documents. But unfortunately Doc 1 comes out on top with a higher score.

Since I am new to Lucene, can anyone help me solve these problems?

Many Thanks,
Balaji.

--
View this message in context: http://lucene.472066.n3.nabble.com/Scoring-Pattern-for-partial-and-exact-match-search-results-tp1780471p1780471.html
Sent from the Lucene - General mailing list archive at Nabble.com.
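
For anyone who wants to reproduce the request, the dismax query quoted above can be assembled programmatically. The following is only a minimal sketch in Python: the helper name `build_dismax_params` is hypothetical, and the optional `debugQuery=true` parameter is an addition that is not part of the original query (Solr uses it to return a per-document score explanation, which may help with Problem 1). The field names and boosts are copied from the query in this message.

```python
from urllib.parse import urlencode


def build_dismax_params(query, debug=False):
    """Return a URL-encoded dismax query string (hypothetical helper)."""
    params = {
        "qt": "dismax",
        "q": query,
        "mm": "1",
        # Per-term field boosts, as given in the message.
        "qf": "content^100 description^200 title^300",
        # Phrase boosts on the whitespace-tokenized *_em copies.
        "pf": "content_em^500000 description_em^700000 title_em^900000",
        "fl": "id",
        "start": "0",
    }
    if debug:
        # Assumption: not in the original query; asks Solr to include
        # score explanations in the response.
        params["debugQuery"] = "true"
    # urlencode turns spaces into "+" and "^" into "%5E".
    return urlencode(params)


if __name__ == "__main__":
    print(build_dismax_params("Ryder Cup", debug=True))
```

The resulting string would be appended to the select URL of whichever Solr instance is in use; that URL is not given in this message, so it is left out here.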