Date: Thu, 24 Aug 2006 16:04:26 -0700 (PDT)
From: Chris Hostetter
To: java-user@lucene.apache.org
Subject: Re: Boosting Documents and score calculation
In-Reply-To: <5968287.post@talk.nabble.com>

First off, when trying to make sense of scores you should always use either HitCollector or one of the
TopDocs methods of the Searcher interface -- otherwise the "normalize if
greater than 1" logic of the Hits class might confuse you.

Second: Searcher.explain(Query,int) is your friend ... it will help you
understand exactly where your scores are coming from.

Third: index time document boosts are folded into the "norm" value for
that field (along with any index time field boosts and the length norm) ...
these norms are "encoded" as a single byte, which can result in a loss of
precision, so it wouldn't be too surprising if boosts of 1.0, 1.1, and 1.2
all encoded as the same value. (You can use
Similarity.decodeNorm(Similarity.encodeNorm(some_float)) to see exactly how
much precision is lost for any given float value.)

: Date: Thu, 24 Aug 2006 10:06:35 -0700 (PDT)
: From: AlexeyG
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: Boosting Documents and score calculation
:
:
: Hello,
:
: I ran into some very strange behavior by Lucene 1.9. Boost factor under 1.3
: does not affect the result score!
: I wrote a simple test to isolate the issue:
:
: Writing test index
: Creating 4 documents with same KEY and boosts of default, 1.1, 1.2, and 1.3
:
: public static void writeTestIndex() throws IOException {
:
:     // opening index writer
:     IndexWriter writer = null;
:     writer = new IndexWriter("C:\\a_temp", new StandardAnalyzer(), true);
:
:     Document currentDocument = null;
:
:     // creating and adding document with DEFAULT boost
:     currentDocument = new Document();
:     currentDocument.add(new Field("KEY", "AA", Field.Store.YES,
:         Field.Index.UN_TOKENIZED));
:     currentDocument.add(new Field("BOOST_FACTOR", "1", Field.Store.YES,
:         Field.Index.UN_TOKENIZED));
:     writer.addDocument(currentDocument);
:
:     // creating and adding document with 1.1 boost
:     currentDocument = new Document();
:     currentDocument.add(new Field("KEY", "AA", Field.Store.YES,
:         Field.Index.UN_TOKENIZED));
:     currentDocument.add(new Field("BOOST_FACTOR", "1.1", Field.Store.YES,
:         Field.Index.UN_TOKENIZED));
:     currentDocument.setBoost((float)1.1);
:     writer.addDocument(currentDocument);
:
:     // creating and adding document with 1.2 boost
:     currentDocument = new Document();
:     currentDocument.add(new Field("KEY", "AA", Field.Store.YES,
:         Field.Index.UN_TOKENIZED));
:     currentDocument.add(new Field("BOOST_FACTOR", "1.2", Field.Store.YES,
:         Field.Index.UN_TOKENIZED));
:     currentDocument.setBoost((float)1.2);
:     writer.addDocument(currentDocument);
:
:     // creating and adding document with 1.3 boost
:     currentDocument = new Document();
:     currentDocument.add(new Field("KEY", "AA", Field.Store.YES,
:         Field.Index.UN_TOKENIZED));
:     currentDocument.add(new Field("BOOST_FACTOR", "1.3", Field.Store.YES,
:         Field.Index.UN_TOKENIZED));
:     currentDocument.setBoost((float)1.3);
:     writer.addDocument(currentDocument);
:
:     // optimizing and closing IndexWriter
:     writer.optimize();
:     writer.close();
: }
:
:
: Test Search
: Searching for the KEY value, which is the same in all 4 documents
:
: public static void testIndex() throws IOException {
:
:     // opening IndexSearcher
:     IndexSearcher searcher = null;
:     searcher = new IndexSearcher("C:\\a_temp");
:
:     // searching for KEY
:     Hits hits = searcher.search(new TermQuery(new Term("KEY", "AA")));
:
:     // listing documents and their BOOST_FACTOR field
:     Document doc = null;
:     if (null != hits) {
:         logger.debug("Listing results: ");
:         for (int i = 0; i < hits.length(); i++) {
:             doc = hits.doc(i);
:             logger.debug("BOOST_FACTOR field: " + doc.get("BOOST_FACTOR")
:                 + " Score: " + hits.score(i));
:         }
:     }
:
:     // closing IndexSearcher
:     searcher.close();
: }
:
: Output
:
: BOOST_FACTOR field: 1.3 Score: 0.9710705
: BOOST_FACTOR field: 1 Score: 0.7768564
: BOOST_FACTOR field: 1.1 Score: 0.7768564
: BOOST_FACTOR field: 1.2 Score: 0.7768564
:
: Boost of 1.1 and 1.2 did not affect score for the last 2 documents!
: Document with boost of 1.3 jumped to the top, but the rest were returned in
: the order they were added to the index.
:
: What am I missing here? I thought document score would reflect all levels
: of boost, not just 1.3 and above? Please help.
: --
: View this message in context: http://www.nabble.com/Boosting-Documents-and-score-calculation-tf2159899.html#a5968287


-Hoss
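
The precision loss described in the third point of the reply can be illustrated without Lucene on the classpath. The sketch below is not Lucene code: the class and method names (NormPrecisionSketch, encode, decode, roundTrip) are hypothetical, and it assumes the one-byte norm keeps 5 exponent bits and 3 mantissa bits of the float and truncates the rest. The reply only says "a single byte", so treat the exact layout as an assumption and check it against Similarity.decodeNorm(Similarity.encodeNorm(f)) in your own Lucene version.

```java
// Sketch only: mimics a lossy one-byte float encoding (assumed layout:
// 5 exponent bits, 3 mantissa bits). Names are hypothetical, not Lucene API.
final class NormPrecisionSketch {

    // Shifts the float exponent range so it fits the 5-bit exponent field.
    private static final int EXPONENT_OFFSET = (63 - 15) << 3;

    /** Encodes a float into one byte, truncating the low-order fraction bits. */
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21; // keeps sign, exponent, and top 2 fraction bits
        if (small <= EXPONENT_OFFSET) {
            // zero or negative maps to 0; tiny positives map to the smallest code
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= EXPONENT_OFFSET + 0x100) {
            return (byte) -1; // overflow: largest representable code
        }
        return (byte) (small - EXPONENT_OFFSET);
    }

    /** Decodes a byte produced by encode() back into a float. */
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    /** Shows what value survives an encode/decode round trip. */
    static float roundTrip(float f) {
        return decode(encode(f));
    }

    public static void main(String[] args) {
        float[] boosts = {1.0f, 1.1f, 1.2f, 1.3f};
        for (float boost : boosts) {
            System.out.println(boost + " -> " + roundTrip(boost));
        }
        // prints:
        // 1.0 -> 1.0
        // 1.1 -> 1.0
        // 1.2 -> 1.0
        // 1.3 -> 1.25
    }
}
```

Under that assumed layout, 1.1 and 1.2 collapse to 1.0 and 1.3 collapses to 1.25, which is consistent with the scores in the quoted output: 0.9710705 / 0.7768564 = 1.25.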