Message-ID: <472AE991.9020603@mcr.ro>
Date: Fri, 02 Nov 2007 11:10:41 +0200
From: Ion Badita
To: java-user@lucene.apache.org
Subject: Re: RE : Re: problem undestanding the hits.score
References: <995658.17899.qm@web25706.mail.ukl.yahoo.com>
In-Reply-To: <995658.17899.qm@web25706.mail.ukl.yahoo.com>

For your specific problem you need to change the DefaultSimilarity only at
index time, because the lengthNorm value is written to the index when it is
created.

So, first you'll need to extend DefaultSimilarity and override the
lengthNorm() method with the one suggested in the previous reply; then set
your (changed) similarity on the IndexWriter like this:

    IndexWriter indexWriter = new IndexWriter(....);
    indexWriter.setSimilarity(new YourSimilarity());

Add your documents... and search.

Ion

Jamal H Tandina wrote:
> Thank you for your reply
>
> How can I change the DefaultSimilarity for indexing and searching? Do you
> have an example or a URL showing how to set the Similarity?
> http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html
>
> Thanks again
>
> Ion Badita wrote:
> Try to look at Similarity; there you will find things about the
> scoring. Your query is more "similar" to the shorter document.
> If you have 2 documents with a field body, the first with the words
> "red flower" and the second with just one word, "flower", and you search
> for the word "flower", the second document will score higher because it
> is more similar to the query.
>
> If you want to give priority to documents that are larger, like z1, you
> should change the DefaultSimilarity (at index time), more exactly the
> method:
>
> public float lengthNorm(String fieldName, int numTerms) {
>     return (float)(1.0 / Math.sqrt(numTerms));
> }
>
> to something like this:
>
> public float lengthNorm(String fieldName, int numTerms) {
>     return (float)(Math.sqrt(numTerms));
> }
>
> Reindex your documents with the modified Similarity and try to search
> again.
> The IndexWriter has a method to set the similarity used for indexing.
>
> I hope this will help you...
>
> Ion
>
> Jamal jamalator wrote:
>> Hi
>>
>> I have indexed these HTML documents
>> =============z1========================
>> zo zo zo zo zo zo zo zo zo zo zo zo
>> zo zo zo zo zo zo zo zo zo zo zo zo
>> zo zo zo zo zo zo zo zo zo zo zo zo
>> =============z2=========================
>> zo zo zo zo zo zo zo zo zo zo zo zo
>> zo zo zo zo zo zo zo zo zo zo zo zo
>> =============z3==========================
>> zo zo zo zo zo zo zo zo zo zo zo zo
>> =========================================
>> with this code
>>
>> Field contentK1 = new Field("htmlcontent", httpd.getContentKeywords(), Field.Store.NO, Field.Index.TOKENIZED);
>> contentK1.setBoost(1/10f); // 10%
>> doc.add(contentK1);
>>
>> and when I search for "zo" with Luke (WhitespaceAnalyzer) I get:
>>
>> (score, id)
>> (0.0957, z2)
>> (0.0947, z3)
>> (0.0938, z1)
>>
>> Normally the expected result would be z1, z2, z3.
>>
>> Does anyone have an idea?
>>
>> Thank you all
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
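
To make the effect of the two lengthNorm() variants from this thread concrete, here is a small standalone sketch of the arithmetic only. It does not call Lucene. The class and method names are made up for illustration, and the term counts 36, 24, and 12 for z1, z2, and z3 are an assumption read off the sample documents (the real counts depend on what getContentKeywords() returns).

```java
// Standalone sketch: mirrors only the arithmetic of the two lengthNorm()
// variants quoted in this thread. It is not Lucene code.
class LengthNormSketch {

    // Formula quoted in the thread for DefaultSimilarity.lengthNorm().
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Variant suggested in the thread to favor longer fields.
    static float modifiedLengthNorm(int numTerms) {
        return (float) Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        String[] ids = {"z1", "z2", "z3"};
        int[] numTerms = {36, 24, 12}; // assumed term counts, see note above

        for (int i = 0; i < ids.length; i++) {
            System.out.printf("%s terms=%d default=%.4f modified=%.4f%n",
                    ids[i], numTerms[i],
                    defaultLengthNorm(numTerms[i]),
                    modifiedLengthNorm(numTerms[i]));
        }
    }
}
```

With the default formula the value shrinks as the field gets longer, and with the modified one it grows. Since the value is written to the index, the documents have to be reindexed before a changed formula has any effect on the scores.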