Message-ID: <359a92830711021025g2f16a628p2255d9746b60f45c@mail.gmail.com>
Date: Fri, 2 Nov 2007 13:25:51 -0400
From: "Erick Erickson"
To: java-user@lucene.apache.org
Subject: Re: RE : Re: problem undestanding the hits.score
In-Reply-To: <318407.26712.qm@web25706.mail.ukl.yahoo.com>
References: <995658.17899.qm@web25706.mail.ukl.yahoo.com> <318407.26712.qm@web25706.mail.ukl.yahoo.com>

I strongly recommend against this. Simple word counts are a poor measure
of relevance, which is why Lucene doesn't score that way. Do you have an
example showing why the default scoring is inadequate, or is this just an
assumption?

It would be helpful if you gave us some idea of what you're trying to
accomplish. What use case are you trying to solve? That would generate
more helpful responses, I think.

Best
Erick

On 11/2/07, Jamal H Tandina wrote:
>
> <<<<
> If you want to give priority to documents that are larger, like z1, you
> should change the DefaultSimilarity (at index time), more exactly the
> method:
>
> public float lengthNorm(String fieldName, int numTerms) {
>     return (float)(1.0 / Math.sqrt(numTerms));
> }
>
> to something like this
>
> public float lengthNorm(String fieldName, int numTerms) {
>     return (float)(Math.sqrt(numTerms));
> }
> >>>
>
> I want to give priority to documents in which the word we are searching
> for occurs more frequently!
>
> Thank you
>
> Jamal H Tandina wrote:
> Thank you for your reply
>
> How can I change the DefaultSimilarity for indexing and searching? Do
> you have an example or a URL showing how to set the Similarity?
>
> http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html
>
> Thanks again
>
> Ion Badita wrote:
> Try to look at Similarity; there you will find things about the
> scoring. Your query is more "similar" to the shorter document.
> If you have 2 documents with a field body, the first with the words
> "red flower" and the second with just one word, "flower", and you search
> for the word "flower", the second document will score higher because it
> is very similar to the query.
>
> If you want to give priority to documents that are larger, like z1, you
> should change the DefaultSimilarity (at index time), more exactly the
> method:
>
> public float lengthNorm(String fieldName, int numTerms) {
>     return (float)(1.0 / Math.sqrt(numTerms));
> }
>
> to something like this
>
> public float lengthNorm(String fieldName, int numTerms) {
>     return (float)(Math.sqrt(numTerms));
> }
>
> Reindex your documents with the modified Similarity and try to search
> again. The IndexWriter has a method to set the similarity used for
> indexing.
>
> I hope this will help you...
>
> Ion
>
> Jamal jamalator wrote:
> > Hi
> >
> > I have indexed these HTML documents
> > =============z1========================
> >
> > zo zo zo zo zo zo zo zo zo zo zo zo
> > zo zo zo zo zo zo zo zo zo zo zo zo
> > zo zo zo zo zo zo zo zo zo zo zo zo
> >
> > =============z2=========================
> >
> > zo zo zo zo zo zo zo zo zo zo zo zo
> > zo zo zo zo zo zo zo zo zo zo zo zo
> >
> > =============z3==========================
> >
> > zo zo zo zo zo zo zo zo zo zo zo zo
> >
> > =========================================
> > with this code
> >
> > Field contentK1 = new Field("htmlcontent", httpd.getContentKeywords(),
> >         Field.Store.NO, Field.Index.TOKENIZED);
> > contentK1.setBoost(1/10f); //10%
> > doc.add(contentK1);
> >
> > and when I search for "zo" with Luke (WhitespaceAnalyzer) I get:
> >
> > (score, id)
> > (0,0957, z2)
> > (0,0947, z3)
> > (0,0938, z1)
> >
> > Normally the expected result would be z1, z2, z3.
> >
> > Does anyone have an idea?
> >
> > Thank you all
> >
> > ---------------------------------
> > Keep just one email address! Copy your mail to Yahoo! Mail
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
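
To make the lengthNorm discussion quoted above concrete, here is a minimal
sketch of the arithmetic involved. It is standalone Java and does not call
Lucene. The class and method names are hypothetical, the term-frequency
factor sqrt(freq) is an assumption about the default scoring, and idf,
field boost, query normalization and the compact storage of norms are all
left out. The token counts (36, 24 and 12) assume that only the visible
"zo" tokens of z1, z2 and z3 were indexed.

```java
// Standalone arithmetic sketch; it does not use the Lucene API.
class LengthNormSketch {

    // Default length normalization as quoted in the thread:
    // shorter fields get a larger factor.
    static double defaultLengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Inverted variant proposed in the thread:
    // longer fields get a larger factor.
    static double invertedLengthNorm(int numTerms) {
        return Math.sqrt(numTerms);
    }

    // Assumption: the term-frequency factor is sqrt(freq).
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Simplified score: tf * lengthNorm only
    // (no idf, boost, query normalization or norm encoding).
    static double score(int freq, int numTerms, boolean inverted) {
        double norm = inverted
                ? invertedLengthNorm(numTerms)
                : defaultLengthNorm(numTerms);
        return tf(freq) * norm;
    }

    public static void main(String[] args) {
        String[] ids = {"z1", "z2", "z3"};
        int[] counts = {36, 24, 12}; // assumed: every indexed token is "zo"
        for (int i = 0; i < ids.length; i++) {
            System.out.printf("%s default=%.4f inverted=%.4f%n",
                    ids[i],
                    score(counts[i], counts[i], false),
                    score(counts[i], counts[i], true));
        }
    }
}
```

Under these assumptions the default factors cancel, because
sqrt(n) * 1/sqrt(n) is 1 for a field made up of n copies of the query
term. That is consistent with the three nearly identical scores reported
in the thread. The inverted variant gives sqrt(n) * sqrt(n) = n, so z1
would rank first, but it would also favor any long document regardless of
how relevant it is. In the "red flower" versus "flower" case, the default
gives 1/sqrt(2), about 0.707, for the two-word field and 1.0 for the
one-word field.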