From: "Lee, Andrew J (CA - Toronto)"
Delivered-To: mailing list java-user@lucene.apache.org
Date: Thu, 4 May 2006 09:10:54 -0400
Subject: Newbie questions re: scoring

Hi,

I am new to Lucene and to this mailing list, so my apologies if these questions have already been answered.

1) I create an index containing one document whose searchable field is "All dogs are brown." If I search that index with the query "All dogs are brown.", I do not get a hit with a score of 1.0 but something low, like 0.38. I tried looking at the scoring algorithm and can't make heads or tails of it. Can anybody explain it to me in simple terms?

2) I have an index of documents and run a search against it. I loop through the hits, building a Vector of the documents whose score is at or above a certain threshold. If I run the program with a threshold of, say, 0.15, I get a Vector of documents with scores >= 0.15 (as expected). If I set the threshold higher (0.30, for example) and rerun the program, I still see some of the documents that I thought the higher threshold would trim off. With a threshold of 0.15 those documents score about 0.17, and with a threshold of 0.30 the same documents score about 0.33. Can anybody explain this?
My trimming happens after the search has completed, so this is pretty confusing.

Thanks in advance for any help.

Andrew Lee
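Regarding question 1: a minimal sketch, assuming Lucene's classic DefaultSimilarity formulas as given in the 1.9/2.0-era Similarity javadoc: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), lengthNorm = 1 / sqrt(numTerms), queryNorm = 1 / sqrt(sum of squared term weights), and a per-term contribution of tf · idf² · boost · norm, summed and multiplied by coord and queryNorm. The class and method names are hypothetical, all boosts are assumed to be 1, and the field is assumed to contain exactly the query terms, once each. Under those assumptions the length norm and the query norm cancel, so an exact match scores roughly idf, which in a one-document index is 1 + ln(1/2) ≈ 0.31 rather than 1.0. In other words, scores are relative relevance weights, not match percentages.

```java
/**
 * Hypothetical sketch of Lucene's classic DefaultSimilarity scoring (1.9/2.0 era).
 * Norms are kept as exact doubles here; real Lucene stores each field norm
 * in a single lossy byte, so its numbers can differ slightly.
 */
public class ClassicScoreSketch {

    // tf(t in d) = sqrt(frequency of t in the field)
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // idf(t) = 1 + ln(numDocs / (docFreq + 1)); below 1.0 when numDocs == docFreq == 1
    static double idf(int numDocs, int docFreq) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // lengthNorm = 1 / sqrt(number of terms in the field)
    static double lengthNorm(int numTermsInField) {
        return 1.0 / Math.sqrt(numTermsInField);
    }

    /**
     * Score of a document whose field holds exactly the query terms, once each.
     * Assumption: every term has the same docFreq and every boost is 1.
     */
    static double exactMatchScore(int numQueryTerms, int numDocs, int docFreq) {
        double idf = idf(numDocs, docFreq);
        // queryNorm = 1 / sqrt(sum over query terms of (idf * boost)^2)
        double sumOfSquaredWeights = numQueryTerms * idf * idf;
        double queryNorm = 1.0 / Math.sqrt(sumOfSquaredWeights);
        double norm = lengthNorm(numQueryTerms);
        double sum = 0.0;
        for (int i = 0; i < numQueryTerms; i++) {
            // per-term contribution: tf * idf^2 * boost * norm * queryNorm
            sum += tf(1) * idf * idf * 1.0 * norm * queryNorm;
        }
        // coord = matched terms / query terms; every term matches here
        double coord = numQueryTerms / (double) numQueryTerms;
        return coord * sum;
    }

    public static void main(String[] args) {
        // One-document index; "All dogs are brown." assumed to yield 4 terms
        System.out.printf("score = %.4f%n", exactMatchScore(4, 1, 1));
        System.out.printf("idf   = %.4f%n", idf(1, 1));
    }
}
```

Running main prints 0.3069 on both lines. The sketch is not meant to reproduce the 0.38 reported above: the real token count depends on the analyzer (a stop-word filter may drop "are"), and Lucene's one-byte norm encoding is omitted. To see the factors Lucene actually used for a particular hit, IndexSearcher.explain(query, docId) returns an Explanation that breaks the score down term by term.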
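Regarding question 2: a minimal sketch of the post-search trimming described above. ScoredDoc, scoreNorm and aboveThreshold are hypothetical names standing in for a loop over Lucene's Hits, and the scores in main are made-up illustration values. One assumption is made explicit: as I understand the 1.9/2.0-era Hits class, when the top raw score exceeds 1.0 every score is divided by that top score, and scoreNorm mimics that rescaling. Because the filter only reads scores, the threshold itself cannot move a document's score; if the same document scores 0.17 in one run and 0.33 in another, something upstream of the loop (the query, the index contents, or that normalization) presumably differs between the runs.

```java
import java.util.ArrayList;
import java.util.List;

public class ThresholdFilterSketch {

    // Hypothetical stand-in for one search hit: a document id plus its raw score.
    static final class ScoredDoc {
        final int docId;
        final float rawScore;

        ScoredDoc(int docId, float rawScore) {
            this.docId = docId;
            this.rawScore = rawScore;
        }
    }

    // Assumed Hits-style rescaling: divide by the top score only when it exceeds 1.0.
    static float scoreNorm(List<ScoredDoc> hits) {
        float max = 0f;
        for (ScoredDoc h : hits) {
            max = Math.max(max, h.rawScore);
        }
        return max > 1.0f ? 1.0f / max : 1.0f;
    }

    // Post-search trimming: keep ids whose (rescaled) score is >= threshold.
    // It only reads the scores, so the threshold never changes them.
    static List<Integer> aboveThreshold(List<ScoredDoc> hits, float threshold) {
        float norm = scoreNorm(hits);
        List<Integer> kept = new ArrayList<>();
        for (ScoredDoc h : hits) {
            if (h.rawScore * norm >= threshold) {
                kept.add(h.docId);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<ScoredDoc> hits = List.of(
                new ScoredDoc(1, 0.90f), new ScoredDoc(2, 0.33f), new ScoredDoc(3, 0.17f));
        System.out.println(aboveThreshold(hits, 0.15f)); // [1, 2, 3]
        System.out.println(aboveThreshold(hits, 0.30f)); // [1, 2]
    }
}
```

An ArrayList is used instead of the Vector mentioned in the question; a Vector would behave the same way here, only with synchronized methods.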