lucene-java-user mailing list archives

From "Lee, Andrew J (CA - Toronto)" <>
Subject Newbie questions re: scoring
Date Thu, 04 May 2006 13:10:54 GMT

I am new to Lucene and this mailing list, so my apologies if these
questions have already been answered.

1)  I create an index with one document with a searchable field of "All
dogs are brown."  If I search on that index with a query of "All dogs
are brown." I do not get a hit with score 1.0, but something low like
0.38.  I tried looking at the scoring algorithm and can't make heads or
tails of it.  Can anybody explain it to me in simple terms?

2)  I have an index of documents, then run a search against it.  I run
through the list of hits, building a Vector of documents whose score is
above a certain threshold.  If I run the program with a threshold of
say, 0.15, I'll get a Vector of documents with scores >= 0.15 (as
expected).  If I set the threshold higher (0.30, for example) and rerun
the program, I see some of the same documents that I thought would have
been trimmed off with the higher threshold.  With a threshold of 0.15
they would score 0.17, and with a threshold of 0.30 they are scoring
something like 0.33.  Can anybody explain this?  My trimming happens
after the index search has returned, so this is pretty confusing.
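
To make the trimming step concrete, here is a minimal sketch of the kind
of filter described above. It is not the actual program: ScoredDoc is a
hypothetical stand-in for a search hit, and a List is used where the
program uses a Vector. It only shows that the filter compares each score
to the threshold and never modifies a score itself.

```java
import java.util.ArrayList;
import java.util.List;

public class ThresholdFilter {

    /** Hypothetical stand-in for one search hit: a document id and its score. */
    static final class ScoredDoc {
        final String id;
        final float score;

        ScoredDoc(String id, float score) {
            this.id = id;
            this.score = score;
        }
    }

    /** Returns the hits whose score is >= threshold, preserving their order. */
    static List<ScoredDoc> aboveThreshold(List<ScoredDoc> hits, float threshold) {
        List<ScoredDoc> kept = new ArrayList<>();
        for (ScoredDoc hit : hits) {
            if (hit.score >= threshold) {
                kept.add(hit);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up scores for illustration only.
        List<ScoredDoc> hits = new ArrayList<>();
        hits.add(new ScoredDoc("a", 0.42f));
        hits.add(new ScoredDoc("b", 0.17f));
        hits.add(new ScoredDoc("c", 0.09f));

        System.out.println(aboveThreshold(hits, 0.15f).size()); // prints 2
        System.out.println(aboveThreshold(hits, 0.30f).size()); // prints 1
    }
}
```

With fixed input scores, a document scoring 0.17 passes the 0.15
threshold and is dropped at 0.30, which is the behavior I expected to see.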

Thanks in advance for any help.

Andrew Lee
