From: "Matthew W. Bilotti"
To: lucene-dev@jakarta.apache.org
Date: Fri, 30 Apr 2004 18:15:59 -0400 (EDT)
Subject: Help with scoring, coordination factor?

> In my case it works perfectly. As we generate multilingual and semantic
> expansions of the original words of a query, the coordination factor was
> giving lower score to words with a lot of semantic or morphologic
> variants.

For me, this has not worked.
I have defined a WordQuery class and used it to define my disjunctions, but the documents I am interested in are still suffering rank penalties.

To understand how the scoring works internally, I printed the score and an Explanation for each document in my Hits when querying on the original forms of each word only (no WordQueries used). The first document returned had a score of 0.592 and an explanation of "0.0 = match required". Can anyone tell me what this means? The next 39 documents retrieved have the same explanation and steadily decreasing scores, which makes sense. The 40th document retrieved, though, has a score of 1.0 and the explanation:

0.0 = fieldWeight(contents:invented in 0), product of:
  0.0 = tf(termFreq(contents:invented)=0)
  6.507968 = idf(docFreq=4189)
  0.0390625 = fieldNorm(field=contents, doc=0)

Can anyone help me understand why a document with score 1.0 is retrieved directly after a document with score 0.211? I don't understand the explanation. Why is the term frequency of "invented" 0? It should be 3; I checked the document.

I tried to delve into the code to find out how to print all of the components of the score (especially coord, which I am interested in), but I couldn't figure out how to do it. Any help or hints you can give me would be truly appreciated.

~ Matthew

--
matthew w. bilotti
computer science and artificial intelligence laboratory
massachusetts institute of technology
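
For reference, the fieldWeight line in the explanation is a plain product of three factors, so a term frequency of 0 forces the whole product to 0.0. The sketch below recomputes that product and a coord value by hand. It assumes the default Similarity formulas (tf as the square root of the raw frequency, coord as matched clauses over total clauses); the class and method names are hypothetical, and this is not Lucene's own code.

```java
public class ScoreSketch {

    // Assumption: mirrors the default Similarity tf(), the square root of the raw frequency.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Assumption: mirrors the default Similarity coord(), the fraction of query clauses matched.
    static double coord(int overlap, int maxOverlap) {
        return (double) overlap / maxOverlap;
    }

    // fieldWeight as shown in the explanation: tf * idf * fieldNorm.
    static double fieldWeight(int freq, double idf, double fieldNorm) {
        return tf(freq) * idf * fieldNorm;
    }

    public static void main(String[] args) {
        double idf = 6.507968;        // idf value copied from the explanation
        double fieldNorm = 0.0390625; // fieldNorm value copied from the explanation

        // Term frequency 0, as reported in the explanation.
        System.out.println("freq=0: " + fieldWeight(0, idf, fieldNorm));

        // Term frequency 3, as counted by hand in the document.
        System.out.println("freq=3: " + fieldWeight(3, idf, fieldNorm));

        // Example coord for a document matching 2 of 3 query clauses.
        System.out.println("coord(2,3): " + coord(2, 3));
    }
}
```

Comparing the freq=0 and freq=3 lines shows how much the explanation would change if it were computed against the document that actually contains "invented" three times.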