lucene-java-user mailing list archives

From manjula wijewickrema <manjul...@gmail.com>
Subject scoring and index size
Date Fri, 09 Jul 2010 07:20:50 GMT
Hi,

I ran a simple program to see how Lucene scores a single indexed
document. The explain() method gave me the following results.
*******************

Searching for 'metaphysics'

Number of hits: 1

0.030706111

0.030706111 = (MATCH) fieldWeight(contents:metaphys in 0), product of:

10.246951 = tf(termFreq(contents:metaphys)=105)

0.30685282 = idf(docFreq=1, maxDocs=1)

0.009765625 = fieldNorm(field=contents, doc=0)

*****************
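
As a rough cross-check of the output above, the three factors can be recomputed by hand. This is a minimal sketch, assuming the default similarity is in use, where tf is the square root of the term frequency and idf is ln(maxDocs / (docFreq + 1)) + 1. The class and method names (ExplainCheck, tf, idf, fieldWeight) are hypothetical and are not part of Lucene; the fieldNorm value is copied from the output rather than computed.

```java
// Recomputes the factors shown in the explain() output with plain arithmetic.
// Assumption: the default similarity formulas apply (tf = sqrt(freq),
// idf = ln(maxDocs / (docFreq + 1)) + 1). No Lucene classes are used here.
class ExplainCheck {

    // Term-frequency factor: square root of the raw term count.
    static float tf(int termFreq) {
        return (float) Math.sqrt(termFreq);
    }

    // Inverse document frequency factor.
    static float idf(int docFreq, int maxDocs) {
        return (float) (Math.log(maxDocs / (double) (docFreq + 1)) + 1.0);
    }

    // fieldWeight is the product of the three factors.
    static float fieldWeight(int termFreq, int docFreq, int maxDocs, float fieldNorm) {
        return tf(termFreq) * idf(docFreq, maxDocs) * fieldNorm;
    }

    public static void main(String[] args) {
        float fieldNorm = 0.009765625f; // copied from the explain() output
        System.out.println("tf          = " + tf(105));
        System.out.println("idf         = " + idf(1, 1));
        System.out.println("fieldWeight = " + fieldWeight(105, 1, 1, fieldNorm));
    }
}
```

Under these assumptions the recomputed tf and idf agree with the reported 10.246951 and 0.30685282, so the only factor left unexplained is fieldNorm.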

However, I have the following questions:

1) I did not set or change any boost values. In that case, should
fieldNorm = 1/sqrt(terms in field)? (I noticed in the Lucene email
archive that the default boost value is 1.)

2) However, even when I manually calculate fieldNorm as
1/sqrt(terms in field), the result only approximately matches the
fieldNorm value reported by the system. Can this be due to
encode/decode precision loss in the norm?

3) My indexed document contains 19078 words in total, including 125
occurrences of the word 'metaphysics' (my query, which is a single-term
query). But as you can see in the output above, the system reports only
105 occurrences of 'metaphysics'. When I removed part of the text from
the indexed document and compared my own count of 'metaphysics' with the
system's result, the system counted it correctly. Why does this happen?
Is there a limit on the size of an indexed document?
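
The arithmetic behind questions 2 and 3 can be sketched as follows. This is an illustration under stated assumptions, not Lucene's own code: the norm is assumed to be stored in a single byte that keeps only 3 mantissa bits, which is imitated here by truncating the float's mantissa (exponent range limits are ignored). The 10,000-term count is a hypothetical truncated length used for comparison, chosen because IndexWriter in Lucene versions of this period has a maximum field length setting whose default is 10,000 terms; whether that setting applies to this index is not confirmed by anything in this message. The names NormCheck, lengthNorm, keepThreeMantissaBits and impliedTerms are made up for the example.

```java
// Compares hand-computed length norms with the fieldNorm from explain().
// Assumptions: boost = 1, norm = 1 / sqrt(numTerms), and a lossy encoding
// that keeps the sign, the exponent and the top 3 mantissa bits of the float.
class NormCheck {

    // Unencoded length norm for a field with numTerms terms and boost 1.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Simplified stand-in for the one-byte norm encoding: zero out the low
    // 20 mantissa bits, so only 3 mantissa bits survive (truncation, no rounding).
    static float keepThreeMantissaBits(float f) {
        int bits = Float.floatToIntBits(f);
        return Float.intBitsToFloat(bits & 0xFFF00000);
    }

    // Number of terms that a given decoded norm would correspond to.
    static double impliedTerms(float norm) {
        return 1.0 / ((double) norm * norm);
    }

    public static void main(String[] args) {
        float reported = 0.009765625f; // fieldNorm from the explain() output

        // 19078 is the word count stated in the message.
        System.out.println("norm for 19078 terms, truncated: "
                + keepThreeMantissaBits(lengthNorm(19078)));

        // 10000 is a hypothetical truncated length (see the lead-in).
        System.out.println("norm for 10000 terms, truncated: "
                + keepThreeMantissaBits(lengthNorm(10000)));

        System.out.println("terms implied by reported norm:  "
                + impliedTerms(reported));
    }
}
```

With this simplified truncation, 19078 terms give 0.0068359375, while 10,000 terms give exactly the reported 0.009765625. The reported norm itself corresponds to roughly 10,486 terms, well below 19078, which is consistent with only part of the document having been indexed.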

I would appreciate it if somebody could help me with these problems.

Thanks!

Manjula.
