lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: scoring and index size
Date Fri, 09 Jul 2010 07:36:52 GMT
Maybe you are using MaxFieldLength.LIMITED instead of UNLIMITED? Then the
number of terms indexed per field is limited (10,000 by default), which would
explain the lower termFreq.
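
A toy sketch of that effect, in plain Java rather than Lucene code: occurrences of a term that fall beyond the cap are never counted. The class name, method name, and token list are hypothetical and synthetic; they are not taken from the original document.

```java
import java.util.Arrays;
import java.util.List;

public class FieldLengthCapSketch {

    /** Counts occurrences of term among the first maxFieldLength tokens only. */
    public static int termFreqWithCap(List<String> tokens, String term, int maxFieldLength) {
        int freq = 0;
        int limit = Math.min(tokens.size(), maxFieldLength);
        for (int i = 0; i < limit; i++) {
            if (tokens.get(i).equals(term)) {
                freq++;
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        // Synthetic, already-analyzed tokens (illustration only).
        List<String> tokens = Arrays.asList(
                "metaphys", "studi", "be", "metaphys", "exist", "metaphys");

        // With a cap of 4 tokens the last occurrence is dropped.
        System.out.println(termFreqWithCap(tokens, "metaphys", 4));                 // prints 2
        // Without an effective cap every occurrence is counted.
        System.out.println(termFreqWithCap(tokens, "metaphys", Integer.MAX_VALUE)); // prints 3
    }
}
```

In the Lucene releases of this period the cap is chosen through the MaxFieldLength argument of the IndexWriter constructor; check the Javadoc of your version for the exact form.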

The precision of the calculation is limited by the float norm encoding (the
norm is stored in a single byte). Also, if your analyzer removes stop words,
fewer terms are counted, so the norm may not be what you expect.
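
The arithmetic can be checked with a small stand-alone sketch. It assumes the default similarity formulas of that era (tf = sqrt(freq), idf = ln(numDocs / (docFreq + 1)) + 1, length norm = 1/sqrt(terms) with boosts of 1) and a one-byte norm encoding modeled on Lucene's SmallFloat.floatToByte315 and byte315ToFloat. The port below is an assumption written for illustration, so compare it against the Lucene source before relying on it; the class and method names are hypothetical.

```java
public class NormPrecisionSketch {

    // Offset that maps the float exponent onto the one-byte format
    // (assumed port of the "315" variant, see the note above).
    private static final int FZERO = (63 - 15) << 3;

    /** Lossy encoding of a float norm into a single byte. */
    public static byte encodeNorm(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= FZERO) {
            // Zero and negatives map to 0, underflow to the smallest non-zero value.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= FZERO + 0x100) {
            return (byte) -1; // overflow maps to the largest value
        }
        return (byte) (small - FZERO);
    }

    /** Decodes the byte back into a float; low mantissa bits are gone. */
    public static float decodeNorm(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    public static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // 1/sqrt(10000) = 0.01 loses precision in the one-byte round trip.
        float stored = decodeNorm(encodeNorm(lengthNorm(10000)));
        System.out.println(stored); // prints 0.009765625

        // For comparison: the norm if all 19078 words had been indexed as terms.
        System.out.println(decodeNorm(encodeNorm(lengthNorm(19078))));

        // Product of the three factors from the explain() output.
        System.out.println(tf(105) * idf(1, 1) * stored);
    }
}
```

Under these assumptions the reported fieldNorm of 0.009765625 is exactly what 1/sqrt(10000) = 0.01 becomes after encoding and decoding, which is consistent with the field having been cut off at 10,000 terms, and the product of the three factors comes out at roughly 0.0307, in line with the reported score.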

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: manjula wijewickrema [mailto:manjula53@gmail.com]
> Sent: Friday, July 09, 2010 9:21 AM
> To: java-user@lucene.apache.org
> Subject: scoring and index size
> 
> Hi,
> 
> I ran a single programme to see how Lucene scores a single indexed
> document. The explain() method gave me the following results.
> *******************
> Searching for 'metaphysics'
> Number of hits: 1
> 0.030706111
> 0.030706111 = (MATCH) fieldWeight(contents:metaphys in 0), product of:
>   10.246951 = tf(termFreq(contents:metaphys)=105)
>   0.30685282 = idf(docFreq=1, maxDocs=1)
>   0.009765625 = fieldNorm(field=contents, doc=0)
> *****************
> 
> But I encountered the following problems:
> 
> 1) In this case, I did not change anything related to boost values. So
> should fieldNorm = 1/sqrt(terms in field)? (I noticed in the Lucene
> email archive that the default boost values are 1.)
> 
> 2) But even if I manually calculate fieldNorm (as 1/sqrt(terms in
> field)), it only approximately matches the value given by the system
> for fieldNorm. Can this be due to encode/decode precision loss of the
> norm?
> 
> 3) My indexed document consisted of a total of 19078 words, including
> 125 occurrences of the word 'metaphysics' (i.e. my query; I entered a
> single-term query). But as you can see in the above output, the system
> reports only 105 occurrences of 'metaphysics'. When I removed part of
> the document, counted the occurrences of 'metaphysics' again, and
> compared with the system results, I noticed that the system counts
> them correctly for the shortened text. Why this kind of behaviour? Is
> there any limit on the size of indexed documents?
> 
> Could somebody please help me solve these problems?
> 
> Thanks!
> 
> Manjula.

