lucene-java-user mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: How to pull document scoring values
Date Wed, 29 Sep 2004 17:15:36 GMT
On Wednesday 29 September 2004 15:41, Zia Syed wrote:
> Hi Paul,
> Thanks for your detailed reply! It really helped a lot.
> However, I am experiencing some conflicts.
>
> For one of the documents in the result set, when I use
>
> IndexReader fir = FilterIndexReader.open("index");
> byte[] fNorm = fir.norm("Body");
> System.out.println("FNorm: " + fNorm[306]);
> Document d = fir.document(306);
> Field f = d.getField("Body");
>
> System.out.println("Body: " + f.stringValue());
>
> This gives me fNorm 113, whereas the total number of terms (including
> stop-words) is 42 in this particular field of the selected document. In the
> explanation, fieldNorm(field=Body, doc=306) is 0.1562, which corresponds to
> approximately 41 terms for that field in that document. So the explanation
> values make sense with the real data, counting all stop words like to, it,
> the, etc.
>
> So, my question is,
>
> > Am I getting the norm values from the right place?

Yes, but the stored norms are encoded/decoded:
byte Similarity.encodeNorm(float)
float Similarity.decodeNorm(byte)
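To illustrate what that decoding does, here is a small standalone sketch of the 3-mantissa-bit "small float" scheme that Similarity.decodeNorm uses (later Lucene versions expose it as SmallFloat.byte315ToFloat). The class name NormDecode is my own for this example, not a Lucene class, and the exact bit layout may differ slightly across versions:

```java
// Sketch of Lucene's norm decoding: an 8-bit "small float" with
// 3 mantissa bits. NormDecode is a hypothetical name for this example.
public class NormDecode {
    // Decode one stored norm byte back to a float
    // (cf. SmallFloat.byte315ToFloat in later Lucene versions).
    public static float decodeNorm(byte b) {
        if (b == 0) return 0.0f;            // zero is a special case
        int bits = (b & 0xff) << (24 - 3);  // restore mantissa and exponent
        bits += (63 - 15) << 24;            // re-bias the exponent
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // The byte 113 from the question decodes to 0.15625, and
        // 1 / 0.15625^2 is about 41 -- matching the 0.1562 in the
        // explanation, even though the field actually holds 42 terms.
        float norm = decodeNorm((byte) 113);
        System.out.println("decoded norm: " + norm);
        System.out.println("implied term count: " + 1.0f / (norm * norm));
    }
}
```

So the 113 you printed is the raw encoded byte; only after decoding does it line up with the fieldNorm shown in the explanation.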

> > Is there any way to find out the number of indexed terms for each
> > document?

By default, the stored norm is the inverse square root of the number of
indexed terms in the document field.
The encoding/decoding is somewhat rough, though.
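That roughness can be seen with a sketch of the matching encoder (cf. SmallFloat.floatToByte315 behind Similarity.encodeNorm in later Lucene versions); with only 8 bits, many nearby term counts collapse into one bucket, so you cannot recover the exact count from the norm. NormEncode is a hypothetical name for this example, not a Lucene class:

```java
// Sketch of the norm encoder: truncate a float to an 8-bit "small float"
// with 3 mantissa bits. NormEncode is a hypothetical name for this example.
public class NormEncode {
    public static byte encodeNorm(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);            // keep 3 mantissa bits
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // underflow
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1;                                // overflow
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    public static void main(String[] args) {
        // Fields of 64 and of 50 terms get the same norm byte:
        // the encoding cannot tell them apart.
        System.out.println(encodeNorm(1.0f / (float) Math.sqrt(64)));
        System.out.println(encodeNorm(1.0f / (float) Math.sqrt(50)));
    }
}
```

This is why the decoded 0.15625 only tells you the field length is roughly 41 terms, not exactly 42.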

Regards,
Paul Elschot


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

