Yonik Seeley wrote:
> Hmmm, very interesting idea.
> Less than one decimal digit of precision might be hard to swallow when
> you have to add scores together though:
>
> smallfloat(score1) + smallfloat(score2) + smallfloat(score3)
>
> Do you think that the 5/3 exponent/mantissa split is right for this,
> or would a 4/4 be better?
The float epsilon should ideally be less than the minimum score
increment, and the float range should ideally be at least 100x greater
than the maximum score increment, to permit boosting, large queries, etc.
Given a 100M document collection, the maximum idf is ln(100M) = ~18,
with a length-normalized tf of 1, for a maximum score increment of ~18.
So the float range should ideally be around 1800 or greater.
The minimum idf is 1, and the minimum normalized tf with 10k word
documents is 1/sqrt(10k) = 1/100, for a minimum score increment of
1/100. So the float epsilon should ideally be less than 1/100.
5 bits of mantissa and 3 bits of exponent is closest to this, but not
quite there, with an epsilon of 1/32 and a range of up to ~1000.
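
For anyone who wants to check the arithmetic, here is a rough sketch of
the requirement side of the estimate. The class and method names are
made up for illustration and are not Lucene API. It assumes idf is a
natural log of the document count, length normalization is
1/sqrt(length), and epsilon is 2^-mantissaBits. It does not try to
model the range of any particular exponent encoding.

```java
// Hypothetical helper, not part of Lucene: back-of-envelope numbers only.
public class ScoreBudget {

    // Assumption: maximum idf is the natural log of the collection size.
    static double maxIdf(long numDocs) {
        return Math.log(numDocs);
    }

    // Range needed so that `headroom` times the largest increment still fits
    // (max normalized tf is taken to be 1).
    static double requiredRange(long numDocs, double headroom) {
        return headroom * maxIdf(numDocs);
    }

    // Assumption: normalized tf of a single occurrence is 1/sqrt(docLength).
    static double minNormalizedTf(int docLength) {
        return 1.0 / Math.sqrt(docLength);
    }

    // Assumption: relative step size with the given number of mantissa bits.
    static double epsilon(int mantissaBits) {
        return Math.pow(2, -mantissaBits);
    }

    public static void main(String[] args) {
        long numDocs = 100_000_000L;
        int docLength = 10_000;

        System.out.printf("max idf          ~ %.2f%n", maxIdf(numDocs));
        System.out.printf("required range   ~ %.0f%n", requiredRange(numDocs, 100));
        System.out.printf("min increment    = %.4f%n", minNormalizedTf(docLength));
        for (int bits = 3; bits <= 7; bits++) {
            boolean fineEnough = epsilon(bits) < minNormalizedTf(docLength);
            System.out.printf("%d mantissa bits: epsilon = 1/%d, fine enough: %b%n",
                    bits, 1 << bits, fineEnough);
        }
    }
}
```

Under these assumptions 5 mantissa bits (epsilon 1/32) falls short of
the 1/100 target, and it would take 7 bits (1/128) to get under it.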
Did I get the math right?
Doug

