Yonik Seeley wrote:
> I'm not sure I understand why this is. epsilon is based on 1,
> (smallest number such that 1 + epsilon != 1, right?). What's special
> about 1?
1 is special for multiplication but, you're right, not so special for
addition, the operation in question. What makes addition accurate is
more mantissa bits. Epsilon shrinks as the number of mantissa bits
grows (it halves with each added bit), so a smaller epsilon does mean
more accuracy. But, you're right, no particular epsilon value
guarantees accuracy.
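
To make the relation between mantissa bits and epsilon concrete, here
is a minimal sketch. It assumes an idealized float with an implicit
leading 1 and n explicit mantissa bits; the class and method names are
made up for illustration and are not part of any encoder.

```java
final class MantissaSpacing {
    // Epsilon for an idealized float with an implicit leading 1 and
    // `fractionBits` explicit mantissa bits: the gap between 1 and the
    // next representable value, i.e. 2^-fractionBits.
    static double epsilon(int fractionBits) {
        return Math.scalb(1.0, -fractionBits);
    }

    // Gap between neighbouring representable values near x
    // (x must be positive and normal). The gap grows with the
    // magnitude of x, which is why epsilon alone says little
    // about the accuracy of a sum.
    static double spacingAt(double x, int fractionBits) {
        return Math.scalb(1.0, Math.getExponent(x) - fractionBits);
    }

    public static void main(String[] args) {
        for (int bits = 3; bits <= 5; bits++) {
            System.out.println(bits + " bits: epsilon=" + epsilon(bits)
                + ", spacing near 10=" + spacingAt(10.0, bits));
        }
    }
}
```

In this model 3 bits give an epsilon of 1/8 and 5 bits give 1/32, while
the spacing near 10 is eight times the epsilon in each case.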
> I'm worried about the impact of things like this:
> smallfloat(10) + smallfloat(1) + smallfloat(1) + smallfloat(1) -> 10
>
> And it makes things very order dependent:
> smallfloat(1) + smallfloat(1) + smallfloat(1) + smallfloat(10) -> 12
10 and 12 are pretty close scores, so while this is clearly not a good
thing, relevant and irrelevant documents are hopefully separated by more
than this. In any case, it would be a whole lot more accurate than
ignoring tfs altogether. And we can do better in this particular case
by using 4- or 5-bit mantissas.
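
The order dependence can be sketched as follows. Everything here is an
assumption made for illustration: the names encode, decode, quantize
and sum are hypothetical, the byte layout is one plausible choice, and
the sketch assumes that every partial sum is squeezed back into a byte
before the next addition. It is not necessarily what the real code
does.

```java
final class SmallFloatOrder {
    // Hypothetical encoder: keep the top bits of the IEEE float and
    // truncate the rest, then rebase so the result fits in 8 bits.
    static int encode(float f, int mantissaBits, int zeroExp) {
        int fzero = (63 - zeroExp) << mantissaBits;
        int small = Float.floatToRawIntBits(f) >> (24 - mantissaBits);
        if (small <= fzero) return f <= 0 ? 0 : 1; // underflow: zero or smallest code
        if (small >= fzero + 0x100) return 0xFF;    // overflow: clamp to largest code
        return small - fzero;
    }

    // Hypothetical decoder, the inverse of encode for in-range codes.
    // Note that under this assumed layout the byte's top "mantissa" bit
    // lands on the float's lowest exponent bit.
    static float decode(int code, int mantissaBits, int zeroExp) {
        if (code == 0) return 0.0f;
        int bits = ((code & 0xFF) << (24 - mantissaBits)) + ((63 - zeroExp) << 24);
        return Float.intBitsToFloat(bits);
    }

    // Round-trip a value through the 8-bit representation.
    static float quantize(float f, int mantissaBits, int zeroExp) {
        return decode(encode(f, mantissaBits, zeroExp), mantissaBits, zeroExp);
    }

    // Add terms left to right, re-quantizing after every addition.
    static float sum(int mantissaBits, int zeroExp, float... terms) {
        float acc = 0f;
        for (float t : terms) {
            acc = quantize(acc + quantize(t, mantissaBits, zeroExp), mantissaBits, zeroExp);
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(sum(3, 15, 10f, 1f, 1f, 1f)); // 10.0
        System.out.println(sum(3, 15, 1f, 1f, 1f, 10f)); // 12.0
        System.out.println(sum(5, 2, 10f, 1f, 1f, 1f));  // 13.0
        System.out.println(sum(5, 2, 1f, 1f, 1f, 10f));  // 13.0
    }
}
```

Under these assumptions the 3-bit mantissa with zero point 15 gives 10
and 12 depending on order, while a 5-bit mantissa with zero point 2
gives 13 either way.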
> Also, epsilon related to the mantissa, not the exponent?
> That would make it 1/8, not 1/32.
I'm not sure what you're saying. The current epsilon, with a 3-bit
mantissa, is 1/8, right? With a 5-bit mantissa it would go to 1/32, no?
> Also, if we don't need to represent very small numbers, we could lower
> the zero point of the exponent (currently it's 15 for the 5/3 split),
> right?
Right. Arguably we don't need numbers smaller than 1/100. A 4-bit
mantissa with a zero-exponent point of 5 gives a minimum value of .0005
and a max of 2M, plenty of range. A 5-bit mantissa with a zero-exponent
point of 2 gives us a minimum of .03 and a max of around 2k, nearly the
desired range, but with greater precision. In your case above, 10+1+1
would give 12; moreover, 10+.5+.5 would give 11. I think this is
probably the best choice. What do you think?
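
The ranges above can be checked with a small sketch. The decode rule
below is an assumption (one bit layout that happens to reproduce the
figures quoted here), and the names decode, min and max are
hypothetical.

```java
final class SmallFloatRange {
    // Hypothetical decoder: shift the byte into the top of the float's
    // exponent/mantissa field and add an exponent offset derived from
    // the zero-exponent point. Code 0 is reserved for 0.0.
    static float decode(int code, int mantissaBits, int zeroExp) {
        if (code == 0) return 0.0f;
        int bits = ((code & 0xFF) << (24 - mantissaBits)) + ((63 - zeroExp) << 24);
        return Float.intBitsToFloat(bits);
    }

    // Smallest positive value: the byte 0x01.
    static float min(int mantissaBits, int zeroExp) {
        return decode(1, mantissaBits, zeroExp);
    }

    // Largest value: the byte 0xFF.
    static float max(int mantissaBits, int zeroExp) {
        return decode(0xFF, mantissaBits, zeroExp);
    }

    public static void main(String[] args) {
        int[][] layouts = { {3, 15}, {4, 5}, {5, 2} }; // {mantissa bits, zero point}
        for (int[] l : layouts) {
            System.out.println(l[0] + "-bit mantissa, zero point " + l[1]
                + ": min=" + min(l[0], l[1]) + ", max=" + max(l[0], l[1]));
        }
    }
}
```

With this rule the 4-bit/5 layout spans roughly .00055 to 1966080, and
the 5-bit/2 layout roughly .033 to 1984.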
Doug

