commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Luc Maisonobe <...@spaceroots.org>
Subject [math] use case for NaNStrategy.FIXED?
Date Mon, 23 Jun 2014 18:38:13 GMT
Hi all,

While looking further in Percentile class for MATH-1120, I have found
another problem in the current implementation. NaNStrategy.FIXED should
leave the NaNs in place, but at the end of the KthSelector.select
method, a call to Arrays.sort moves the NaNs to the end of the small
sub-array. What is really implemented here is therefore closer to
NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur in the
test cases because they use very short arrays, and we directly switch to
this part of the select method.

When I implemented the method, I explicitly avoided calling Arrays.sort
because it is a general purpose method that is know to be efficient only
for arrays of at least medium size. In most cases, when the array is
small one falls back to a non-recursive method like a very simple
insertion sort, which is faster for smaller arrays. In the select
operation, we know we have small sub-arrays at the call point. Going
back to the former insertionSort would recover the good behavior for
small arrays, but would in fact not be sufficient to really implement a
NaNStrategy.FIXED. I guess it would be simple to make it behave like
NaNStrategy.MAXIMAL but I did not try yet.

My point here is rather: can we really and should we really implement
NaNStrategy.FIXED? Looking at how it is used elsewhere, it needs to
store the original position of the NaNs. It is quite cumbersome.

I wonder what is the use case for NaNStrategy.FIXED at all.

Going further and looking at the use cases for other NaNStrategy values,
the NaNs are replaced by +/- infinity before sorting, which is OK for
ranking as we only rely on the indices, but we use the values themselves
in Percentile. So sometimes, we end up with computing interpolations
like 0.5 * (x[k] + x[k+1]) or similar. If x[k] is a finite number and
x[k+1] has been changed to +infinity, we get +infinity, instead of the
NaN we should have retrieved without replacement. So here again, I'm not
sure we are doing the right thing.

What do you think?

best regards,
Luc

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message