commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gilles <gil...@harfang.homelinux.org>
Subject Re: [Math] MATH-1129
Date Wed, 18 Jun 2014 13:40:56 GMT
On Wed, 18 Jun 2014 15:02:41 +0200, Luc Maisonobe wrote:
> Le 18/06/2014 14:32, Gilles a écrit :
>> Hello Luc.
>>
>>>>
>>>> See
>>>>   https://issues.apache.org/jira/browse/MATH-1129
>>>>
>>>> The problem reported was due to the sorting procedure not behaving
>>>> correctly in the presence of NaN.
>>>> This procedure could be replaced by an equivalent method from the 
>>>> JDK:
>>>>   java.util.Arrays.sort(double[],int,int)
>>>>
>>>> Any objection?
>>>
>>> If you imply removing the select, medianOf3 and partition methods, 
>>> yes I
>>> would object. If you imply replacing only the insertionSort method 
>>> used
>>> for small sub-arrays, then I almost agree.
>>
>> Issue 1129 concerns the latter: See the comment in "insertionSort" 
>> in the
>> current code.
>>
>> However, for the former, you should really have a look at MATH-1120
>>   https://issues.apache.org/jira/browse/MATH-1120
>> The proposed patch indeed touches those methods.
>
> So I think it is worth fixing MATH-1120 first, with the NaNStrategy 
> and
> go back to MATH-1129 afterwards, to see if it still applies of if
> MATH-1120 also fixed it.

As per my last comment on MATH-1120, the "NaNStrategy" part of the 
patch
is an extension of the initial feature request.

As per the Javadoc, the CM code was originally meant to handle (in some
way), the presence of NaN values within the data. So I thought that 
this
should be fixed before extending the API (which entailed additional
questions as the proposed patch is fairly "massive").

Especially, the new code is clearly untested for performance, and this
seems to be an important argument by your previous post.

So, it is possible to add "isNaN" conditional, as I did in 
"insertionSort"
and fix the current code without additional impact.

Or, we redesign (as per MATH-1120) which implies forgetting about
performance for now. The patch removes "insertionSort" altogether!
IMHO, it is too many changes at once...

What is the meaning of percentile when NaNs are kept in the data
(strategy "FIXED")?
For all the other strategies, the NaNs will not cause a problem
within the algorithms being discussed here (being replaced by +inf
or -inf, or removed, or raising an exception).


>
> I will have a look at the proposed patch, including your comments.

Thanks.

Gilles

>> [...]


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message