commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gilles <gil...@harfang.homelinux.org>
Subject Re: [math] use case for NaNStrategy.FIXED?
Date Mon, 30 Jun 2014 16:25:36 GMT
On Mon, 30 Jun 2014 18:48:02 +0530, venkatesha murthy wrote:
> On Mon, Jun 30, 2014 at 4:11 AM, Gilles 
> <gilles@harfang.homelinux.org>
> wrote:
>
>> On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote:
>>
>>> On 6/29/14, 2:30 PM, Gilles wrote:
>>>
>>>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote:
>>>>
>>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote:
>>>>>
>>>>>> On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz
>>>>>> <phil.steitz@gmail.com> wrote:
>>>>>>
>>>>>>  On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
>>>>>>>
>>>>>>>> Le 25/06/2014 06:05, venkatesha murthy a écrit :
>>>>>>>>
>>>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe
>>>>>>>>> <luc@spaceroots.org>
>>>>>>>>>
>>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Venkat,
>>>>>>>>>>
>>>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe
>>>>>>>>>>> <luc@spaceroots.org>
>>>>>>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> While looking further in Percentile class
for MATH-1120, I
>>>>>>>>>>>> have found
>>>>>>>>>>>> another problem in the current implementation.
>>>>>>>>>>>> NaNStrategy.FIXED
>>>>>>>>>>>>
>>>>>>>>>>> should
>>>>>>>
>>>>>>>> leave the NaNs in place, but at the end of the
>>>>>>>>>>>> KthSelector.select
>>>>>>>>>>>> method, a call to Arrays.sort moves the NaNs
to the end of
>>>>>>>>>>>> the small
>>>>>>>>>>>> sub-array. What is really implemented here
is therefore
>>>>>>>>>>>> closer to
>>>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED.
This always
>>>>>>>>>>>> occur in the
>>>>>>>>>>>> test cases because they use very short arrays,
and we
>>>>>>>>>>>> directly
>>>>>>>>>>>>
>>>>>>>>>>> switch to
>>>>>>>
>>>>>>>> this part of the select method.
>>>>>>>>>>>>
>>>>>>>>>>> Are NaNs considered higher than +Inf ?
>>>>>>>>>>> If MAXIMAL represent replacing for +inf ; you
need
>>>>>>>>>>> something to
>>>>>>>>>>> indicate beyond this for NaN.
>>>>>>>>>>>
>>>>>>>>>> Well, we can just keep the NaN themselves and handled
them
>>>>>>>>>> appropriately, hoping not to trash performances too
much.
>>>>>>>>>>
>>>>>>>>>>  Agreed.
>>>>>>>>>
>>>>>>>>>  What is the test input you see an issue and what is
the
>>>>>>>>>>> actual error
>>>>>>>>>>> you are seeing. Please share the test case.
>>>>>>>>>>>
>>>>>>>>>> Just look at PercentileTest.testReplaceNanInRange().
The
>>>>>>>>>> first check in
>>>>>>>>>> the test corresponds to a Percentile configuration
at 50%
>>>>>>>>>> percentile,
>>>>>>>>>> and NaNStrategy.FIXED. The array has an odd number
of
>>>>>>>>>> entries, so the
>>>>>>>>>> 50% percentile is exactly one of the entries: the
one at
>>>>>>>>>> index 5 in the
>>>>>>>>>> final array.
>>>>>>>>>>
>>>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN,
5, 7,
>>>>>>>>>> NaN, 8 }. So
>>>>>>>>>> for the NaNStrategy.FIXED setting, it should not
be modified
>>>>>>>>>> at all in
>>>>>>>>>> the selection algorithm and the result for 50% should
be the
>>>>>>>>>> first NaN
>>>>>>>>>> of the array, at index 5. In fact, due to the Arrays.sort,
>>>>>>>>>> we *do*
>>>>>>>>>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8,
NaN, NaN,
>>>>>>>>>> NaN }, so
>>>>>>>>>> the result is 5.
>>>>>>>>>>
>>>>>>>>>> Agreed. just verified by putting back the earlier

>>>>>>>>>> insertionSort
>>>>>>>>>>
>>>>>>>>> function.
>>>>>>>
>>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile above 67%,
we
>>>>>>>>>> get as a
>>>>>>>>>> result Double.POSITIVE_INFINITY instead of Double.NaN.
>>>>>>>>>>
>>>>>>>>>> If we agree to leave FIXED as unchanged behaviour
with your
>>>>>>>>>>
>>>>>>>>> insertionSort
>>>>>>>
>>>>>>>> code; then atleast MAXIMAL/MINIMAL should be allowed for
>>>>>>>>> transformation
>>>>>>>>>
>>>>>>>> of
>>>>>>>
>>>>>>>> NaN to +/-Inf
>>>>>>>>>
>>>>>>>> I'm fine with it, as long as we clearly state it in the
>>>>>>>> NaNStrategy
>>>>>>>> documentation, saying somethig like:
>>>>>>>>
>>>>>>>>  When MAXIMAL/MINIMAL are used, the NaNs are effecively
>>>>>>>> *replaced* by
>>>>>>>>  +/- infinity, hence the results will be computed as if
>>>>>>>> infinities were
>>>>>>>>  there in the first place.
>>>>>>>>
>>>>>>> -1 - this is mixing concerns.  NaNStrategy exists for one 
>>>>>>> specific
>>>>>>> purpose - to turn extend the ordering on doubles a total 
>>>>>>> ordering.
>>>>>>> Strategies prescribe only how NaNs are to be treated in the
>>>>>>> ordering.  Side effects on the input array don't make sense in
>>>>>>> this
>>>>>>> context.  The use case for which this class was created was 
>>>>>>> rank
>>>>>>> transformations.  Returning infinities in rank transformations
>>>>>>> would
>>>>>>> blow up computations in these classes.  If a floating point
>>>>>>> computations need to transform NaNs, the specific stats / other
>>>>>>> classes that do this transformation should perform and document
>>>>>>> the
>>>>>>> behavior.
>>>>>>>
>>>>>>> Phil
>>>>>>>
>>>>>>>  OK. Agreed realized it later.  Hence i have not touched
>>>>>> NaNStrategy in my
>>>>>> patch(MATH-1132) . Now i have added a separate transformer to
>>>>>> transform NaNs
>>>>>> but it uses NanStrategy.  Please let me know on this as this
>>>>>> trasnformation
>>>>>> itself
>>>>>> can be used in different places.
>>>>>>
>>>>>
>>>>> Honestly, I don't see the value of this.  I would be more 
>>>>> favorable
>>>>> to an addition to MathArrays supporting NaN (or infinity) 
>>>>> filtering
>>>>> / transformation.  Several years back we made the decision to 
>>>>> just
>>>>> let NaNs propagate in computations, which basically means that if
>>>>> you feed NaNs to [math] you are going to get NaNs out in results.
>>>>> If we do get into the business of filtering NaNs (or infinities 
>>>>> for
>>>>> that matter), I think it is probably best to do that in 
>>>>> MathArrays
>>>>> or somesuch - i.e., don't decorate APIs with NaN or
>>>>> infinity-handling handling descriptions.  That would be a huge 
>>>>> task
>>>>> and would likely end up bleeding implementation detail into the
>>>>> APIs.  As I said above, NaNStrategy has to exist for rank
>>>>> transformations to be well-defined.  NaN-handling in floating 
>>>>> point
>>>>> computations is defined and as long as we fully document what our
>>>>> algorithms do, I think it is fair enough to "let the NaNs fall 
>>>>> where
>>>>> they may" - which basically means users need to do the filtering 
>>>>> /
>>>>> transformation themselves.
>>>>>
>>>>
>>>> So, do I understand correctly that it's OK to add the "remove" and
>>>> "replace" utility methods (cf. MATH-1130) in "MathArrays"?
>>>> Or does this thread conclude that we don't have a use for this 
>>>> within
>>>> Commons Math?
>>>>
>>>
>>> You raise a good point here.  Personally, I don't see an internal
>>> use and if we are sticking with MathArrays including only things we
>>> have internal use for, I would say no, don't include them.
>>>
>>
> I feel IMO, replace and remove are too generic to be holed in 
> Percentile
> and see that they can have uses similar to copy or equals or any 
> other
> MathArrays function..
> (if not MathArrays; then can DoubleArray be an alternate home for 
> these?).

[That's not the point under discussion. Currently the advantage of 
putting
those utilities there is because we can modify or remove non-public 
code
without impacting users (backward compatiblity).]

The question is primarily whether we want to support those methods (in
the public API) even if they are not used _at all_ within Commons Math
(as would be the case if the answer to the last question below is in
the affirmative).

If the feature is deemed generally useful (to be confirmed by a 
motivated
feature request), the question would be how to best provide it. This 
comes
back to your proposal of a "Transformer" interface.


Regards,
Gilles


>>
>> A third way to look at it would be that since some algorithms 
>> require
>> being passed "clean" data (e.g. without NaNs), we could provide some
>> simple means to do so. A user could then write, e.g.
>> ---
>> double[] data = { 1, Double.NaN, 3, 6, 5 };
>>
>> Percentile p = new Percentile();
>> double q = p.evaluate(MathArrays.replace(data, Double.NaN,
>> Double.POSITIVE_INFINITY), 0.1);
>> ---
>>
>> Then, if we provide that utility, is it still necessary to 
>> complicate the
>> Percentile
>> code with a NaNStrategy parameter?
>>
>>
>> Gilles


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message