commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Phil Steitz <phil.ste...@gmail.com>
Subject Re: [math] use case for NaNStrategy.FIXED?
Date Mon, 30 Jun 2014 16:08:00 GMT


> On Jun 29, 2014, at 3:41 PM, Gilles <gilles@harfang.homelinux.org> wrote:
> 
>> On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote:
>>> On 6/29/14, 2:30 PM, Gilles wrote:
>>>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote:
>>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote:
>>>>> On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz
>>>>> <phil.steitz@gmail.com> wrote:
>>>>> 
>>>>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
>>>>>>> Le 25/06/2014 06:05, venkatesha murthy a écrit :
>>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe
>>>>>>>> <luc@spaceroots.org>
>>>>>> wrote:
>>>>>>>>> Hi Venkat,
>>>>>>>>> 
>>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
>>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe
>>>>>>>>>> <luc@spaceroots.org>
>>>>>>>>> wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>> 
>>>>>>>>>>> While looking further in Percentile class for
MATH-1120, I
>>>>>>>>>>> have found
>>>>>>>>>>> another problem in the current implementation.
>>>>>>>>>>> NaNStrategy.FIXED
>>>>>> should
>>>>>>>>>>> leave the NaNs in place, but at the end of the
>>>>>>>>>>> KthSelector.select
>>>>>>>>>>> method, a call to Arrays.sort moves the NaNs
to the end of
>>>>>>>>>>> the small
>>>>>>>>>>> sub-array. What is really implemented here is
therefore
>>>>>>>>>>> closer to
>>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This
always
>>>>>>>>>>> occur in the
>>>>>>>>>>> test cases because they use very short arrays,
and we
>>>>>>>>>>> directly
>>>>>> switch to
>>>>>>>>>>> this part of the select method.
>>>>>>>>>> Are NaNs considered higher than +Inf ?
>>>>>>>>>> If MAXIMAL represent replacing for +inf ; you need
>>>>>>>>>> something to
>>>>>>>>>> indicate beyond this for NaN.
>>>>>>>>> Well, we can just keep the NaN themselves and handled
them
>>>>>>>>> appropriately, hoping not to trash performances too much.
>>>>>>>> Agreed.
>>>>>>>> 
>>>>>>>>>> What is the test input you see an issue and what
is the
>>>>>>>>>> actual error
>>>>>>>>>> you are seeing. Please share the test case.
>>>>>>>>> Just look at PercentileTest.testReplaceNanInRange().
The
>>>>>>>>> first check in
>>>>>>>>> the test corresponds to a Percentile configuration at
50%
>>>>>>>>> percentile,
>>>>>>>>> and NaNStrategy.FIXED. The array has an odd number of
>>>>>>>>> entries, so the
>>>>>>>>> 50% percentile is exactly one of the entries: the one
at
>>>>>>>>> index 5 in the
>>>>>>>>> final array.
>>>>>>>>> 
>>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5,
7,
>>>>>>>>> NaN, 8 }. So
>>>>>>>>> for the NaNStrategy.FIXED setting, it should not be modified
>>>>>>>>> at all in
>>>>>>>>> the selection algorithm and the result for 50% should
be the
>>>>>>>>> first NaN
>>>>>>>>> of the array, at index 5. In fact, due to the Arrays.sort,
>>>>>>>>> we *do*
>>>>>>>>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN,
NaN,
>>>>>>>>> NaN }, so
>>>>>>>>> the result is 5.
>>>>>>>>> 
>>>>>>>>> Agreed. just verified by putting back the earlier insertionSort
>>>>>> function.
>>>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile above
67%, we
>>>>>>>>> get as a
>>>>>>>>> result Double.POSITIVE_INFINITY instead of Double.NaN.
>>>>>>>>> 
>>>>>>>>> If we agree to leave FIXED as unchanged behaviour with
your
>>>>>> insertionSort
>>>>>>>> code; then atleast MAXIMAL/MINIMAL should be allowed for
>>>>>>>> transformation
>>>>>> of
>>>>>>>> NaN to +/-Inf
>>>>>>> I'm fine with it, as long as we clearly state it in the
>>>>>>> NaNStrategy
>>>>>>> documentation, saying somethig like:
>>>>>>> 
>>>>>>> When MAXIMAL/MINIMAL are used, the NaNs are effecively
>>>>>>> *replaced* by
>>>>>>> +/- infinity, hence the results will be computed as if
>>>>>>> infinities were
>>>>>>> there in the first place.
>>>>>> -1 - this is mixing concerns.  NaNStrategy exists for one specific
>>>>>> purpose - to turn extend the ordering on doubles a total ordering.
>>>>>> Strategies prescribe only how NaNs are to be treated in the
>>>>>> ordering.  Side effects on the input array don't make sense in
>>>>>> this
>>>>>> context.  The use case for which this class was created was rank
>>>>>> transformations.  Returning infinities in rank transformations
>>>>>> would
>>>>>> blow up computations in these classes.  If a floating point
>>>>>> computations need to transform NaNs, the specific stats / other
>>>>>> classes that do this transformation should perform and document
>>>>>> the
>>>>>> behavior.
>>>>>> 
>>>>>> Phil
>>>>> OK. Agreed realized it later.  Hence i have not touched
>>>>> NaNStrategy in my
>>>>> patch(MATH-1132) . Now i have added a separate transformer to
>>>>> transform NaNs
>>>>> but it uses NanStrategy.  Please let me know on this as this
>>>>> trasnformation
>>>>> itself
>>>>> can be used in different places.
>>>> 
>>>> Honestly, I don't see the value of this.  I would be more favorable
>>>> to an addition to MathArrays supporting NaN (or infinity) filtering
>>>> / transformation.  Several years back we made the decision to just
>>>> let NaNs propagate in computations, which basically means that if
>>>> you feed NaNs to [math] you are going to get NaNs out in results.
>>>> If we do get into the business of filtering NaNs (or infinities for
>>>> that matter), I think it is probably best to do that in MathArrays
>>>> or somesuch - i.e., don't decorate APIs with NaN or
>>>> infinity-handling handling descriptions.  That would be a huge task
>>>> and would likely end up bleeding implementation detail into the
>>>> APIs.  As I said above, NaNStrategy has to exist for rank
>>>> transformations to be well-defined.  NaN-handling in floating point
>>>> computations is defined and as long as we fully document what our
>>>> algorithms do, I think it is fair enough to "let the NaNs fall where
>>>> they may" - which basically means users need to do the filtering /
>>>> transformation themselves.
>>> 
>>> So, do I understand correctly that it's OK to add the "remove" and
>>> "replace" utility methods (cf. MATH-1130) in "MathArrays"?
>>> Or does this thread conclude that we don't have a use for this within
>>> Commons Math?
>> 
>> You raise a good point here.  Personally, I don't see an internal
>> use and if we are sticking with MathArrays including only things we
>> have internal use for, I would say no, don't include them.
> 
> A third way to look at it would be that since some algorithms require
> being passed "clean" data (e.g. without NaNs), we could provide some
> simple means to do so. A user could then write, e.g.
> ---
> double[] data = { 1, Double.NaN, 3, 6, 5 };
> 
> Percentile p = new Percentile();
> double q = p.evaluate(MathArrays.replace(data, Double.NaN, Double.POSITIVE_INFINITY),
0.1);
> ---
> 
> Then, if we provide that utility, is it still necessary to complicate the Percentile
> code with a NaNStrategy parameter?

I may be missing something, but it seems to me that percentile does need a NaNStrategy in
the presence if NaNs like the other order statistics do (basically to define their position
In 
the ordering).  It doesn't logically need more than that.  I would be ok defaulting the strategy
or even returning NaN in the presence of NaNs (or whenever interpolation involves a NaN).
All of this can be documented.

I still don't see a within-math use for recoding methods; though I do see the value from a
user perspective.  The problem is if we open this door, we risk MathArrays becoming a mini
commons lang.

Phil
> 
> 
> Gilles
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message