commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Phil Steitz <phil.ste...@gmail.com>
Subject Re: [math] use case for NaNStrategy.FIXED?
Date Sun, 29 Jun 2014 21:39:51 GMT
On 6/29/14, 2:30 PM, Gilles wrote:
> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote:
>> On 6/29/14, 9:48 AM, venkatesha murthy wrote:
>>> On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz
>>> <phil.steitz@gmail.com> wrote:
>>>
>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
>>>>> Le 25/06/2014 06:05, venkatesha murthy a écrit :
>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe
>>>>>> <luc@spaceroots.org>
>>>> wrote:
>>>>>>> Hi Venkat,
>>>>>>>
>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe
>>>>>>>> <luc@spaceroots.org>
>>>>>>> wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> While looking further in Percentile class for MATH-1120,
I
>>>>>>>>> have found
>>>>>>>>> another problem in the current implementation.
>>>>>>>>> NaNStrategy.FIXED
>>>> should
>>>>>>>>> leave the NaNs in place, but at the end of the
>>>>>>>>> KthSelector.select
>>>>>>>>> method, a call to Arrays.sort moves the NaNs to the end
of
>>>>>>>>> the small
>>>>>>>>> sub-array. What is really implemented here is therefore
>>>>>>>>> closer to
>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always
>>>>>>>>> occur in the
>>>>>>>>> test cases because they use very short arrays, and we
>>>>>>>>> directly
>>>> switch to
>>>>>>>>> this part of the select method.
>>>>>>>> Are NaNs considered higher than +Inf ?
>>>>>>>> If MAXIMAL represent replacing for +inf ; you need
>>>>>>>> something to
>>>>>>>> indicate beyond this for NaN.
>>>>>>> Well, we can just keep the NaN themselves and handled them
>>>>>>> appropriately, hoping not to trash performances too much.
>>>>>>>
>>>>>> Agreed.
>>>>>>
>>>>>>>> What is the test input you see an issue and what is the
>>>>>>>> actual error
>>>>>>>> you are seeing. Please share the test case.
>>>>>>> Just look at PercentileTest.testReplaceNanInRange(). The
>>>>>>> first check in
>>>>>>> the test corresponds to a Percentile configuration at 50%
>>>>>>> percentile,
>>>>>>> and NaNStrategy.FIXED. The array has an odd number of
>>>>>>> entries, so the
>>>>>>> 50% percentile is exactly one of the entries: the one at
>>>>>>> index 5 in the
>>>>>>> final array.
>>>>>>>
>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7,
>>>>>>> NaN, 8 }. So
>>>>>>> for the NaNStrategy.FIXED setting, it should not be modified
>>>>>>> at all in
>>>>>>> the selection algorithm and the result for 50% should be the
>>>>>>> first NaN
>>>>>>> of the array, at index 5. In fact, due to the Arrays.sort,
>>>>>>> we *do*
>>>>>>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN,
>>>>>>> NaN }, so
>>>>>>> the result is 5.
>>>>>>>
>>>>>>> Agreed. just verified by putting back the earlier insertionSort
>>>> function.
>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we
>>>>>>> get as a
>>>>>>> result Double.POSITIVE_INFINITY instead of Double.NaN.
>>>>>>>
>>>>>>> If we agree to leave FIXED as unchanged behaviour with your
>>>> insertionSort
>>>>>> code; then atleast MAXIMAL/MINIMAL should be allowed for
>>>>>> transformation
>>>> of
>>>>>> NaN to +/-Inf
>>>>> I'm fine with it, as long as we clearly state it in the
>>>>> NaNStrategy
>>>>> documentation, saying somethig like:
>>>>>
>>>>>  When MAXIMAL/MINIMAL are used, the NaNs are effecively
>>>>> *replaced* by
>>>>>  +/- infinity, hence the results will be computed as if
>>>>> infinities were
>>>>>  there in the first place.
>>>> -1 - this is mixing concerns.  NaNStrategy exists for one specific
>>>> purpose - to turn extend the ordering on doubles a total ordering.
>>>> Strategies prescribe only how NaNs are to be treated in the
>>>> ordering.  Side effects on the input array don't make sense in
>>>> this
>>>> context.  The use case for which this class was created was rank
>>>> transformations.  Returning infinities in rank transformations
>>>> would
>>>> blow up computations in these classes.  If a floating point
>>>> computations need to transform NaNs, the specific stats / other
>>>> classes that do this transformation should perform and document
>>>> the
>>>> behavior.
>>>>
>>>> Phil
>>>>
>>> OK. Agreed realized it later.  Hence i have not touched
>>> NaNStrategy in my
>>> patch(MATH-1132) . Now i have added a separate transformer to
>>> transform NaNs
>>> but it uses NanStrategy.  Please let me know on this as this
>>> trasnformation
>>> itself
>>> can be used in different places.
>>
>> Honestly, I don't see the value of this.  I would be more favorable
>> to an addition to MathArrays supporting NaN (or infinity) filtering
>> / transformation.  Several years back we made the decision to just
>> let NaNs propagate in computations, which basically means that if
>> you feed NaNs to [math] you are going to get NaNs out in results.
>> If we do get into the business of filtering NaNs (or infinities for
>> that matter), I think it is probably best to do that in MathArrays
>> or somesuch - i.e., don't decorate APIs with NaN or
>> infinity-handling handling descriptions.  That would be a huge task
>> and would likely end up bleeding implementation detail into the
>> APIs.  As I said above, NaNStrategy has to exist for rank
>> transformations to be well-defined.  NaN-handling in floating point
>> computations is defined and as long as we fully document what our
>> algorithms do, I think it is fair enough to "let the NaNs fall where
>> they may" - which basically means users need to do the filtering /
>> transformation themselves.
>
> So, do I understand correctly that it's OK to add the "remove" and
> "replace" utility methods (cf. MATH-1130) in "MathArrays"?
> Or does this thread conclude that we don't have a use for this within
> Commons Math?

You raise a good point here.  Personally, I don't see an internal
use and if we are sticking with MathArrays including only things we
have internal use for, I would say no, don't include them. 

Phil
>
> Gilles
>
>>
>> Phil
>>>
>>>>> [...]
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message