Hi all,
Le 01/07/2014 08:52, Luc Maisonobe a écrit :
> Le 30/06/2014 22:00, Phil Steitz a écrit :
>> On 6/30/14, 12:33 PM, Luc Maisonobe wrote:
>>> Le 30/06/2014 19:15, Phil Steitz a écrit :
>>>>
>>>>> On Jun 30, 2014, at 9:59 AM, Gilles <gilles@harfang.homelinux.org>
>>>>> wrote:
>>>>>
>>>>> On Mon, 30 Jun 2014 09:08:00 0700, Phil Steitz wrote:
>>>>>>> On Jun 29, 2014, at 3:41 PM, Gilles
>>>>>>> <gilles@harfang.homelinux.org> wrote:
>>>>>>>
>>>>>>>>> On Sun, 29 Jun 2014 14:39:51 0700, Phil Steitz wrote:
>>>>>>>>>> On 6/29/14, 2:30 PM, Gilles wrote:
>>>>>>>>>>> On Sun, 29 Jun 2014 10:25:58 0700, Phil Steitz
wrote:
>>>>>>>>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote:
On Sun,
>>>>>>>>>>> Jun 29, 2014 at 1:25 AM, Phil Steitz
>>>>>>>>>>> <phil.steitz@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
Le
>>>>>>>>>>>>> 25/06/2014 06:05, venkatesha murthy a
écrit :
>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM,
Luc Maisonobe
>>>>>>>>>>>>>> <luc@spaceroots.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> Hi Venkat,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Le 23/06/2014 21:08, venkatesha
murthy a écrit
>>>>>>>>>>>>>>> :
>>>>>>>>>>>>>>>> On Tue, Jun 24, 2014 at 12:08
AM, Luc
>>>>>>>>>>>>>>>> Maisonobe <luc@spaceroots.org>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> While looking further
in Percentile class
>>>>>>>>>>>>>>>>> for MATH1120, I have
found another problem
>>>>>>>>>>>>>>>>> in the current implementation.
>>>>>>>>>>>>>>>>> NaNStrategy.FIXED
>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>> leave the NaNs in place,
but at the end of
>>>>>>>>>>>>>>>>> the KthSelector.select
method, a call to
>>>>>>>>>>>>>>>>> Arrays.sort moves the
NaNs to the end of
>>>>>>>>>>>>>>>>> the small subarray.
What is really
>>>>>>>>>>>>>>>>> implemented here is therefore
closer to
>>>>>>>>>>>>>>>>> NaNStrategy.MAXIMAL than
NaNStrategy.FIXED.
>>>>>>>>>>>>>>>>> This always occur in
the test cases because
>>>>>>>>>>>>>>>>> they use very short arrays,
and we
>>>>>>>>>>>>>>>>> directly
>>>>>>>>>>>> switch to
>>>>>>>>>>>>>>>>> this part of the select
method.
>>>>>>>>>>>>>>>> Are NaNs considered higher
than +Inf ? If
>>>>>>>>>>>>>>>> MAXIMAL represent replacing
for +inf ; you
>>>>>>>>>>>>>>>> need something to indicate
beyond this for
>>>>>>>>>>>>>>>> NaN.
>>>>>>>>>>>>>>> Well, we can just keep the NaN
themselves and
>>>>>>>>>>>>>>> handled them appropriately, hoping
not to trash
>>>>>>>>>>>>>>> performances too much.
>>>>>>>>>>>>>> Agreed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What is the test input you
see an issue and
>>>>>>>>>>>>>>>> what is the actual error
you are seeing.
>>>>>>>>>>>>>>>> Please share the test case.
>>>>>>>>>>>>>>> Just look at
>>>>>>>>>>>>>>> PercentileTest.testReplaceNanInRange().
The
>>>>>>>>>>>>>>> first check in the test corresponds
to a
>>>>>>>>>>>>>>> Percentile configuration at 50%
percentile, and
>>>>>>>>>>>>>>> NaNStrategy.FIXED. The array
has an odd number
>>>>>>>>>>>>>>> of entries, so the 50% percentile
is exactly
>>>>>>>>>>>>>>> one of the entries: the one at
index 5 in the
>>>>>>>>>>>>>>> final array.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The initial ordering is { 0,
1, 2, 3, 4, NaN,
>>>>>>>>>>>>>>> NaN, 5, 7, NaN, 8 }. So for the
>>>>>>>>>>>>>>> NaNStrategy.FIXED setting, it
should not be
>>>>>>>>>>>>>>> modified at all in the selection
algorithm and
>>>>>>>>>>>>>>> the result for 50% should be
the first NaN of
>>>>>>>>>>>>>>> the array, at index 5. In fact,
due to the
>>>>>>>>>>>>>>> Arrays.sort, we *do* reorder
the array into {
>>>>>>>>>>>>>>> 0, 1, 2, 3, 4, 5, 7, 8, NaN,
NaN, NaN }, so the
>>>>>>>>>>>>>>> result is 5.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Agreed. just verified by putting
back the
>>>>>>>>>>>>>>> earlier insertionSort
>>>>>>>>>>>> function.
>>>>>>>>>>>>>>> If we use NaNStrategy.MAXIMAL
and any quantile
>>>>>>>>>>>>>>> above 67%, we get as a result
>>>>>>>>>>>>>>> Double.POSITIVE_INFINITY instead
of
>>>>>>>>>>>>>>> Double.NaN.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If we agree to leave FIXED as
unchanged
>>>>>>>>>>>>>>> behaviour with your
>>>>>>>>>>>> insertionSort
>>>>>>>>>>>>>> code; then atleast MAXIMAL/MINIMAL
should be
>>>>>>>>>>>>>> allowed for transformation
>>>>>>>>>>>> of
>>>>>>>>>>>>>> NaN to +/Inf
>>>>>>>>>>>>> I'm fine with it, as long as we clearly
state it in
>>>>>>>>>>>>> the NaNStrategy documentation, saying
somethig
>>>>>>>>>>>>> like:
>>>>>>>>>>>>>
>>>>>>>>>>>>> When MAXIMAL/MINIMAL are used, the NaNs
are
>>>>>>>>>>>>> effecively *replaced* by +/ infinity,
hence the
>>>>>>>>>>>>> results will be computed as if infinities
were
>>>>>>>>>>>>> there in the first place.
>>>>>>>>>>>> 1  this is mixing concerns. NaNStrategy
exists for
>>>>>>>>>>>> one specific purpose  to turn extend the
ordering on
>>>>>>>>>>>> doubles a total ordering. Strategies prescribe
only
>>>>>>>>>>>> how NaNs are to be treated in the ordering.
Side
>>>>>>>>>>>> effects on the input array don't make sense
in this
>>>>>>>>>>>> context. The use case for which this class
was
>>>>>>>>>>>> created was rank transformations. Returning
>>>>>>>>>>>> infinities in rank transformations would
blow up
>>>>>>>>>>>> computations in these classes. If a floating
point
>>>>>>>>>>>> computations need to transform NaNs, the
specific
>>>>>>>>>>>> stats / other classes that do this transformation
>>>>>>>>>>>> should perform and document the behavior.
>>>>>>>>>>>>
>>>>>>>>>>>> Phil
>>>>>>>>>>> OK. Agreed realized it later. Hence i have not
>>>>>>>>>>> touched NaNStrategy in my patch(MATH1132) .
Now i have
>>>>>>>>>>> added a separate transformer to transform NaNs
but it
>>>>>>>>>>> uses NanStrategy. Please let me know on this
as this
>>>>>>>>>>> trasnformation itself can be used in different
places.
>>>>>>>>>> Honestly, I don't see the value of this. I would
be more
>>>>>>>>>> favorable to an addition to MathArrays supporting
NaN (or
>>>>>>>>>> infinity) filtering / transformation. Several years
back
>>>>>>>>>> we made the decision to just let NaNs propagate in
>>>>>>>>>> computations, which basically means that if you feed
NaNs
>>>>>>>>>> to [math] you are going to get NaNs out in results.
If we
>>>>>>>>>> do get into the business of filtering NaNs (or infinities
>>>>>>>>>> for that matter), I think it is probably best to
do that
>>>>>>>>>> in MathArrays or somesuch  i.e., don't decorate
APIs
>>>>>>>>>> with NaN or infinityhandling handling descriptions.
>>>>>>>>>> That would be a huge task and would likely end up
>>>>>>>>>> bleeding implementation detail into the APIs. As
I said
>>>>>>>>>> above, NaNStrategy has to exist for rank transformations
>>>>>>>>>> to be welldefined. NaNhandling in floating point
>>>>>>>>>> computations is defined and as long as we fully document
>>>>>>>>>> what our algorithms do, I think it is fair enough
to "let
>>>>>>>>>> the NaNs fall where they may"  which basically means
>>>>>>>>>> users need to do the filtering / transformation
>>>>>>>>>> themselves.
>>>>>>>>> So, do I understand correctly that it's OK to add the
>>>>>>>>> "remove" and "replace" utility methods (cf. MATH1130)
in
>>>>>>>>> "MathArrays"? Or does this thread conclude that we don't
>>>>>>>>> have a use for this within Commons Math?
>>>>>>>> You raise a good point here. Personally, I don't see an
>>>>>>>> internal use and if we are sticking with MathArrays including
>>>>>>>> only things we have internal use for, I would say no, don't
>>>>>>>> include them.
>>>>>>> A third way to look at it would be that since some algorithms
>>>>>>> require being passed "clean" data (e.g. without NaNs), we could
>>>>>>> provide some simple means to do so. A user could then write,
>>>>>>> e.g.  double[] data = { 1, Double.NaN, 3, 6, 5 };
>>>>>>>
>>>>>>> Percentile p = new Percentile(); double q =
>>>>>>> p.evaluate(MathArrays.replace(data, Double.NaN,
>>>>>>> Double.POSITIVE_INFINITY), 0.1); 
>>>>>>>
>>>>>>> Then, if we provide that utility, is it still necessary to
>>>>>>> complicate the Percentile code with a NaNStrategy parameter?
>>>>>> I may be missing something, but it seems to me that percentile
>>>>>> does need a NaNStrategy in the presence if NaNs like the other
>>>>>> order statistics do (basically to define their position In the
>>>>>> ordering). It doesn't logically need more than that. I would
>>>>>> be ok defaulting the strategy or even returning NaN in the
>>>>>> presence of NaNs (or whenever interpolation involves a NaN). All
>>>>>> of this can be documented.
>>>>> I am _surely_ missing something because I don't figure out how NaNs
>>>>> can be useful when computing those statistics... :}
>>>> Good point. Basically the different strategies allow you to
>>>> effectively ignore or work around them so stats can be welldefined.
>>>> Without NaNStrategy, order statistics including NaNs are undefined.
>>> The problem with Percentile is that the elements are both reordered
>>> *and* used when interpolating between two values. Tor the rank methods,
>>> only reordering was needed so the various NaNStrategies could be
>>> implemented by replacing at will with +/ infinity and a classical sort
>>> would do the ordering for us.
>>>
>>> Here, we do interpolate on the values, so if we first replace NaN with
>>> +infinity to put them at the end, we may interpolate between a finite
>>> value and +inf and have +inf as a result, whereas if the NaN "value" was
>>> preserved and some magic used to sort the NaN, we would end up
>>> interpolating with a NaN and get a NaN as a result. Setting up the magic
>>> for MINIMAL and MAXIMAL seems achievable withing the sort, as we already
>>> have a specific sort (in fact not a sort but a selection algorithm,
>>> which is the core of the class). Setting up the magic for FIXED seems
>>> more difficult.
>>>
>>> So I guess implementing NaNStrategies may be done in different ways
>>> depending on how they will be used by the calling algorithm. I don't yet
>>> know if we can set up a general method to be used for all statistics.
>>> Looking only at Percentile, replacement with +/ infinity already seems
>>> to not be an appropriate solution.
>>
>> I don't think we should try to redefine algorithms so they can
>> handle NaNs. When you do floating point computations with NaNs, you
>> get NaNs. That should be understood to be the contract. For rank
>> statistics, such as Percentile, we need to extend the ordering on
>> doubles to include them in ordering for the algorithm to be
>> welldefined. If we subsequently do floating point computations
>> with the NaNs, the result should be NaN. I think we should either
>> just simplify things and say we *always* return NaN whenever NaNs
>> are present for Percentile, or do the following:
>>
>> 0) use the prescribed NaN strategy to determine the ordering  so
>> NaNs end up either at the bottom, at the top or in their original
>> position (FIXED).
>> 1) if interpolation involves a NaN element, the result returned is NaN.
>
> This is exactly what I say. If NaNs are in, NaNs should be out. As for
> now we do replace them, we get infinities instead of NaNs. I would
> prefer getting NaNs there as it is more consistent with the purpose of
> NaNs in floating point encoding.
>
> So I think we agree: NaNs should not be changed, but they should be
> reordered. Perhaps we could have something simple like NaNStrategy
> having a method:
>
> boolean shouldSwap(int i1, double d1, int i2, double i2)
>
> that would use both the index and values of the element to check if they
> should be swapped or not. This method would be used both for traditional
> sorting for ranking and in selection algorithms for percentile. I first
> thought we should simply have NaN implement Comparator<Double> but this
> would not work for FIXED, whereas the above method would work (simply
> returning false as soon as there is a NaN, regardless of it being at a
> lower or upper index).
>
> I will take a look at this option.
This does not work as is. The problem is lack of transitivity. Suppose
that we have three indices i < j < k, with a[i] > a[k] and a[j] = NaN.
Then if during the sort/selection we compare elements at indices i and k
first, we will swap them, which is correct behaviour regardless of the
NaN strategy used. If we compare elemntes at indices i and j first and
then elements at indices j and k and we use FIXED NaNStrategy, then
we won't swap the two elements and the middle NaN would induced a
nonglobally sorted array. I think FIXED should act as a river flowing
over rocks : the water (nonNaN elements) would move freely around and
be sorted, and the rocks (NaN elements) would stay were they are but do
not prevent water flow.
I have found another way to handle this, using an indirection array that
could "jump" over NaNs for FIXED strategy or push them to either end for
MINIMAL or MAXIMAL strategy.
I will try this in the next few days, as time permit.
best regards,
Luc
>
> best regards,
> Luc
>
>>
>> Phil
>>>
>>> best regards,
>>> Luc
>>>
>>>>
>>>> Phil
>>>>
>>>>>> I still don't see a withinmath use for recoding methods; though
>>>>>> I do see the value from a user perspective. The problem is if we
>>>>>> open this door, we risk MathArrays becoming a mini commons lang.
>>>>> If Commons Lang already provides a feature, then we indeed don't
>>>>> need to offer the same within CM (if there is no internal use). Can
>>>>> someone confirm?
>>>>>
>>>>>
>>>>> Otherwise we could define a new inner interface in MathArrays,
>>>>> similar to "MathArrays.Function":  public class MathArrays {
>>>>>
>>>>> // ...
>>>>>
>>>>> public interface Transformer { /** Transform the whole array */
>>>>> double[] transform(double[] input); /** Transform the array in the
>>>>> interval [from, to) and return the whole array */ double[]
>>>>> transform(double[] input, int from, int to); /** Transform the
>>>>> array in the interval [from, to) and return the transformed
>>>>> subarray */ double[] transformAndSplice(double[] input, int from,
>>>>> int to); } } 
>>>>>
>>>>> Then we can provide "replace" and "remove" transformers through
>>>>> factory methods (example/untested code):  public class
>>>>> MathArrays {
>>>>>
>>>>> // ...
>>>>>
>>>>> public static Transformer createReplaceTransformer(final double
>>>>> old, final double replace) { return new Transformer() { public
>>>>> double[] transform(double[] input) { final int len = input.length;
>>>>> final double[] output = new double[len]; for (int i = 0; i < len;
>>>>> i++) { output[i] = Precision.equalsIncludingNaN(input[i], old) ?
>>>>> replace : input[i]; } return output; }
>>>>>
>>>>> // etc. }; } } 
>>>>>
>>>>>
>>>>> Gilles
>>>>>
>>>>>
>>>>> 
>>>>>
>>>>>
>>> To unsubscribe, email: devunsubscribe@commons.apache.org
>>>>> For additional commands, email: devhelp@commons.apache.org
>>>>>
>>>> 
>>>>
>>>>
>>> To unsubscribe, email: devunsubscribe@commons.apache.org
>>>> For additional commands, email: devhelp@commons.apache.org
>>>>
>>>>
>>>
>>> 
>>> To unsubscribe, email: devunsubscribe@commons.apache.org
>>> For additional commands, email: devhelp@commons.apache.org
>>>
>>>
>>
>>
>> 
>> To unsubscribe, email: devunsubscribe@commons.apache.org
>> For additional commands, email: devhelp@commons.apache.org
>>
>>
>>
>
>
> 
> To unsubscribe, email: devunsubscribe@commons.apache.org
> For additional commands, email: devhelp@commons.apache.org
>
>
>

To unsubscribe, email: devunsubscribe@commons.apache.org
For additional commands, email: devhelp@commons.apache.org
