commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gilles Sadowski <>
Subject Re: [math] speeding up percentile based statistics
Date Sun, 26 Sep 2010 17:19:24 GMT
> [...]
>  1) do nothing to check the array is the same between calls and blindly
>     assumes it IS the same. Users would really need to call clearCache
>     when they provide a new array
>     pros: very simple
>     cons: error-prone for the user as it relies only on reading and
>     understanding a documentation that would change with new version
>  2) check only array reference equality (with ==) to check the array
>     is the same as the previous call or not. Users would need to call
>     clearCache only if they use the same array but have changed its
>     content.
>     pros: trade-off between efficiency and reliability,
>           handles most cases properly
>     cons: may be wrong in corner cases
>  3) check array content using an hash code. Users would generally don't
>     need to call clearCache at all so it would not be provided
>     pros: works in all cases
>     cons: add linear cost for array checking
>  4) remove the double[] values parameter from the API and use a separate
>     addValues method, hence the class will reset its cache only when
>     addValues is called
>     pros: works in all cases
>     cons: Percentile could not implement UnivariateStatistic anymore
> My preference is choice 2.
> What do you think ?

IIUC, the interface method ("evaluate") was designed around the assumption
that successive calls are independent: the array argument is not
encapsulated for further processing (e.g. caching).
IMO, the equality check and the "clearCache" method look like workarounds,
not a clean solution.
As an example, what if a user calls "evaluate" several times alternating on
two different arrays:

    double[] a = new double[] {1, 2, 3};
    double[] b = new double[] {1, 2, 3, 4, 5};
    Percentile p = new Percentile();

    dourble r;
    r = p.evaluate(a, 50);
    r = p.evaluate(b, 50);
    r = p.evaluate(a, 50);
    r = p.evaluate(b, 50);

    // etc.

Doesn't this kind of use-case nullify the expected optimization?

If the array is going to be reused, the user call should reflect that fact;
e.g. by passing the array in a constructor:

    double[] a = new double[] {1, 2, 3};
    double[] b = new double[] {1, 2, 3, 4, 5};
    Percentile pA = new Percentile(a);
    Percentile pB = new Percentile(b);

    double r;
    r = pA.evaluate(50);
    r = pB.evaluate(50);
    r = pA.evaluate(50);
    r = pB.evaluate(50);

That way, later calls can benefit from whatever preprocessing was done in
previous calls.
The instance will always control all the information needed (e.g. after a
call to an "addValues" method) for the processing without the need to rely
on the user for calling "clearCache" whenever necessary.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message