commons-dev mailing list archives

From Phil Steitz <>
Subject Re: [math] statistics performance boost
Date Sun, 06 Jun 2004 22:55:01 GMT
Ken Geis wrote:
> As I explained, I am using commons-math to enable data mining algorithms 
> I am writing.  I am using a lot of SummaryStatistics and TTest.  Through 
> some profiling, I was able to find places to optimize code and I ended 
> up getting a 15x performance boost within my application.  This was from 
> three changes:
> 1. Add clone() to SummaryStatisticsImpl.  This implies adding clone() to 
> SecondMoment, Sum, SumOfSquares, Min, Max, SumOfLogs, GeometricMean, 
> Mean, and Variance.  To Mark, I think that the behavior of clone() is 
> well implied by the Javadoc for java.lang.Object.  I was surprised that 
> I obviously had not read that before yesterday.  To Phil, your suggested 
> getSummary() method/bean would indeed solve my problem and give me even 
> better performance.  (clone() was ~20x faster than the 
> serialize/deserialize hack I was using.  This probably accounts for 2x 
> of my overall 15x.)

As noted in my previous response, getSummary() and StatisticalSummaryValues
have been added.

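For illustration, here is a minimal sketch of the snapshot idea: instead of cloning a whole accumulator, getSummary() copies the current values into a small immutable object. RunningStats and SummaryValues are hypothetical stand-ins, not the commons-math classes, and returning NaN for the variance of fewer than two values is a choice made only for this sketch.

```java
// Hypothetical stand-ins for illustration; these are NOT the commons-math classes.
final class SummarySnapshotSketch {

    /** Immutable value object holding the statistics as of one moment. */
    static final class SummaryValues {
        final long n;
        final double mean;
        final double variance;
        final double min;
        final double max;

        SummaryValues(long n, double mean, double variance, double min, double max) {
            this.n = n;
            this.mean = mean;
            this.variance = variance;
            this.min = min;
            this.max = max;
        }
    }

    /** Mutable accumulator; keeps running moments rather than the raw data. */
    static final class RunningStats {
        private long n;
        private double mean;
        private double m2; // sum of squared deviations from the running mean
        private double min = Double.POSITIVE_INFINITY;
        private double max = Double.NEGATIVE_INFINITY;

        void addValue(double x) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean); // Welford's update
            min = Math.min(min, x);
            max = Math.max(max, x);
        }

        /** Copies the current state; later addValue() calls do not affect the result. */
        SummaryValues getSummary() {
            // Sketch-only choice: NaN when fewer than two values have been added.
            double variance = n > 1 ? m2 / (n - 1) : Double.NaN;
            return new SummaryValues(n, mean, variance, min, max);
        }
    }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        for (double x : new double[] {1, 2, 3, 4}) {
            stats.addValue(x);
        }
        SummaryValues before = stats.getSummary();
        stats.addValue(100); // does not change the earlier snapshot
        System.out.println("snapshot n=" + before.n + ", mean=" + before.mean);
        System.out.println("current  n=" + stats.getSummary().n);
    }
}
```

A snapshot like this costs a handful of field copies, which is presumably why it would beat both clone() and a serialize/deserialize round trip.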
> 2. Change TTestImpl; the commons-discovery DiscoverClass.newInstance() 
> was being called for every call to tTest.  This is not a cheap method. 
> After #1, this method was taking up something like 17% of the runtime of 
> my synthetic benchmark.  I created a method to lazily get the 
> DistributionFactory and store it (transient) as a class attribute.

TTestImpl now caches the factory (as an instance variable, not a class
variable).
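
The caching pattern described here can be sketched roughly as follows. All names are hypothetical stand-ins (this is not the TTestImpl source), the expensive lookup is simulated with a counter, and the sketch assumes single-threaded use.

```java
// Hypothetical stand-ins for illustration; this is NOT the TTestImpl source.
final class LazyFactorySketch {

    /** Counts how often the (simulated) expensive lookup runs. */
    static int lookups = 0;

    /** Stand-in for a factory that is costly to discover. */
    static final class DistributionFactoryStandIn { }

    /** Simulates an expensive discovery call. */
    static DistributionFactoryStandIn expensiveLookup() {
        lookups++;
        return new DistributionFactoryStandIn();
    }

    static final class TTestStandIn {
        // Instance field, not static: each test object holds its own factory.
        // transient: a serialized copy would simply look the factory up again.
        private transient DistributionFactoryStandIn factory;

        /** Looks the factory up on first use only (not thread-safe). */
        private DistributionFactoryStandIn getFactory() {
            if (factory == null) {
                factory = expensiveLookup();
            }
            return factory;
        }

        void tTest() {
            // A real implementation would build its t-distribution from the factory here.
            getFactory();
        }
    }

    public static void main(String[] args) {
        TTestStandIn test = new TTestStandIn();
        for (int i = 0; i < 1000; i++) {
            test.tTest();
        }
        System.out.println("lookups performed: " + lookups);
    }
}
```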

> 3. Make ContinuedFraction.evaluate(...) iterative instead of recursive. 
>  This gave me a 125% (2.25x) improvement in performance of this method. 
>  I think I can optimize it further, hopefully not at the cost of 
> readability.

We could really use this, as it would also prevent stack overflows (which
could be the cause of BZ #29414).  A patch would be most welcome :-)
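
As a rough idea of what an iterative evaluation can look like, here is a sketch using the modified Lentz method, a standard loop-based technique for continued fractions of the form b0 + a1/(b1 + a2/(b2 + ...)). This is not Ken's patch and not the ContinuedFraction API; the class name, the lambda-based coefficient interface, and the TINY constant are all assumptions made for the example.

```java
import java.util.function.IntToDoubleFunction;

// Illustrative sketch only; not the commons-math ContinuedFraction class.
final class ContinuedFractionSketch {

    /** Replaces exact zeros so the divisions below stay defined. */
    private static final double TINY = 1e-300;

    /**
     * Evaluates b(0) + a(1)/(b(1) + a(2)/(b(2) + ...)) with the modified
     * Lentz method. Uses a loop, so stack depth does not grow with the
     * number of terms.
     */
    static double evaluate(IntToDoubleFunction a, IntToDoubleFunction b,
                           double epsilon, int maxIterations) {
        double f = b.applyAsDouble(0);
        if (f == 0.0) {
            f = TINY;
        }
        double c = f;
        double d = 0.0;
        for (int n = 1; n <= maxIterations; n++) {
            double an = a.applyAsDouble(n);
            double bn = b.applyAsDouble(n);
            d = bn + an * d;
            if (d == 0.0) {
                d = TINY;
            }
            c = bn + an / c;
            if (c == 0.0) {
                c = TINY;
            }
            d = 1.0 / d;
            double delta = c * d;
            f *= delta;
            if (Math.abs(delta - 1.0) < epsilon) {
                return f; // successive convergents agree to within epsilon
            }
        }
        throw new IllegalStateException(
            "Continued fraction did not converge in " + maxIterations + " iterations");
    }

    public static void main(String[] args) {
        // Golden ratio: 1 + 1/(1 + 1/(1 + ...)), i.e. every a(n) and b(n) is 1.
        double phi = evaluate(n -> 1.0, n -> 1.0, 1e-12, 1000);
        System.out.println("phi ~ " + phi);
    }
}
```

Because each step keeps only the current ratios c and d plus the running value f, memory use is constant no matter how many terms are needed, which is what removes the stack overflow risk of a recursive version.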

> Patches available on request.  Should I just start posting them when I 
> have patches like this?