commons-dev mailing list archives

From Phil Steitz <p...@steitz.com>
Subject Re: [Math] Cloning statistics
Date Wed, 12 May 2004 04:15:47 GMT
Ken Geis wrote:
> I'm playing with commons-math to implement a data mining algorithm and I 
> am having a performance problem.
> 
> I am doing running statistics over an ordered set of data, storing the 
> statistics at each new value I come across.  One way of doing this would 
> be to have an array of SummaryStatistics and do
> 
> for (int i = 0; i < length; i++)
> {
>     for (int j = i; j < length; j++)
>     {
>         statsArray[j].addValue(values[i]);
>     }
> }
> 
> another way is to do
> 
> for (int i = 0; i < length; i++)
> {
>     stats.addValue(values[i]);
>     statsArray[i] = SerializationUtils.clone(stats);
> }
> 
> A lot of these objects are marked Serializable, but clone methods do not 
> exist.  That's why I use commons-lang SerializationUtils. Unfortunately, 
> that makes the cloning take up 50% of my runtime because 
> (de)serialization is expensive.
> 
> I will probably patch the statistics classes, implementing enough of 
> clone() to make me happy.  Would you like this patch?

An efficient cloning method might be useful, but it would still carry 
around extra baggage and overhead for your use case (SummaryStatisticsImpl 
nests a bunch of little stats objects and other instance data).

What we might want to do is add a StatisticalSummaryBean implementing the 
StatisticalSummary interface, and add a getSummary method to 
SummaryStatistics that returns an instance of this "value bean" containing 
only the values of the statistics.  Then you could just do

for (int i = 0; i < length; i++)
{
    stats.addValue(values[i]);
    statsArray[i] = stats.getSummary();
}

since I presume that all you will want to do with statsArray[i] is call 
things like getMean(), getVariance(), etc.  Copying a handful of computed 
values would require much less overhead than cloning the whole 
SummaryStatisticsImpl instance each time.
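
To make the idea concrete, here is a rough, self-contained sketch of the 
"value bean" approach. RunningStats and Summary are hypothetical stand-ins 
that I made up for illustration; they are not the commons-math classes, and 
the real getSummary (if we add it) may look different. The point is only 
that each snapshot copies a few primitives instead of serializing an object 
graph.

```java
public class SummarySnapshotDemo {

    /** Immutable snapshot holding only computed values (hypothetical name). */
    static final class Summary {
        private final long n;
        private final double mean;
        private final double variance;
        private final double min;
        private final double max;
        private final double sum;

        Summary(long n, double mean, double variance,
                double min, double max, double sum) {
            this.n = n;
            this.mean = mean;
            this.variance = variance;
            this.min = min;
            this.max = max;
            this.sum = sum;
        }

        long getN() { return n; }
        double getMean() { return mean; }
        double getVariance() { return variance; }
        double getMin() { return min; }
        double getMax() { return max; }
        double getSum() { return sum; }
    }

    /** Minimal running accumulator; an assumed stand-in for SummaryStatistics. */
    static final class RunningStats {
        private long n;
        private double mean;
        private double m2;   // running sum of squared deviations from the mean
        private double sum;
        private double min = Double.NaN;
        private double max = Double.NaN;

        void addValue(double x) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);   // Welford-style update
            sum += x;
            min = (n == 1) ? x : Math.min(min, x);
            max = (n == 1) ? x : Math.max(max, x);
        }

        /** Copies the current values into a new immutable snapshot. */
        Summary getSummary() {
            double variance;
            if (n == 0) {
                variance = Double.NaN;
            } else if (n == 1) {
                variance = 0.0;
            } else {
                variance = m2 / (n - 1);   // sample variance
            }
            return new Summary(n, mean, variance, min, max, sum);
        }
    }

    /** Builds one snapshot per prefix of the input, as in the loop above. */
    static Summary[] runningSummaries(double[] values) {
        RunningStats stats = new RunningStats();
        Summary[] statsArray = new Summary[values.length];
        for (int i = 0; i < values.length; i++) {
            stats.addValue(values[i]);
            statsArray[i] = stats.getSummary();
        }
        return statsArray;
    }

    public static void main(String[] args) {
        Summary[] s = runningSummaries(new double[] {1.0, 2.0, 3.0, 4.0});
        for (Summary x : s) {
            System.out.println(x.getN() + " values: mean=" + x.getMean()
                    + " variance=" + x.getVariance());
        }
    }
}
```

Because each Summary is immutable, later calls to addValue cannot change 
the snapshots already stored in statsArray, which is the property you were 
getting from the clone.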

Since this would amount to a change to the SummaryStatistics interface, if 
we want to do it, we should do it now, before 1.0.  I am +1 to this change 
and willing to implement it if no one objects.

Phil

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

