commons-dev mailing list archives

From Phil Steitz
Subject Re: [math] Recent commits to stat, util packages
Date Sun, 06 Jul 2003 05:24:16 GMT

Sorry, last reply got sent before I was done with it.  Pls disregard and try
> > This adds significant overhead and I do not see the value in it.  The cost
> > of the additional stack operations/object creations is significant.  I ran
> > tests comparing the previous version that does direct computations using
> > the double[] arrays to the modified version and found an average of more
> > than 6x slowdown using the new implementation. I did not profile memory
> > utilization, but that is also a concern. Repeated tests computing the mean
> > of 1000 doubles 100000 times using the old and new implementations averaged
> > 1.5 and 10.2 seconds, resp. I do not see the need for all of this
> > additional overhead.
> >
> If you review the code, you'll find there is no added "object creation";
> the static Variable objects calculate on double[] just as the
> Univariates did. I would have to see more substantial analysis to
> believe your claim. All that's going on here is that the static StatUtils
> methods are delegating to individual static instances of
> UnivariateStatistics. These are instantiated on JVM startup like all
> static objects; calling a method in such an object should not require
> any more overhead than having the method coded directly into the static
> method.
> If there are performance considerations, let's discuss these.

Here is what I added to StatUtils.test

 double[] x = new double[1000];
 for (int i = 0; i < 1000; i++) {
     x[i] = (5 - i) * (i - 200);
 }
 long startTick = 0;
 double res = 0;
 // 10 timed runs; each run calls mean(x) 100000 times per implementation
 for (int j = 0; j < 10; j++) {
     startTick = System.currentTimeMillis();
     for (int i = 0; i < 100000; i++) {
         res = OStatUtils.mean(x);
     }
     System.out.println("old: " + (System.currentTimeMillis() - startTick));
     startTick = System.currentTimeMillis();
     for (int i = 0; i < 100000; i++) {
         res = StatUtils.mean(x);
     }
     System.out.println("new: " + (System.currentTimeMillis() - startTick));
 }

The result was a mean of 10203 ms for the "new" and 1531.1 ms for the "old",
with standard deviations of 81.1 and 13.4 ms resp., i.e. roughly a 6.7x
slowdown.  The overhead is the stack operations and temp object creations.
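
For anyone who wants to repeat the comparison without the two StatUtils
versions at hand, here is a rough self-contained sketch of the same kind of
test. Everything named in it is an assumption for illustration: DirectMean and
DelegatingMean are hypothetical stand-ins, not the actual old and new [math]
classes, and the warm-up pass, the use of System.nanoTime and the sample (n - 1)
standard deviation are my choices rather than what the test above did. Numbers
from it say nothing about the real code; it only shows the shape of the
measurement.

```java
import java.util.function.ToDoubleFunction;

// Hypothetical stand-in for a "direct" static computation on double[].
final class DirectMean {
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }
}

// Hypothetical stand-in for a static method that delegates to a shared
// static instance (created once, when the class is initialized).
final class DelegatingMean {
    private static final ToDoubleFunction<double[]> MEAN = DirectMean::mean;

    static double mean(double[] values) {
        return MEAN.applyAsDouble(values);
    }
}

final class MeanTiming {
    // Results are written here so the JIT cannot discard the calls.
    static volatile double sink;

    // Times `runs` batches of `callsPerRun` calls; returns milliseconds per batch.
    static long[] time(ToDoubleFunction<double[]> f, double[] x, int runs, int callsPerRun) {
        for (int i = 0; i < callsPerRun; i++) {   // untimed warm-up batch
            sink = f.applyAsDouble(x);
        }
        long[] millis = new long[runs];
        for (int j = 0; j < runs; j++) {
            long start = System.nanoTime();
            for (int i = 0; i < callsPerRun; i++) {
                sink = f.applyAsDouble(x);
            }
            millis[j] = (System.nanoTime() - start) / 1_000_000L;
        }
        return millis;
    }

    static double average(long[] samples) {
        double sum = 0.0;
        for (long s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    // Sample standard deviation (divides by n - 1); needs at least 2 samples.
    static double stdDev(long[] samples) {
        double avg = average(samples);
        double sumSq = 0.0;
        for (long s : samples) {
            sumSq += (s - avg) * (s - avg);
        }
        return Math.sqrt(sumSq / (samples.length - 1));
    }

    public static void main(String[] args) {
        double[] x = new double[1000];
        for (int i = 0; i < x.length; i++) {
            x[i] = (5 - i) * (i - 200);
        }
        long[] direct = time(DirectMean::mean, x, 10, 100000);
        long[] delegating = time(DelegatingMean::mean, x, 10, 100000);
        System.out.println("direct:     mean " + average(direct) + " ms, sd " + stdDev(direct));
        System.out.println("delegating: mean " + average(delegating) + " ms, sd " + stdDev(delegating));
    }
}
```

Timing each implementation in its own set of runs after a warm-up keeps JIT
compilation out of the measured batches; a proper benchmark harness would be
a better tool if the numbers end up mattering.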
> I doubt (as the numerous discussions over the past week have pointed 
> out) that what we really want to have in StatUtils is one monolithic 
> Static class with all the implemented methods present in it. If I have 
> misinterpreted this opinion in the group, then I'm sure there will be 
> responses to this.

Well, I for one would prefer to have the simple computational methods in one
place.  I would support making the class require instantiation, however, i.e.
making the methods non-static.
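
To make the non-static suggestion concrete, here is a minimal sketch. The class
names SimpleStats and CorrectedStats are made up for this example and are not
proposed names; the two-pass correction in the subclass is just one plausible
"alternate implementation", picked for illustration.

```java
// Hypothetical instantiable utility class: the computation lives in one
// place, but as an instance method so that it can be overridden.
class SimpleStats {
    public double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }
}

// Hypothetical alternate implementation: a second pass adds the average
// residual to the first estimate to reduce accumulated rounding error.
class CorrectedStats extends SimpleStats {
    @Override
    public double mean(double[] values) {
        double first = super.mean(values);
        double correction = 0.0;
        for (double v : values) {
            correction += v - first;
        }
        return first + correction / values.length;
    }
}

final class StatsDemo {
    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0, 4.0};
        SimpleStats stats = new CorrectedStats();   // callers depend only on SimpleStats
        System.out.println(stats.mean(x));
    }
}
```

Callers hold a SimpleStats reference and get whichever implementation was
instantiated, which is what static methods cannot offer.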

> There was a great deal of discussion about the benefit of not having the 
> methods implemented directly in static StatUtils because they could not 
> be "overridden" or worked with in an Instantiable form. This approach 
> frees the implementations up to be overridden and frees up room for 
> alternate implementations.

As I said above, the simplest way to deal with this is to make the methods
non-static.

> You may have your opinions of how you would like to see the packages 
> organized and implemented. Others in the group do have alternate 
> opinions to yours. I for one see a strong value in individually 
> implemented Statistics. I also have a strong vision that the framework I 
> have been working on provides substantial benefits.
> (1a.) It allows both the storageless and storage-based implementations
> to function behind the same interface. No matter if you're calling
> increment(double d)
> or
> evaluate(double[]...)
> you're working with the same algorithm.

That is true in the old implementation as well, with the core computational
methods in StatUtils.
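
As a rough illustration of what "the same algorithm" behind increment and
evaluate can look like, here is a sketch. RunningMean and its update helper
are hypothetical names invented for this example; it does not reproduce
either the old or the new [math] code, and the running-mean recurrence is
simply an assumption about which algorithm one might share.

```java
// Hypothetical statistic exposing both a storageless entry point
// (increment) and an array-based one (evaluate), built on one update rule.
final class RunningMean {
    private long n;
    private double mean;

    // The single update rule: new mean after seeing d as the n-th value.
    private static double update(double mean, long n, double d) {
        return mean + (d - mean) / n;
    }

    // Storageless use: feed values one at a time.
    void increment(double d) {
        n++;
        mean = update(mean, n, d);
    }

    double getResult() {
        return n == 0 ? Double.NaN : mean;
    }

    // Array-based use: same rule, no instance or temporary objects needed.
    static double evaluate(double[] values) {
        double m = 0.0;
        long count = 0;
        for (double v : values) {
            count++;
            m = update(m, count, v);
        }
        return count == 0 ? Double.NaN : m;
    }
}
```

Because both entry points go through the same update rule, feeding values
one by one and passing the whole array give the same result.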
> (1b.) If you wish to have alternate implementations for evaluate and
> increment, it is easily possible to overload these methods in future
> versions of the implementations.

Just make the methods non-static and that will be possible.  I am not sure,
given the relative triviality of these methods, if this is really a big deal,

> Phil, it's clear we have very different "schools of thought" on the
> subject of how the library should be designed. As a developer on the 
> project I have a right to promote my design model and interests. The 
> architecture is something I have a strong interest in working with.

You certainly have the right to your opinions.  Others also have the right to
disagree with them.
> Apache projects are "group" projects. If a project such as [math] cannot
> find community and room for multiple directions of development, if it
> cannot make room for alternate ideas and visions, if both revolutionary
> and evolutionary processes cannot coexist, I doubt the project will have
> much of a future at all.

I agree with this as well; but from what I have observed, open source projects
do best when they do not try to go off in divergent directions at the same
time. If we cannot agree on a consistent architecture direction, then I don't
think we will succeed. If we can and we stay focussed, then we will.  As I said
above, if others agree with the approach that you want to take, then that is
the direction that the project will go.  I am interested in the opinions of
Tim, Robert and the rest of the team.

> -Mark