commons-dev mailing list archives

From Phil Steitz <>
Subject Re: [math] Recent commits to stat, util packages
Date Sun, 06 Jul 2003 04:55:38 GMT

--- "Mark R. Diggory" <> wrote:
> > This adds significant overhead and I do not see the value in it. The cost of the
> > additional stack operations/object creations is significant. I ran tests
> > comparing the previous version that does direct computations using the double[]
> > arrays to the modified version and found an average of more than 6x slowdown
> > using the new implementation. I did not profile memory utilization, but that is
> > also a concern. Repeated tests computing the mean of 1000 doubles 100000
> > times using the old and new implementations averaged 1.5 and 10.2 seconds,
> > respectively. I do not see the need for all of this additional overhead.
> > 
> If you review the code, you'll find there is no added "object creation";
> the static Variable objects calculate on double[] just as the
> Univariates did. I would have to see more substantial analysis to
> believe your claim. All that's going on here is that the static StatUtils
> methods are delegating to individual static instances of
> UnivariateStatistics. These are instantiated on JVM startup like all
> static objects; calling a method in such an object should not require
> any more overhead than having the method coded directly into the static
> method.
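The delegation pattern described above can be sketched as follows. This is a minimal, hypothetical illustration: `DelegationSketch`, `Mean`, `meanDirect` and `meanDelegated` are names invented here, not the actual [math] classes. In Java a static field such as `MEAN` is initialized once, when its class is first initialized, so the delegating version creates no object per call; it adds at most one extra instance-method call, which the JIT compiler can often inline.

```java
// Hypothetical sketch: class and method names are illustrative,
// not the actual [math] API.
public class DelegationSketch {

    /** A minimal instantiable statistic (hypothetical). */
    static class Mean {
        double evaluate(double[] values) {
            double sum = 0.0;
            for (double v : values) {
                sum += v;
            }
            return sum / values.length;
        }
    }

    // Created once when DelegationSketch is initialized, not on every call.
    private static final Mean MEAN = new Mean();

    /** Style 1: the computation is coded directly in the static method. */
    public static double meanDirect(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Style 2: the static method delegates to a shared static instance. */
    public static double meanDelegated(double[] values) {
        return MEAN.evaluate(values);
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0, 4.0};
        System.out.println(meanDirect(x));    // prints 2.5
        System.out.println(meanDelegated(x)); // prints 2.5
    }
}
```

Both styles compute the same result; whether they differ measurably in speed is an empirical question that depends on the JVM and on how the timing is done.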

Here is what I added to one of the methods in StatUtilsTest, after copying and
renaming the old version OStatUtils:

// Timing loop added to a StatUtilsTest method; x is the test's double[] data
// and OStatUtils is the renamed copy of the previous StatUtils.
long startTick;
double res;
for (int j = 0; j < 10; j++) {
    // Time 100000 calls to the old, directly coded implementation.
    startTick = System.currentTimeMillis();
    for (int i = 0; i < 100000; i++) {
        res = OStatUtils.mean(x);
    }
    System.out.println("old: " + (System.currentTimeMillis() - startTick));
    //oldStats.addValue(System.currentTimeMillis() - startTick);

    // Time 100000 calls to the new, delegating implementation.
    startTick = System.currentTimeMillis();
    for (int i = 0; i < 100000; i++) {
        res = StatUtils.mean(x);
    }
    //newStats.addValue(System.currentTimeMillis() - startTick);
    System.out.println("new: " + (System.currentTimeMillis() - startTick));
}
> If there are performance considerations, let's discuss these.
> I doubt (as the numerous discussions over the past week have pointed
> out) that what we really want to have in StatUtils is one monolithic
> static class with all the implemented methods present in it. If I have
> misinterpreted this opinion in the group, then I'm sure there will be
> responses to this.
> > I suggest that we postpone introduction of a statistical computation framework
> > until after the initial release, if needed. In any case, I would like to keep
> > StatUtils and the core UnivariateImpl small, fast and lightweight, so I would
> > like to request that the changes to these classes be rolled back.
> > 
> I would really like to see an architecture that's more than just one flat
> static class with a bunch of double[] methods in it. This is not very
> useful to me.
> > If others feel that this additional infrastructure is essential, then I just
> > need to be educated. It is quite possible that I am thinking too narrowly in
> > terms of current scope and I may be missing some looming structural problems.
> > If this is the case, I am open to being educated. I just need to see a) exactly
> > why we need to add more complexity at this time and b) why it is necessary to
> > break univariate statistics into four packages and 17 classes when all we are
> > computing is basic statistics.
> > 
> The packages are categorical, the classes are implementations of each 
> statistic. The framework provides an intuitive and organized means for 
> others to easily implement and add statistics to the packages without 
> being restricted to a fascist and monolithic Univariate interface or 
> static StatUtils interface.
> If anything, the continued conflict between our two schools of thought
> shows the necessity of such an approach. Your school of thought can
> retain the monolithic interfaces for "Univariate" and "StatUtils", while
> the framework can provide others with the ability to extend and expand
> the library without such "heavy handed" restrictions that cripple the
> extensibility of the project.
> There was a great deal of discussion about the benefit of not having the 
> methods implemented directly in static StatUtils because they could not 
> be "overridden" or worked with in an Instantiable form. This approach 
> frees the implementations up to be overridden and frees up room for 
> alternate implementations.
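To illustrate the point about overriding: in Java a static method cannot be overridden, because calls to it are bound at compile time, whereas code written against an instance can be handed a subclass. The sketch below assumes hypothetical `Mean` and `CorrectedMean` classes (not the [math] API); `CorrectedMean` swaps in a two-pass mean with a correction term purely as an example of an alternate implementation.

```java
// Hypothetical sketch: names are illustrative, not the actual [math] API.
public class OverrideSketch {

    /** Straightforward one-pass mean (hypothetical base implementation). */
    static class Mean {
        public double evaluate(double[] values) {
            double sum = 0.0;
            for (double v : values) {
                sum += v;
            }
            return sum / values.length;
        }
    }

    /**
     * Alternate implementation: a two-pass mean that adds a correction
     * term, which can reduce rounding error for some inputs.
     */
    static class CorrectedMean extends Mean {
        @Override
        public double evaluate(double[] values) {
            double m = super.evaluate(values);
            double correction = 0.0;
            for (double v : values) {
                correction += v - m;
            }
            return m + correction / values.length;
        }
    }

    /** Client code depends only on the Mean type, so either version fits. */
    static double summarize(Mean statistic, double[] values) {
        return statistic.evaluate(values);
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0, 4.0};
        System.out.println(summarize(new Mean(), x));          // prints 2.5
        System.out.println(summarize(new CorrectedMean(), x)); // prints 2.5
    }
}
```

A static `StatUtils.mean(double[])` offers no equivalent hook: a caller that wants the corrected version has to change the call site itself.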
> You may have your opinions of how you would like to see the packages 
> organized and implemented. Others in the group do have alternate 
> opinions to yours. I for one see a strong value in individually 
> implemented Statistics. I also have a strong vision that the framework I 
> have been working on provides substantial benefits.
> (1a.) It allows both the storageless and storage-based implementations
> to function behind the same interface. No matter if you're calling
> increment(double d)
> or
> evaluate(double[]...)
> you're working with the same algorithm.
> (1b.) If you wish to have alternate implementations for evaluate and
> increment, it is easy to override these methods in future
> versions of the implementations.
> (2.) With individual Implementations, alternate approaches can be coded 
> and included for the benefit of those who have an interest in such 
> implementations. Thus there could be multiple versions of Variance, 
> based on the strategy of interest and the numerical accuracy required.
> (3.) Having the same implementations of statistics usable across all
> Univariate implementations assures a standard behavior and the same
> expected results no matter if you're using incremental or evaluation-based
> approaches.
> (4.) The framework provides a formal structure for the future growth of
> the library. Knowing what a UnivariateStatistic is, and seeing the
> various implementations, it's obvious which route one will take to
> implement future statistics of interest.
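Points (1a) through (4) can be sketched in a few lines of Java. This is a hypothetical illustration, not the actual [math] code: apart from the name `UnivariateStatistic`, which the email mentions, the `StorelessStatistic` interface and both variance classes are invented here. `WelfordVariance` implements `evaluate(double[])` by replaying the array through `increment(double)`, so the storage-based and storageless paths share one algorithm (1a, 3), and `SumOfSquaresVariance` is an alternate strategy behind the same interface (2).

```java
// Hypothetical sketch of the framework described in points (1a) to (4).
// Apart from UnivariateStatistic, names are illustrative, not the [math] API.
public class FrameworkSketch {

    /** A statistic that can be computed from a stored array. */
    interface UnivariateStatistic {
        double evaluate(double[] values);
    }

    /** A statistic that can also be updated one value at a time. */
    interface StorelessStatistic extends UnivariateStatistic {
        void increment(double d);
        double getResult();
        void clear();
    }

    /**
     * Sample variance using Welford's updating algorithm. evaluate() simply
     * replays the array through increment(), so both paths share one algorithm.
     */
    static class WelfordVariance implements StorelessStatistic {
        private long n = 0;
        private double mean = 0.0;
        private double m2 = 0.0; // running sum of squared deviations

        public void increment(double d) {
            n++;
            double delta = d - mean;
            mean += delta / n;
            m2 += delta * (d - mean);
        }

        public double getResult() {
            return n > 1 ? m2 / (n - 1) : Double.NaN;
        }

        public void clear() {
            n = 0;
            mean = 0.0;
            m2 = 0.0;
        }

        public double evaluate(double[] values) {
            clear();
            for (double v : values) {
                increment(v);
            }
            return getResult();
        }
    }

    /**
     * Alternative strategy: the textbook one-pass sum-of-squares formula.
     * It is simple but can lose precision when the mean is large relative
     * to the spread of the data.
     */
    static class SumOfSquaresVariance implements UnivariateStatistic {
        public double evaluate(double[] values) {
            int n = values.length;
            if (n < 2) {
                return Double.NaN;
            }
            double sum = 0.0;
            double sumSq = 0.0;
            for (double v : values) {
                sum += v;
                sumSq += v * v;
            }
            return (sumSq - sum * sum / n) / (n - 1);
        }
    }

    public static void main(String[] args) {
        double[] x = {2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0};

        // Storage-based use: evaluate a whole array at once.
        UnivariateStatistic stored = new WelfordVariance();
        System.out.println(stored.evaluate(x));

        // Storageless use: feed the same values one at a time.
        StorelessStatistic streaming = new WelfordVariance();
        for (double v : x) {
            streaming.increment(v);
        }
        System.out.println(streaming.getResult());

        // A different strategy behind the same interface.
        UnivariateStatistic alt = new SumOfSquaresVariance();
        System.out.println(alt.evaluate(x));
    }
}
```

All three calls should agree (the sample variance of this data is 32/7, up to rounding). Under this design, adding a new statistic (4) means writing one more class against these interfaces.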
> Phil, it's clear we have very different "schools of thought" on the
> subject of how the library should be designed. As a developer on the 
> project I have a right to promote my design model and interests. The 
> architecture is something I have a strong interest in working with.
> Apache projects are "group" projects. If a project such as [math] cannot
> find community and room for multiple directions of development, if it
> cannot make room for alternate ideas and visions, if both revolutionary
> and evolutionary processes cannot coexist, I doubt the project will have
> much of a future at all.
> -Mark
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
