commons-dev mailing list archives

From Kim van der Linde <...@kimvdlinde.com>
Subject [MATH] Summary proposed changes (was: Matrix Indices)
Date Mon, 30 Aug 2004 01:31:52 GMT
Well, I had a discussion with several colleagues (science-type users; we 
went snorkeling) on several of these issues. The score of the day was 
the idea that simple linear LS regression was considered a 
multivariate statistic.

Phil Steitz wrote:
> Nothing has been "put aside."  We make decisions by consensus.  You have 
> provided input and we are considering it.  To make sure I have it all 
> right, you have proposed four changes:
> 
> 1) Change the RealMatrix getEntry, getRow, getColumn methods to use 
> 0-based indexing.

Make it consistent with the underlying indexing, to avoid confusion, 
programming overhead, and wasted CPU cycles.
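
A minimal sketch of the indexing point, assuming a matrix backed by a 
plain double[][]. The class and method names here are hypothetical and 
are not the commons-math RealMatrix API. The 0-based accessor maps 
straight onto the backing array, while a 1-based accessor has to 
translate both indices on every call:

```java
/**
 * Hypothetical sketch (not the commons-math RealMatrix implementation):
 * contrasts a 0-based accessor with a 1-based one over the same array.
 */
final class IndexingSketch {
    private final double[][] data;

    IndexingSketch(double[][] source) {
        // Defensive copy so callers cannot mutate the backing array.
        data = new double[source.length][];
        for (int i = 0; i < source.length; i++) {
            data[i] = source[i].clone();
        }
    }

    /** 0-based: the indices are the array indices, no translation needed. */
    double getEntry(int row, int column) {
        return data[row][column];
    }

    /** 1-based: every call subtracts an offset from both indices. */
    double getEntryOneBased(int row, int column) {
        return data[row - 1][column - 1];
    }

    public static void main(String[] args) {
        IndexingSketch m = new IndexingSketch(new double[][] {{1, 2}, {3, 4}});
        // Both calls read the same element of the backing array.
        System.out.println(m.getEntry(0, 1));
        System.out.println(m.getEntryOneBased(1, 2));
    }
}
```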

> 2) Change the name of "BivariateRegression" to "UnivariateRegression" 
> (or something else)

Put it in univariate, name it LSRegression (or better, 
SimpleRegression, and build in the option for RMA and MA regressions).
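
A rough sketch of what such an option could look like, assuming RMA and 
MA here mean reduced major axis and major axis regression. The class, 
enum, and method names are hypothetical and not part of commons-math; 
the slope formulas are the textbook ones in terms of the sums of squares 
Sxx, Syy and the sum of cross products Sxy:

```java
/**
 * Hypothetical sketch of a "SimpleRegression" with selectable slope
 * estimators. Illustrative only; not the commons-math API.
 */
final class SimpleRegressionSketch {

    enum Method { OLS, RMA, MA }

    /** Slope of the fitted line for the chosen method. */
    static double slope(double[] x, double[] y, Method method) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need equal-length arrays, n >= 2");
        }
        double meanX = mean(x);
        double meanY = mean(y);
        double sxx = 0.0, syy = 0.0, sxy = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxx += dx * dx;
            syy += dy * dy;
            sxy += dx * dy;
        }
        switch (method) {
            case OLS: {
                // Ordinary least squares of y on x.
                return sxy / sxx;
            }
            case RMA: {
                // Reduced major axis: ratio of spreads, sign taken from Sxy.
                return Math.signum(sxy) * Math.sqrt(syy / sxx);
            }
            case MA: {
                // Major axis; not defined when Sxy is zero (division by zero).
                double diff = syy - sxx;
                return (diff + Math.sqrt(diff * diff + 4.0 * sxy * sxy)) / (2.0 * sxy);
            }
            default:
                throw new AssertionError(method);
        }
    }

    /** All three lines pass through the centroid (meanX, meanY). */
    static double intercept(double[] x, double[] y, Method method) {
        return mean(y) - slope(x, y, method) * mean(x);
    }

    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3};
        double[] y = {1, 3, 2};
        for (Method m : Method.values()) {
            System.out.println(m + " slope = " + slope(x, y, m));
        }
    }
}
```

The three estimators share the same sums, so offering them from one 
class costs little beyond the choice of formula.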

> 3) Change Variance to be configurable to generate the population statistic.

Yup, or even better, configurable bias reduction (n = N - a, with a = 1 
by default, but settable by constructor and by specific methods, to 
maintain the option of getting both statistics from the same dataset 
without doing things twice). The current situation actually introduces 
fundamental errors. From the JavaDoc for the Variance and SD classes:

- double evaluate(double[] values, double mean, int begin, int length)
     Returns the variance of the entries in the specified portion of the 
input array, using the precomputed mean value.

And in Variance only:
- double evaluate(double[] values, double mean)
     Returns the variance of the entries in the input array, using the 
precomputed mean value.

If you compute the variance around an already existing mean that was 
obtained from somewhere other than the sample you establish the variance 
on, the population variance should be used, as no "degrees of freedom" 
are lost by first estimating the mean from the sample. If the mean is 
based on the same sample, then the current behaviour is correct.
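
A minimal sketch of the configurable correction, assuming the divisor 
is N - a with a = 1 as the default. ConfigurableVariance and its methods 
are hypothetical names, not the existing Variance class. With a = 0 it 
gives the population statistic, which is the appropriate choice when the 
mean comes from outside the sample:

```java
/**
 * Hypothetical sketch of a variance with configurable bias correction.
 * The divisor is (N - a): a = 1 gives the sample variance, a = 0 the
 * population variance. Not the commons-math Variance class.
 */
final class ConfigurableVariance {
    private final double correction; // "a" in n = N - a

    ConfigurableVariance() {
        this(1.0); // default: bias-corrected sample variance
    }

    ConfigurableVariance(double correction) {
        if (correction < 0) {
            throw new IllegalArgumentException("correction must be >= 0");
        }
        this.correction = correction;
    }

    /** Variance about the mean of the data itself, using this instance's a. */
    double evaluate(double[] values) {
        return evaluate(values, mean(values), correction);
    }

    /** Variance about a precomputed mean, with an explicit correction a. */
    static double evaluate(double[] values, double mean, double a) {
        double n = values.length - a;
        if (n <= 0) {
            return Double.NaN; // not enough data for this correction
        }
        double sumSq = 0.0;
        for (double v : values) {
            double d = v - mean;
            sumSq += d * d;
        }
        return sumSq / n;
    }

    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        // Mean estimated from the same sample: default a = 1.
        System.out.println(new ConfigurableVariance().evaluate(data));
        // Mean known independently of the sample: a = 0.
        System.out.println(ConfigurableVariance.evaluate(data, 5.0, 0.0));
    }
}
```

A caller who needs both statistics can compute the mean once and call 
the static method twice with different values of a.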

> 4) Combine the univariate and multivariate packages, since it is 
> confusing to separate statistics that focus on one variable and 
> sometimes the word "univariate" is used in the context of multivariate 
> techniques (e.g. "Univariate Anova").

No, keep them separate, but locate things where they belong, and do not 
reinvent the classification so that simple LS regression ends up in the 
multivariate package.

I have a question for you. Where would you locate a Covariance class....?

Kim



-- 
http://www.kimvdlinde.com