MATH-114 and MATH-138 propose support for correlation matrices. I have
been working on these and would like to propose the following:

Create a new package, o.a.c.m.stat.correlation, to house initially:
a) Covariance -- creates a variance-covariance matrix from a matrix
whose columns represent covariates. Also includes convenience methods
that work pairwise on double[] arrays (similar to VectorialCovariance,
but requiring that the arrays be stored).
b) PearsonCorrelation -- creates Pearson's product-moment correlation
matrix from either a covariance matrix or a matrix of covariates. Also
includes methods to return matrices of correlation standard errors and
p-values (aka significances, i.e. the p-value for the null hypothesis
that the coefficient is 0).
c) SpearmanRankCorrelation -- like Pearson's, but with no covariance
matrix constructor and using rank correlation.

To implement c), we need a place for the RankingAlgorithm interface and
its implementations (see MATH-138). Any suggestions on where to put
these? Leaving them in correlation may be awkward later on as we do
more with rank transformations.
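
To make the dependency in c) concrete, here is a minimal sketch of rank
correlation computed as Pearson's correlation on rank-transformed data.
The names Ranking, AverageTiesRanking and RankCorrelationSketch are
hypothetical stand-ins, not the RankingAlgorithm interface from
MATH-138, and the tie strategy (average ranks) is only one assumed
choice. NaN handling is deliberately left out.

```java
import java.util.Arrays;

public class RankCorrelationSketch {

    /** Hypothetical stand-in for the RankingAlgorithm interface discussed in MATH-138. */
    interface Ranking {
        double[] rank(double[] data);
    }

    /** Ranks start at 1; tied values all receive the average of the ranks they occupy. */
    static final class AverageTiesRanking implements Ranking {
        @Override
        public double[] rank(final double[] data) {
            final int n = data.length;
            final Integer[] order = new Integer[n];
            for (int k = 0; k < n; k++) {
                order[k] = k;
            }
            // Sort the indices by the values they point at.
            Arrays.sort(order, (a, b) -> Double.compare(data[a], data[b]));
            final double[] ranks = new double[n];
            int start = 0;
            while (start < n) {
                // Extend [start, end] over a run of equal values.
                int end = start;
                while (end + 1 < n && data[order[end + 1]] == data[order[start]]) {
                    end++;
                }
                final double averageRank = (start + end) / 2.0 + 1.0;
                for (int k = start; k <= end; k++) {
                    ranks[order[k]] = averageRank;
                }
                start = end + 1;
            }
            return ranks;
        }
    }

    /** Pearson's r for two equal-length arrays, computing the means first. */
    static double pearson(double[] x, double[] y) {
        final int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            final double dx = x[i] - meanX;
            final double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Spearman's rank correlation: Pearson's r applied to the ranks. */
    static double spearman(double[] x, double[] y, Ranking ranking) {
        return pearson(ranking.rank(x), ranking.rank(y));
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {1, 4, 9, 16, 25};
        System.out.println(spearman(x, y, new AverageTiesRanking()));
    }
}
```

Since the ranking step here knows nothing about correlation, it could
live in whatever package we settle on and simply be passed in.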

I have a) implemented using a fairly stable two-pass algorithm. I tried
just using VectorialCovariance, but could not get the accuracy I wanted
using the one-pass algorithm there. We should probably at some point
look at improving the updating formula used there along the lines of
what we do for Variance, but it is a nice feature of that class that it
does not require the input vectors to be stored, and I would not want to
see that changed. For b), similar to the patch in JIRA, I would use
the R computation from SimpleRegression if working from a matrix, or
just compute column sigmas and scale directly if working from a
covariance matrix.
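
For illustration, a minimal sketch of the two computations described
above: a plain two-pass covariance (means first, then products of
deviations) and the scaling of a covariance matrix by the column sigmas
to obtain correlations. The class and method names are hypothetical and
are not the proposed API. The bias-corrected n - 1 divisor is an
assumption, and the "fairly stable" variant mentioned above may include
refinements not shown here.

```java
public class CovarianceSketch {

    /** Two-pass sample covariance of x and y (assumed divisor: n - 1). */
    static double covariance(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        final int n = x.length;
        // First pass: the means.
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        // Second pass: sum of products of deviations from the means.
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += (x[i] - meanX) * (y[i] - meanY);
        }
        return sum / (n - 1);
    }

    /** Covariance matrix of data[row][column], where columns are the covariates. */
    static double[][] covarianceMatrix(double[][] data) {
        final int rows = data.length;
        final int cols = data[0].length;
        // Copy each column out so the pairwise method can be reused.
        final double[][] columns = new double[cols][rows];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                columns[j][i] = data[i][j];
            }
        }
        final double[][] cov = new double[cols][cols];
        for (int i = 0; i < cols; i++) {
            for (int j = i; j < cols; j++) {
                cov[i][j] = covariance(columns[i], columns[j]);
                cov[j][i] = cov[i][j];
            }
        }
        return cov;
    }

    /** Correlation matrix from a covariance matrix: r_ij = cov_ij / (sigma_i * sigma_j). */
    static double[][] correlationFromCovariance(double[][] cov) {
        final int p = cov.length;
        final double[] sigma = new double[p];
        for (int i = 0; i < p; i++) {
            sigma[i] = Math.sqrt(cov[i][i]);
        }
        final double[][] corr = new double[p][p];
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < p; j++) {
                corr[i][j] = cov[i][j] / (sigma[i] * sigma[j]);
            }
        }
        return corr;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 2}, {2, 4}, {3, 6}, {4, 8}};
        double[][] corr = correlationFromCovariance(covarianceMatrix(data));
        System.out.println(corr[0][1]);
    }
}
```

Note that both passes need the full arrays in memory, which is exactly
the trade-off against VectorialCovariance mentioned above.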
Does this sound good?
If I don't hear any objections, I will commit some code along the lines
above for us to look at.
Phil

