> @@ -265,8 +324,8 @@ public class SimpleRegression implements
> * @param x input x value
> * @return predicted y value
> */
> - public double predict(double x) {
> - double b1 = getSlope();
> + public double predict(final double x) {
> + final double b1 = getSlope();
> if (hasIntercept) {
> return getIntercept(b1) + b1 * x;
> }
> @@ -298,7 +357,7 @@ public class SimpleRegression implements
> *
> * @return true if constant exists, false otherwise
> */
> - public boolean hasIntercept(){
> + public boolean hasIntercept() {
> return hasIntercept;
> }
>
> @@ -572,7 +631,7 @@ public class SimpleRegression implements
> * @return half-width of 95% confidence interval for the slope estimate
> * @throws MathException if the confidence interval can not be computed.
> */
> - public double getSlopeConfidenceInterval(double alpha)
> + public double getSlopeConfidenceInterval(final double alpha)
> throws MathException {
> if (alpha >= 1 || alpha <= 0) {
> throw new OutOfRangeException(LocalizedFormats.SIGNIFICANCE_LEVEL,
> @@ -620,7 +679,7 @@ public class SimpleRegression implements
> * @param slope current slope
> * @return the intercept of the regression line
> */
> - private double getIntercept(double slope){
> + private double getIntercept(final double slope) {
> if( hasIntercept){
> return (sumY - slope * sumX) / n;
> }
> @@ -633,7 +692,134 @@ public class SimpleRegression implements
> * @param slope regression slope estimate
> * @return sum of squared deviations of predicted y values
> */
> - private double getRegressionSumSquares(double slope) {
> + private double getRegressionSumSquares(final double slope) {
> return slope * slope * sumXX;
> }
> +
> + /**
> + * Performs a regression on data present in buffers and outputs a RegressionResults object
> + * @return RegressionResults acts as a container of regression output
> + * @throws ModelSpecificationException if the model is not correctly specified
> + */
> + public RegressionResults regress() throws ModelSpecificationException{
> + if( hasIntercept ){
> + if( n < 3 ){
> + throw new NoDataException( LocalizedFormats.NOT_ENOUGH_DATA_REGRESSION );
> + }
This is slightly misleading: does "NoDataException" mean "no data" or
"not enough data"?
> + if( FastMath.abs( sumXX ) > MathUtils.SAFE_MIN ){
> + final double[] params = new double[]{ getIntercept(), getSlope() };
> + final double mse = getMeanSquareError();
> + final double _syy = sumYY + sumY * sumY / ((double) n);
> + final double[] vcv = new double[]{
> + mse * (xbar *xbar /sumXX + 1.0 / ((double) n)),
> + -xbar*mse/sumXX,
> + mse/sumXX };
> + return new RegressionResults(
> + params, new double[][]{vcv}, true, n, 2,
> + sumY, _syy, getSumSquaredErrors(),true,false);
> + }else{
> + final double[] params = new double[]{ sumY/((double) n), Double.NaN };
> + //final double mse = getMeanSquareError();
> + final double[] vcv = new double[]{
> + ybar / ((double) n - 1.0),
> + Double.NaN,
> + Double.NaN };
> + return new RegressionResults(
> + params, new double[][]{vcv}, true, n, 1,
> + sumY, sumYY, getSumSquaredErrors(),true,false);
> + }
> + }else{
> + if( n < 2 ){
> + throw new NoDataException( LocalizedFormats.NOT_ENOUGH_DATA_REGRESSION );
> + }
Same as above.
> + if( !Double.isNaN(sumXX) ){
> + final double[] vcv = new double[]{ getMeanSquareError() / sumXX };
> + final double[] params = new double[]{ sumXY/sumXX };
> + return new RegressionResults(
> + params, new double[][]{vcv}, true, n, 1,
> + sumY, sumYY, getSumSquaredErrors(),false,false);
> + }else{
> + final double[] vcv = new double[]{Double.NaN };
> + final double[] params = new double[]{ Double.NaN };
> + return new RegressionResults(
> + params, new double[][]{vcv}, true, n, 1,
> + Double.NaN, Double.NaN, Double.NaN,false,false);
> + }
> + }
> + }
> +
> + /**
> + * Performs a regression on data present in buffers including only regressors
> + * indexed in variablesToInclude and outputs a RegressionResults object
> + * @param variablesToInclude an array of indices of regressors to include
> + * @return RegressionResults acts as a container of regression output
> + * @throws ModelSpecificationException if the model is not correctly specified
> + * @throws MathIllegalArgumentException if the variablesToInclude array is null or zero length
> + * @throws OutOfRangeException if a requested variable is not present in model
> + */
> + public RegressionResults regress(int[] variablesToInclude) throws ModelSpecificationException{
> + if( variablesToInclude == null || variablesToInclude.length == 0){
> + throw new MathIllegalArgumentException(LocalizedFormats.ARRAY_ZERO_LENGTH_OR_NULL_NOTALLOWED);
> + }
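Clumped tests: "null" and "zero length" are two different failures, yet
they are checked together and reported through the same exception.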
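[For illustration, the two checks could be separated along the following
lines. This is only a sketch: the class name "ArgumentChecks" is hypothetical,
and standard JDK exceptions stand in for whatever exception types CM settles
on for these two cases.]

```java
// Sketch only: JDK exceptions are stand-ins for the CM exception types.
final class ArgumentChecks {
    private ArgumentChecks() {}

    /** Checks each precondition on its own so that each failure is reported separately. */
    static void checkVariables(final int[] variablesToInclude) {
        if (variablesToInclude == null) {
            // Stand-in for a "null argument" failure.
            throw new NullPointerException("variablesToInclude is null");
        }
        if (variablesToInclude.length == 0) {
            // Stand-in for a "zero length" failure.
            throw new IllegalArgumentException("variablesToInclude is empty");
        }
    }
}
```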
> + if( variablesToInclude.length > 2 || (variablesToInclude.length > 1 && !hasIntercept) ){
> + throw new ModelSpecificationException(
> + LocalizedFormats.ARRAY_SIZE_EXCEEDS_MAX_VARIABLES,
> + (variablesToInclude.length > 1 && !hasIntercept) ? 1 : 2);
> + }
REGRESSION
NUMBER_IS_TOO_LARGE
> + if( hasIntercept ){
> + if( variablesToInclude.length == 2 ){
> + if( variablesToInclude[0] == 1 ){
> + throw new ModelSpecificationException(LocalizedFormats.NOT_INCREASING_SEQUENCE);
This one should throw "NonMonotonicSequenceException".
I really don't see the value of clumping completely different failures into
the same "ModelSpecificationException". This renders the exception type
useless.
[The point of having typed exceptions is to provide the flexibility of
dealing with them programmatically. When the actual causes of failure vary
wildly, this won't be possible.]
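[As an illustration of dealing with failures programmatically, here is a
minimal sketch. All names in it ("NotIncreasingException",
"IndexNotInModelException", "TypedFailures") are hypothetical stand-ins, not
the CM classes. The caller can repair an ordering problem by sorting, which is
only possible because that failure has its own type.]

```java
import java.util.Arrays;

// Hypothetical stand-ins for the CM exception types; these are not the library classes.
class NotIncreasingException extends IllegalArgumentException {
    NotIncreasingException(final String message) {
        super(message);
    }
}

class IndexNotInModelException extends IllegalArgumentException {
    IndexNotInModelException(final String message) {
        super(message);
    }
}

final class TypedFailures {
    private TypedFailures() {}

    /** Library side: each failed precondition gets its own exception type. */
    static void checkIndices(final int[] indices, final int max) {
        for (int i = 0; i < indices.length; i++) {
            if (indices[i] < 0 || indices[i] > max) {
                throw new IndexNotInModelException(
                    "index " + indices[i] + " not in [0, " + max + "]");
            }
            if (i > 0 && indices[i] <= indices[i - 1]) {
                throw new NotIncreasingException(
                    "index " + indices[i] + " follows " + indices[i - 1]);
            }
        }
    }

    /** Caller side: an ordering problem is repaired; any other failure propagates. */
    static int[] checkedOrSorted(final int[] indices, final int max) {
        try {
            checkIndices(indices, max);
            return indices;
        } catch (NotIncreasingException e) {
            final int[] sorted = indices.clone();
            Arrays.sort(sorted);
            checkIndices(sorted, max); // Duplicates or bad indices still fail here.
            return sorted;
        }
    }
}
```

[With a single catch-all exception type, the caller in "checkedOrSorted" could
not tell a repairable ordering problem from an index that is not in the model.]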
> + }else if( variablesToInclude[0] != 0 ){
> + throw new OutOfRangeException( variablesToInclude[0], 0,1 );
> + }
> + if( variablesToInclude[1] != 1){
> + throw new OutOfRangeException( variablesToInclude[0], 0,1 );
> + }
> + return regress();
> + }else{
> + if( variablesToInclude[0] != 1 && variablesToInclude[0] != 0 ){
> + throw new OutOfRangeException( variablesToInclude[0],0,1 );
> + }
> + final double _mean = sumY * sumY / ((double) n);
> + final double _syy = sumYY + _mean;
> + if( variablesToInclude[0] == 0 ){
> + //just the mean
> + final double[] vcv = new double[]{ sumYY/((double)((n-1)*n)) };
> + final double[] params = new double[]{ ybar };
> + return new RegressionResults(
> + params, new double[][]{vcv}, true, n, 1,
> + sumY, _syy+_mean, sumYY,true,false);
> +
> + }else if( variablesToInclude[0] == 1){
> + //final double _syy = sumYY + sumY * sumY / ((double) n);
> + final double _sxx = sumXX + sumX * sumX / ((double) n);
> + final double _sxy = sumXY + sumX * sumY / ((double) n);
> + final double _sse = FastMath.max(0d, _syy - _sxy * _sxy / _sxx);
> + final double _mse = _sse/((double)(n-1));
> + if( !Double.isNaN(_sxx) ){
> + final double[] vcv = new double[]{ _mse / _sxx };
> + final double[] params = new double[]{ _sxy/_sxx };
> + return new RegressionResults(
> + params, new double[][]{vcv}, true, n, 1,
> + sumY, _syy, _sse,false,false);
> + }else{
> + final double[] vcv = new double[]{Double.NaN };
> + final double[] params = new double[]{ Double.NaN };
> + return new RegressionResults(
> + params, new double[][]{vcv}, true, n, 1,
> + Double.NaN, Double.NaN, Double.NaN,false,false);
> + }
> + }
> + }
> + }else{
> + if( variablesToInclude[0] != 0 ){
> + throw new OutOfRangeException(variablesToInclude[0],0,0);
> + }
> + return regress();
> + }
> +
> + return null;
> + }
> }
>
> Modified: commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/UpdatingMultipleLinearRegression.java
> URL: http://svn.apache.org/viewvc/commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/UpdatingMultipleLinearRegression.java?rev=1174509&r1=1174508&r2=1174509&view=diff
> ==============================================================================
> --- commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/UpdatingMultipleLinearRegression.java (original)
> +++ commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/UpdatingMultipleLinearRegression.java Fri Sep 23 03:36:11 2011
> @@ -61,7 +61,7 @@ public interface UpdatingMultipleLinearR
> * @throws ModelSpecificationException if {@code x} is not rectangular, does not match
> * the length of {@code y} or does not contain sufficient data to estimate the model
> */
> - void addObservations(double[][] x, double[] y);
> + void addObservations(double[][] x, double[] y) throws ModelSpecificationException;
This is exactly what I said just above: a single exception for totally
unrelated failures.
[And it is yet another example of mixing (i.e. confusing) library and
application concerns: The library must check the preconditions that are
needed to perform its work, and throw an exception that is commensurate with
the failed test. Then, the application layer (the caller) can possibly
translate this into a "business" language. At the level of CM, we lack the
full context in order to be sure that one fixed high-level message will be
adequate in all circumstances. As I've written numerous times also: When
designing code for CM, we must strive to separate the roles of "CM user" and
"CM developer".]
> /**
> * Clears internal buffers and resets the regression model. This means all
>
> [...]
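[To illustrate the separation between library and application concerns
discussed for "addObservations", here is a minimal sketch. Every name in it
("RowCountMismatchException", "LibraryLayer", "ApplicationLayer") and the
wording of the message are hypothetical. The library side reports only the
precondition that failed; the caller, which has the full context, translates
it into its own "business" language.]

```java
// Hypothetical exception type: it carries the facts of the failed precondition, nothing more.
class RowCountMismatchException extends IllegalArgumentException {
    final int rows;
    final int responses;

    RowCountMismatchException(final int rows, final int responses) {
        super(rows + " != " + responses);
        this.rows = rows;
        this.responses = responses;
    }
}

// Library role: check the precondition and throw an exception that matches the failed test.
final class LibraryLayer {
    private LibraryLayer() {}

    /** Returns the number of observations accepted (a stand-in for the real update). */
    static int addObservations(final double[][] x, final double[] y) {
        if (x.length != y.length) {
            throw new RowCountMismatchException(x.length, y.length);
        }
        return x.length;
    }
}

// Application role: translate the low-level failure into a message suited to its users.
final class ApplicationLayer {
    private ApplicationLayer() {}

    static String describeLoad(final double[][] x, final double[] y) {
        try {
            return "Loaded " + LibraryLayer.addObservations(x, y) + " observations";
        } catch (RowCountMismatchException e) {
            return "Input file rejected: " + e.rows + " rows of regressors but "
                + e.responses + " responses";
        }
    }
}
```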
Also, please be careful of the code formatting style. This commit contains
many violations of the following rules:
* There should be a space character before a "{" character.
* There should be a space character on both sides of a keyword.
* There should be a space on both sides of an operator.
* There should not be a space after an opening parenthesis.
* There should not be a space before a closing parenthesis.
* There should not be a space before a ";" character.
* Variable names (instance or local) should not contain a "_" character.
* Indents should be (multiples of) 4 space characters wide.
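[As an illustration of these rules, here is the "getIntercept" fragment from
the commit, reformatted. This is only a sketch: the enclosing class
"FormattingExample" and its constructor are hypothetical, and the final
"return 0.0" is an assumption, since the diff does not show the rest of the
method.]

```java
// Hypothetical wrapper class so that the fragment compiles on its own.
final class FormattingExample {
    private final double sumX;
    private final double sumY;
    private final long n;
    private final boolean hasIntercept;

    FormattingExample(final double sumX, final double sumY,
                      final long n, final boolean hasIntercept) {
        this.sumX = sumX;
        this.sumY = sumY;
        this.n = n;
        this.hasIntercept = hasIntercept;
    }

    // As committed:   if( hasIntercept){
    // Reformatted: space after the keyword, none after "(", one before "{".
    double getIntercept(final double slope) {
        if (hasIntercept) {
            return (sumY - slope * sumX) / n;
        }
        // Assumption: the part of the method not shown in the diff returns zero.
        return 0.0;
    }
}
```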
Thanks and best regards,
Gilles
[1] There is an unsettled issue about whether the CM code should check for
"null" (and throw "NullArgumentException") or let the JVM throw the
standard NPE.