commons-user mailing list archives

From Phil Steitz <phil.ste...@gmail.com>
Subject Re: Why not BigDecimal?
Date Sat, 13 Feb 2010 17:21:22 GMT
Something Something wrote:
> Okay... Let's not worry about R, BigDecimal & precision for the time being.  I
> might have been looking at the wrong values.  So let's hold that thought.
> 
> Let's take a simple example of computing Y-hat values with multiple
> regression, given in this PDF:
> http://www.utdallas.edu/~herve/abdi-prc-pretty.pdf
> 
> I created a small space-delimited file called students.csv that contains the following data:
> 
> s1 14 4 1
> s2 23 4 2
> s3 30 7 2
> s4 50 7 4
> s5 39 10 3
> s6 67 10 6
> 
> Columns (the file has no header row): student id, memory span (Y), age (X1), speech rate (X2)
> 
> Now the expected results are:
> 
> yHat[0]:15.166666666666668
> yHat[1]:24.666666666666668
> yHat[2]:27.666666666666664
> yHat[3]:46.666666666666664
> yHat[4]:40.166666666666664
> yHat[5]:68.66666666666667
> 
> This is based on the following equation (given in the PDF):
> Y-hat = 1.67 + X1 + 9.50 X2
> 
> I wrote the following quick-and-dirty code to use
> OLSMultipleLinearRegression.  Its calculateHat() method returns a
> RealMatrix, but I can't find the values above in it.  Am I using this
> class correctly?  Please let me know.  Thanks.
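The expected values above can be checked by plugging each student's age (X1) and speech rate (X2) into the published equation. The sketch below does exactly that; the class and method names are illustrative only. With the rounded intercept 1.67 each result is about 0.003 higher than the listed value, because the unrounded intercept is 5/3 (mean(Y) - mean(X1) - 9.50 mean(X2) = 223/6 - 7 - 28.5).

```java
public class RoundedEquation {
    // Y-hat = 1.67 + 1 * X1 + 9.50 * X2, using the rounded coefficients quoted from the PDF
    static double predict(double age, double speechRate) {
        return 1.67 + 1.0 * age + 9.50 * speechRate;
    }

    public static void main(String[] args) {
        double[] age = {4, 4, 7, 7, 10, 10};      // X1, third column of students.csv
        double[] speechRate = {1, 2, 2, 4, 3, 6}; // X2, fourth column of students.csv
        for (int i = 0; i < age.length; i++) {
            System.out.printf("yHat[%d] = %.2f%n", i, predict(age[i], speechRate[i]));
        }
    }
}
```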

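The "hat matrix," as defined in the javadoc for calculateHat, is not
the same as the vector of yHat values.  It is the n x n matrix
H = X(X^T X)^-1 X^T (6 x 6 for your data), which maps the observed
values to the fitted ones: yHat = HY.  See the javadoc and the
references it contains for the full definition.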
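To make the distinction concrete, here is a dependency-free Java sketch that forms H = X(X^T X)^-1 X^T for the students data and multiplies it by Y. It assumes a design matrix with a leading column of 1s for the intercept, and the helpers (transpose, multiply, solve, hat, operate) are our own, not commons-math API; the library may compute H differently, and the normal equations are used here only for brevity. H comes out 6 x 6, which matches the column dimension printed by the code below, and HY reproduces the expected yHat values.

```java
import java.util.Arrays;

public class HatMatrixSketch {
    // Design matrix: intercept column of 1s, age (X1), speech rate (X2)
    static final double[][] X = {
        {1, 4, 1}, {1, 4, 2}, {1, 7, 2}, {1, 7, 4}, {1, 10, 3}, {1, 10, 6}
    };
    static final double[] Y = {14, 23, 30, 50, 39, 67}; // memory span

    static double[][] transpose(double[][] a) {
        double[][] t = new double[a[0].length][a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++)
                t[j][i] = a[i][j];
        return t;
    }

    static double[][] multiply(double[][] a, double[][] b) {
        double[][] c = new double[a.length][b[0].length];
        for (int i = 0; i < a.length; i++)
            for (int k = 0; k < b.length; k++)
                for (int j = 0; j < b[0].length; j++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    // Solves A Z = B for Z by Gauss-Jordan elimination with partial pivoting (A square, invertible)
    static double[][] solve(double[][] a, double[][] b) {
        int n = a.length, m = b[0].length;
        double[][] aug = new double[n][n + m];
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, aug[i], 0, n);
            System.arraycopy(b[i], 0, aug[i], n, m);
        }
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++)
                if (Math.abs(aug[r][col]) > Math.abs(aug[pivot][col])) pivot = r;
            double[] tmp = aug[col]; aug[col] = aug[pivot]; aug[pivot] = tmp;
            double p = aug[col][col];
            for (int j = 0; j < n + m; j++) aug[col][j] /= p;
            for (int r = 0; r < n; r++) {
                if (r == col) continue;
                double f = aug[r][col];
                for (int j = 0; j < n + m; j++) aug[r][j] -= f * aug[col][j];
            }
        }
        double[][] z = new double[n][m];
        for (int i = 0; i < n; i++) System.arraycopy(aug[i], n, z[i], 0, m);
        return z;
    }

    // H = X (X^T X)^-1 X^T, computed as X * solve(X^T X, X^T)
    static double[][] hat(double[][] x) {
        double[][] xt = transpose(x);
        return multiply(x, solve(multiply(xt, x), xt));
    }

    // Matrix-vector product m * v
    static double[] operate(double[][] m, double[] v) {
        double[] out = new double[m.length];
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < v.length; j++)
                out[i] += m[i][j] * v[j];
        return out;
    }

    public static void main(String[] args) {
        double[][] h = hat(X);
        System.out.println("H is " + h.length + " x " + h[0].length);
        System.out.println("yHat = H * Y = " + Arrays.toString(operate(h, Y)));
    }
}
```

Note that the trace of H equals the number of fitted coefficients (3 here), one quick sanity check on any hat-matrix computation.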

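To compute predicted values, post-multiply the design matrix, X, by
the estimated coefficients.  Using the variable definitions below,
this is

double[] b = regression.estimateRegressionParameters();
double[] yHat = new Array2DRowRealMatrix(X).operate(b);

(Array2DRowRealMatrix is in org.apache.commons.math.linear.)  Note
that your expected values come from a model with an intercept term.
In releases before 2.2, newSampleData(Y, X) uses X exactly as given,
so X needs a leading column of 1s to reproduce them; later releases
add the intercept column by default, and the matrix multiplied by b
must then include that column as well.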
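The same post-multiplication can be sketched without any library, with plain arrays standing in for RealMatrix.operate. This assumes a design matrix that already includes the column of 1s and uses the PDF's coefficients with the intercept at its unrounded value 5/3; in practice the coefficients would come from the fitted model rather than being typed in. The class name is illustrative.

```java
import java.util.Arrays;

public class PredictedValues {
    // Computes X * b: entry i is the dot product of design-matrix row i with b
    static double[] operate(double[][] x, double[] b) {
        double[] yHat = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < b.length; j++) {
                yHat[i] += x[i][j] * b[j];
            }
        }
        return yHat;
    }

    public static void main(String[] args) {
        // Leading column of 1s (intercept), then age (X1) and speech rate (X2)
        double[][] x = {
            {1, 4, 1}, {1, 4, 2}, {1, 7, 2}, {1, 7, 4}, {1, 10, 3}, {1, 10, 6}
        };
        // Unrounded coefficients of Y-hat = 1.67 + X1 + 9.50 X2 (intercept 5/3)
        double[] b = {5.0 / 3.0, 1.0, 9.5};
        System.out.println(Arrays.toString(operate(x, b)));
    }
}
```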

Side note: the residuals, Y - yHat, are available directly via
estimateResiduals, so the predicted values can also be recovered as Y
minus the residuals; otherwise you need to compute them from the
coefficients and design matrix as above.  A computePredictedValues
method added to AbstractMultipleLinearRegression might be a good
enhancement, as well as a predict(RealVector) method similar to what
SimpleRegression has.  Patches welcome!
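A rough idea of what the two suggested enhancements might compute, written as plain static helpers. Both are hypothetical: neither predict(double[], double[]) nor fittedFromResiduals exists in commons-math, and the sketch assumes the coefficient array holds the intercept first, followed by one slope per regressor. The example input (age 8, speech rate 5) is made up for illustration.

```java
public class PredictSketch {
    // Hypothetical helper in the spirit of SimpleRegression.predict:
    // b[0] is the intercept, b[1..] multiply the regressors in x (x has no leading 1)
    static double predict(double[] b, double[] x) {
        double y = b[0];
        for (int j = 0; j < x.length; j++) {
            y += b[j + 1] * x[j];
        }
        return y;
    }

    // Hypothetical helper: fitted values from observations and residuals, yHat = y - e
    static double[] fittedFromResiduals(double[] y, double[] residuals) {
        double[] yHat = new double[y.length];
        for (int i = 0; i < y.length; i++) {
            yHat[i] = y[i] - residuals[i];
        }
        return yHat;
    }

    public static void main(String[] args) {
        double[] b = {5.0 / 3.0, 1.0, 9.5}; // intercept, age, speech rate
        // A made-up new student aged 8 with speech rate 5
        System.out.println(predict(b, new double[] {8, 5}));
    }
}
```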


Phil
> 
> 
> 
> import java.io.BufferedReader;
> import java.io.FileReader;
> import java.io.IOException;
> 
> import org.apache.commons.math.linear.RealMatrix;
> import org.apache.commons.math.stat.regression.OLSMultipleLinearRegression;
> 
> private static void regression1() {
>     double[][] X = new double[6][2]; // age (X1), speech rate (X2)
>     double[] Y = new double[6];      // memory span
>     // students.csv is space-delimited: id, Y, X1, X2 (no header row)
>     try (BufferedReader in = new BufferedReader(new FileReader("C:\\students.csv"))) {
>         String line;
>         int count = 0;
>         while ((line = in.readLine()) != null && count < Y.length) {
>             if (line.trim().isEmpty()) {
>                 continue; // skip blank lines
>             }
>             String[] cols = line.trim().split("\\s+");
>             Y[count] = Double.parseDouble(cols[1]);
>             X[count][0] = Double.parseDouble(cols[2]);
>             X[count][1] = Double.parseDouble(cols[3]);
>             count++;
>         }
>     } catch (IOException e) {
>         e.printStackTrace();
>     }
>     OLSMultipleLinearRegression regression = new OLSMultipleLinearRegression();
>     regression.newSampleData(Y, X);
>     RealMatrix matrix = regression.calculateHat();
>     System.out.println("matrix:" + matrix.getColumnDimension());
> }
> 
> 
> On Fri, Feb 12, 2010 at 12:08 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> 
>> It is not a precision issue.  R and commons-math use different algorithms
>> on top of the same underlying floating-point arithmetic (IEEE 754 doubles).
>>
>> It is even an open question which result is better.  R has lots of
>> credibility, but I have found cases where it lacked precision (and I coded
>> up a patch that was accepted).
>>
>> Arbitrary-precision integers and rationals are very useful, but not usually
>> for large-scale numerical programming.  Except in very few cases, if you
>> need more than 17 digits of precision, you have other very serious problems
>> that more precision won't fix.
>>
>> On Fri, Feb 12, 2010 at 1:40 AM, Andy Turner <A.G.D.Turner@leeds.ac.uk> wrote:
>>> Interesting that this is a precision issue. I'm not surprised: depending
>>> on what you are doing, double precision may not be enough. It depends a
>>> lot on how the calculations are broken into smaller parts. BigDecimal is
>>> fantastically useful...
>>>
>>
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve
>>
> 
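Ted's point about roughly 17 significant digits can be seen in a small, self-contained Java sketch (class and method names are illustrative). Adding 0.1 ten times in double arithmetic prints 0.9999999999999999, an error of about 1.1e-16, while the same sum with BigDecimal prints 1.0 exactly. That error sits far below the rounding in the published coefficients (1.67, 9.50); BigDecimal buys exact decimal fractions, but it still cannot represent a value like 1/3 without a chosen MathContext.

```java
import java.math.BigDecimal;

public class PrecisionDemo {
    // Adds 0.1 ten times in binary floating point (IEEE 754 double)
    static double sumDouble() {
        double sum = 0.0;
        for (int i = 0; i < 10; i++) sum += 0.1;
        return sum;
    }

    // Adds 0.1 ten times in exact decimal arithmetic
    static BigDecimal sumBigDecimal() {
        BigDecimal sum = BigDecimal.ZERO;
        BigDecimal tenth = new BigDecimal("0.1");
        for (int i = 0; i < 10; i++) sum = sum.add(tenth);
        return sum;
    }

    public static void main(String[] args) {
        double d = sumDouble();
        System.out.println("double:     " + d);
        System.out.println("BigDecimal: " + sumBigDecimal());
        // Absolute error of the double result, comparable to the spacing of doubles near 1.0
        System.out.println("error:      " + Math.abs(d - 1.0));
        System.out.println("ulp(1.0):   " + Math.ulp(1.0));
    }
}
```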


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org

