Okay... Let's not worry about R, BigDecimal and precision for the time being. I
might have been looking at the wrong values, so let's hold that thought.
Let's take a simple example for getting Y-Hat values using Multiple
Regression given in this PDF:
http://www.utdallas.edu/~herve/abdi-prc-pretty.pdf
I created a small CSV called students.csv that contains the following data:
s1 14 4 1
s2 23 4 2
s3 30 7 2
s4 50 7 4
s5 39 10 3
s6 67 10 6
Column headers: student id, memory span (Y), age (X1), speech rate (X2)
Now the expected results are:
yHat[0]:15.166666666666668
yHat[1]:24.666666666666668
yHat[2]:27.666666666666664
yHat[3]:46.666666666666664
yHat[4]:40.166666666666664
yHat[5]:68.66666666666667
This is based on the following equation (given in the PDF):
Y = 1.67 + X1 + 9.50 X2
I wrote the following quick-and-dirty method to try out
OLSMultipleLinearRegression. The 'calculateHat()' method returns a
RealMatrix, but I can't see the above results in there. Am I using this
class correctly? Please let me know. Thanks.
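As a sanity check on the expected values, here is a minimal sketch that plugs the six rows into the PDF's equation. It assumes the unrounded intercept is 5/3 (the PDF prints 1.67); that is my inference from the expected values above, not something the PDF states. The class name YHatCheck is made up for illustration.

```java
final class YHatCheck {
    // Coefficients from the PDF's equation. 5/3 is an assumed unrounded
    // value for the intercept that the PDF prints as 1.67.
    static final double B0 = 5.0 / 3.0;
    static final double B1 = 1.0;
    static final double B2 = 9.5;

    /** Predicted memory span for a given age (X1) and speech rate (X2). */
    static double predict(double age, double speechRate) {
        return B0 + B1 * age + B2 * speechRate;
    }

    public static void main(String[] args) {
        // Columns: age (X1), speech rate (X2), in the same order as students.csv.
        double[][] x = { {4, 1}, {4, 2}, {7, 2}, {7, 4}, {10, 3}, {10, 6} };
        for (int i = 0; i < x.length; i++) {
            System.out.println("yHat[" + i + "]:" + predict(x[i][0], x[i][1]));
        }
    }
}
```

The printed values should agree with the expected results listed above, give or take the last digit of floating-point rounding.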
    private static void regression1() {
        double[][] X = new double[6][2];   // predictors: age (X1), speech rate (X2)
        double[] Y = new double[6];        // response: memory span
        try {
            File file = new File("C:\\students.csv");
            FileReader reader = new FileReader(file);
            BufferedReader in = new BufferedReader(reader);
            String line;
            int count = 0;
            while ((line = in.readLine()) != null) {
                // System.out.println(line);
                // Each line holds four space-separated fields: id, Y, X1, X2.
                Scanner scanner = new Scanner(line);
                scanner.useDelimiter(" ");
                String[] cols = new String[4];
                int col = 0;
                while (scanner.hasNext()) {
                    cols[col++] = scanner.next();
                }
                Y[count] = Double.valueOf(cols[1]);
                X[count][0] = Double.valueOf(cols[2]);
                X[count][1] = Double.valueOf(cols[3]);
                count++;
            }
            in.close();
            reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        OLSMultipleLinearRegression regression = new OLSMultipleLinearRegression();
        regression.newSampleData(Y, X);
        RealMatrix matrix = regression.calculateHat();
        System.out.println("matrix:" + matrix.getColumnDimension());
    }
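
For comparison, here is a minimal library-free sketch of the same fit. By the standard definition, the hat matrix H is n x n (6 x 6 here) and maps the observed values onto the fitted ones, yHat = H * Y, so the fitted values are not themselves entries of H. The sketch sidesteps H: it solves the normal equations (X'X) b = X'Y, with an explicit leading column of ones for the intercept, and then computes X b. OlsSketch and its helper names are made up for illustration and are not part of Commons Math. Whether a given Commons Math version adds the intercept column for you is something to check in its Javadoc.

```java
final class OlsSketch {

    /** Prepends the constant 1.0 (intercept term) to one row of predictors. */
    static double[] designRow(double[] xi) {
        double[] row = new double[xi.length + 1];
        row[0] = 1.0;
        System.arraycopy(xi, 0, row, 1, xi.length);
        return row;
    }

    /** Solves a * x = rhs by Gaussian elimination with partial pivoting. */
    static double[] solve(double[][] a, double[] rhs) {
        int n = rhs.length;
        double[][] m = new double[n][n + 1];   // augmented matrix [a | rhs]
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, m[i], 0, n);
            m[i][n] = rhs[i];
        }
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) pivot = r;
            }
            double[] tmp = m[col]; m[col] = m[pivot]; m[pivot] = tmp;
            for (int r = col + 1; r < n; r++) {
                double f = m[r][col] / m[col][col];
                for (int c = col; c <= n; c++) m[r][c] -= f * m[col][c];
            }
        }
        double[] x = new double[n];            // back substitution
        for (int i = n - 1; i >= 0; i--) {
            double s = m[i][n];
            for (int c = i + 1; c < n; c++) s -= m[i][c] * x[c];
            x[i] = s / m[i][i];
        }
        return x;
    }

    /** Returns OLS coefficients [intercept, b1, ..., bk] from the normal equations. */
    static double[] fit(double[][] x, double[] y) {
        int p = x[0].length + 1;
        double[][] xtx = new double[p][p];
        double[] xty = new double[p];
        for (int i = 0; i < y.length; i++) {
            double[] row = designRow(x[i]);
            for (int j = 0; j < p; j++) {
                xty[j] += row[j] * y[i];
                for (int k = 0; k < p; k++) xtx[j][k] += row[j] * row[k];
            }
        }
        return solve(xtx, xty);
    }

    /** Fitted values yHat = X b, using the same design rows as fit(). */
    static double[] fitted(double[][] x, double[] b) {
        double[] yHat = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            double[] row = designRow(x[i]);
            for (int j = 0; j < b.length; j++) yHat[i] += row[j] * b[j];
        }
        return yHat;
    }

    public static void main(String[] args) {
        double[][] x = { {4, 1}, {4, 2}, {7, 2}, {7, 4}, {10, 3}, {10, 6} };
        double[] y = { 14, 23, 30, 50, 39, 67 };
        double[] b = fit(x, y);
        System.out.println("b:    " + java.util.Arrays.toString(b));
        System.out.println("yHat: " + java.util.Arrays.toString(fitted(x, b)));
    }
}
```

With Commons Math itself, multiplying the matrix returned by calculateHat() by Y, for example with RealMatrix.operate(double[]), should give the same vector, assuming the design matrix includes the intercept column.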
On Fri, Feb 12, 2010 at 12:08 PM, Ted Dunning wrote:
> It is not a precision issue. R and commons-math use different algorithms
> with the same underlying numerical implementation.
>
> It is even an open question which result is better. R has lots of
> credibility, but I have found cases where it lacked precision (and I coded
> up a patch that was accepted).
>
> Unbounded precision integers and rationals are very useful, but not usually
> for large scale numerical programming. Except in a very few cases, if you
> need more than 17 digits of precision, you have other very serious problems
> that precision won't help.
>
> On Fri, Feb 12, 2010 at 1:40 AM, Andy Turner wrote:
>
> > Interesting that this is a precision issue. I'm not surprised depending on
> > what you are doing, double precision may not be enough. It depends a lot on
> > how the calculations are broken into smaller parts. BigDecimal is
> > fantastically useful...
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>