commons-dev mailing list archives

From Phil Steitz <p...@steitz.com>
Subject Re: [math] Improving numerics in OLSMultipleLinearRegression
Date Sat, 12 Jul 2008 22:02:19 GMT
Mauro Talevi wrote:
> Phil,
>
> Phil Steitz wrote:
>> I think R uses QR as described above.  Comments or suggestions for 
>> other default implementations are most welcome.  We should aim to 
>> provide a default implementation that is reasonably fast and provides 
>> good numerics across a broad range of design matrices.
>
> Got around to testing QR decomposition on OLS.
>
> The short answer is that it does not seem to make much difference. 
> Rather, it looks like the dataset and the number of observations 
> are far more significant for good numerics, as is also shown by your 
> recent addition of the Swiss Fertility dataset (with nobs 3 times as 
> large), for which results match (either with or without QR) up to 
> 10^-12 tolerance.
See comments below on implementation.  Good numerics means good accuracy 
over a broad range of input data.  The Longley dataset is "hard" 
numerically and the Swiss fertility dataset is an easy one. 
>
> Here's the resulting numerics for comparison:
>
> Longley dataset (nobs=16):
>
> PL=[-3482258.6569276676, 15.06187677821299, -0.03581918037047249, 
> -2.0202298136474104, -1.0332268695603801, -0.051104103746114404, 
> 1829.1514737363977]
> QR=[-3482258.7119702557, 15.061873615257795, -0.03581918168712586, 
> -2.020229840231328, -1.0332268778552742, -0.05110409751647271, 
> 1829.1515061042903]
> LG=[-3482258.63459582, 15.0618722713733, -0.035819179292591, 
> -2.02022980381683, -1.03322686717359, -0.0511041056535807, 
> 1829.15146461355]
>
> Swiss Fertility dataset (nobs=47):
>
> PL=[91.05542390271336, -0.22064551045713723, -0.26058239824327045, 
> -0.9616123845602972, 0.12441843147162471]
> QR=[91.05542390271366, -0.22064551045714642, -0.26058239824326457, 
> -0.9616123845602974, 0.12441843147162669]
> SF=[91.05542390271397, -0.22064551045715, -0.26058239824328, 
> -0.9616123845603, 0.12441843147162]
>
> (Legend: PL = plain OLS, QR = QR-decomposed OLS, LG = Longley R 
> results, SF = Swiss Fertility R results).
>
> Interestingly, it's only on the intercept (i.e. the first regression 
> parameter) that we get the very poor numerics.  While not a numerical 
> argument, one could say that the statistically more significant 
> parameter is the slope.
>
> Anyway, attached is a patch with a QR-based implementation and a 
> modified test to print out comparison results.
Sorry it took so long for me to review this.  To really take advantage 
of the QR decomposition, the upper-triangular system R b = Q' y (using 
your notation from javadoc) should be solved by back-substitution, 
rather than by inverting R^T R.  That will require a little more work to 
implement, but should improve accuracy.  I just opened MATH-217 to track 
this.
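
For illustration, here is a minimal sketch of the back-substitution step. 
It assumes R is the square, nonsingular upper-triangular factor and that 
c = Q' y has already been computed.  The class and method names are 
hypothetical and are not part of the commons-math API.

```java
// Hypothetical helper, not part of commons-math: solves R b = c by
// back-substitution, where R is square and upper-triangular.
final class BackSubstitution {

    private BackSubstitution() {
        // static helper only
    }

    /**
     * @param r upper-triangular matrix (entries below the diagonal are ignored)
     * @param c right-hand side, assumed to hold Q' y
     * @return the solution b of R b = c
     */
    static double[] solveUpperTriangular(double[][] r, double[] c) {
        int n = c.length;
        double[] b = new double[n];
        // Work upwards from the last row: each row has one new unknown.
        for (int i = n - 1; i >= 0; i--) {
            double sum = c[i];
            for (int j = i + 1; j < n; j++) {
                sum -= r[i][j] * b[j];
            }
            if (r[i][i] == 0.0) {
                throw new ArithmeticException("R is singular at row " + i);
            }
            b[i] = sum / r[i][i];
        }
        return b;
    }
}
```

No matrix is inverted here and R^T R is never formed, which is the 
point of the suggestion above.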

Phil
 
 
> ------------------------------------------------------------------------
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org



