Mauro Talevi wrote:
> Phil,
>
> Phil Steitz wrote:
>> I think R uses QR as described above. Comments or suggestions for
>> other default implementations are most welcome. We should aim to
>> provide a default implementation that is reasonably fast and provides
>> good numerics across a broad range of design matrices.
>
> Got around to testing QR decomposition on OLS.
>
> The short answer is that it does not seem to make much difference.
> Rather, it looks like the dataset and the number of observations
> are far more significant for good numerics, as is also shown by your
> recent addition of the Swiss Fertility dataset (with nobs 3 times as
> large), for which results match (either with or without QR) to within
> a tolerance of 10^-12.
See comments below on implementation. Good numerics means good accuracy
over a broad range of input data. The Longley dataset is "hard"
numerically and the Swiss fertility dataset is an easy one.
>
> Here's the resulting numerics for comparison:
>
> Longley dataset (nobs=16):
>
> PL=[3482258.6569276676, 15.06187677821299, 0.03581918037047249,
> 2.0202298136474104, 1.0332268695603801, 0.051104103746114404,
> 1829.1514737363977]
> QR=[3482258.7119702557, 15.061873615257795, 0.03581918168712586,
> 2.020229840231328, 1.0332268778552742, 0.05110409751647271,
> 1829.1515061042903]
> LG=[3482258.63459582, 15.0618722713733, 0.035819179292591,
> 2.02022980381683, 1.03322686717359, 0.0511041056535807,
> 1829.15146461355]
>
> Swiss Fertility dataset (nobs=47):
>
> PL=[91.05542390271336, 0.22064551045713723, 0.26058239824327045,
> 0.9616123845602972, 0.12441843147162471]
> QR=[91.05542390271366, 0.22064551045714642, 0.26058239824326457,
> 0.9616123845602974, 0.12441843147162669]
> SF=[91.05542390271397, 0.22064551045715, 0.26058239824328,
> 0.9616123845603, 0.12441843147162]
>
> (Legend: PL = plain OLS, QR = QR-decomposed OLS, LG = Longley R
> results, SF = Swiss Fertility R results).
>
> Interestingly, it's only on the intercepts (i.e. the first regression
> parameter) that we get the very poor numerics. While not a numerical
> argument, one could say that the statistically more significant
> parameters are the slopes.
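
One caveat on the intercept observation: the differences above are absolute, and the Longley intercept is several orders of magnitude larger than the slopes. A rough sketch of a relative comparison follows, using the PL and LG values exactly as printed in this message. The class and method names are hypothetical and are not part of the patch.

```java
final class RelativeError {

    private RelativeError() {
    }

    /**
     * Relative difference |computed - reference| / |reference|.
     * Assumes the reference value is nonzero.
     */
    static double relativeError(double computed, double reference) {
        return Math.abs(computed - reference) / Math.abs(reference);
    }

    /** Element-wise relative differences for two coefficient vectors of equal length. */
    static double[] relativeErrors(double[] computed, double[] reference) {
        if (computed.length != reference.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double[] out = new double[computed.length];
        for (int i = 0; i < computed.length; i++) {
            out[i] = relativeError(computed[i], reference[i]);
        }
        return out;
    }
}
```

For example, relativeError(3482258.6569276676, 3482258.63459582) for the intercept comes out smaller than relativeError(15.06187677821299, 15.0618722713733) for the next coefficient, so in relative terms the intercept is not the worst parameter.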
>
> Anyway, attached is a patch with a QR-based implementation and a
> modified test to print out comparison results.
Sorry it took so long for me to review this. To really take advantage
of the QR decomposition, the upper-triangular system R b = Q' y (using
your notation from the javadoc) should be solved by back-substitution,
rather than by inverting R^T R. That will require a little more work to
implement, but should improve accuracy. I just opened MATH-217 to track
this.
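
To make the suggestion concrete, here is a minimal sketch of the back-substitution step on plain arrays. It assumes R is the square p x p upper-triangular factor with a nonzero diagonal and that c already holds the first p entries of Q' y. The class and method names are hypothetical; this is not the code in the patch or in the library.

```java
final class BackSubstitution {

    private BackSubstitution() {
    }

    /**
     * Solves R b = c for b, where r is a square upper-triangular matrix.
     * Entries below the diagonal of r are never read.
     */
    static double[] solveUpperTriangular(double[][] r, double[] c) {
        int n = c.length;
        if (r.length != n) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double[] b = new double[n];
        // Work upwards from the last row: each row has only one unknown left.
        for (int i = n - 1; i >= 0; i--) {
            double diag = r[i][i];
            if (diag == 0.0) {
                throw new ArithmeticException("zero on the diagonal at row " + i);
            }
            double sum = c[i];
            // Subtract the contribution of the coefficients already solved.
            for (int j = i + 1; j < n; j++) {
                sum -= r[i][j] * b[j];
            }
            b[i] = sum / diag;
        }
        return b;
    }
}
```

With R = {{2, 1}, {0, 4}} and c = {4, 8} this gives b = {1, 2}. The point of solving this way is that X' X is never formed or inverted, so its condition number, which is the square of that of X, does not enter the computation.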
Phil
> 
>
> 
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org

