commons-dev mailing list archives

From: Mauro Talevi
Subject: Re: [math] Improving numerics in OLSMultipleLinearRegression
Date: Sun, 15 Jun 2008 22:01:03 GMT

Phil Steitz wrote:

> No, just X.  See the references here:
> I think R uses QR as described above.  Comments or suggestions for other 
> default implementations are most welcome.  We should aim to provide a 
> default implementation that is reasonably fast and provides good 
> numerics across a broad range of design matrices.

Ok - noted.  I'll take a look at the numerics issue during the week.

> We do need to decide what the API is, so even if it takes a while to 
> implement things, or the initial implementations are naive, we should 
> decide what statistics we are going to provide and how we are going to 
> provide them.  Same for the specification of models (i.e., "input data")

Yes - agreed, but I meant to say that before we start adding these methods
to the interfaces, we should decide on the whole list of statistics and 
input data - and that can be done on a wiki page, where people can 

>> Perhaps it would help if we had overloaded newData methods that accept 
>> different input strategies, but ultimately they will produce a n x m 
>> double array.  That way we can provide users with choice.

> I was thinking the same thing.  The bit that is troubling me is the 
> omega matrix required by GLS cluttering the OLS interface.  Other types 
> of models (e.g. weighted) will require other data.  Could be we need 
> separate interfaces for the different types of regression, but maybe it 
> is better to dispense with the abstract interface altogether.  The 
> reason we have interface / implementation separation is to allow 
> alternative implementations to be plugged in.  Given the 2.0 approach to 
> support IOC, what may make more sense is to just encapsulate the core 
> model estimators (things like R's lm, gls),  make them pluggable via 
> setters or constructors and get rid of the abstract interface.  Any 
> thoughts on this?

I see your point.  What made me fall on the side of a unified interface 
was that OLS can be seen as a special case of GLS.  But yes, the 
covariance matrix muddles the OLS case.  I still think an interface 
defining the common statistics available from the different types of 
regression might be useful.  We would just not add the data input to the 
interface; that would instead be implementation-specific.
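
A minimal sketch of that split, under stated assumptions: every name here
(RegressionStatistics, NaiveOls, NaiveWeightedLs, newSampleData) is
hypothetical and is not the commons-math API.  The solve is a deliberately
naive normal-equations one, not the QR approach discussed earlier in the
thread, and a weighted fit (a diagonal omega) stands in for full GLS to keep
the example short.  The point is only that the statistics live in the
interface while each implementation declares its own data input.

```java
// Hypothetical names throughout; a sketch, not the commons-math API.

/** Statistics every regression type can report. No data-input methods here. */
interface RegressionStatistics {
    double[] estimateRegressionParameters();
    double[] estimateResiduals();
}

/** Naive OLS via the normal equations (X'X) b = X'y. Assumption: not QR. */
class NaiveOls implements RegressionStatistics {
    protected double[][] x; // n x p design matrix, column 0 is the intercept
    protected double[] y;   // n observations

    /** OLS-specific input; an intercept column of ones is prepended. */
    void newSampleData(double[] y, double[][] regressors) {
        this.y = y.clone();
        this.x = new double[y.length][];
        for (int i = 0; i < y.length; i++) {
            x[i] = new double[regressors[i].length + 1];
            x[i][0] = 1.0;
            System.arraycopy(regressors[i], 0, x[i], 1, regressors[i].length);
        }
    }

    /** Per-observation weight; plain OLS weights every row equally. */
    protected double weight(int i) { return 1.0; }

    public double[] estimateRegressionParameters() {
        int p = x[0].length;
        double[][] a = new double[p][p + 1]; // augmented [X'WX | X'Wy]
        for (int i = 0; i < y.length; i++) {
            double w = weight(i);
            for (int r = 0; r < p; r++) {
                for (int c = 0; c < p; c++) a[r][c] += w * x[i][r] * x[i][c];
                a[r][p] += w * x[i][r] * y[i];
            }
        }
        return solve(a);
    }

    public double[] estimateResiduals() {
        double[] b = estimateRegressionParameters();
        double[] e = new double[y.length];
        for (int i = 0; i < y.length; i++) {
            double fit = 0.0;
            for (int j = 0; j < b.length; j++) fit += x[i][j] * b[j];
            e[i] = y[i] - fit;
        }
        return e;
    }

    /** Gaussian elimination with partial pivoting on a p x (p+1) system. */
    private static double[] solve(double[][] a) {
        int p = a.length;
        for (int k = 0; k < p; k++) {
            int piv = k;
            for (int r = k + 1; r < p; r++)
                if (Math.abs(a[r][k]) > Math.abs(a[piv][k])) piv = r;
            double[] t = a[k]; a[k] = a[piv]; a[piv] = t;
            for (int r = k + 1; r < p; r++) {
                double f = a[r][k] / a[k][k];
                for (int c = k; c <= p; c++) a[r][c] -= f * a[k][c];
            }
        }
        double[] b = new double[p];
        for (int r = p - 1; r >= 0; r--) {
            double s = a[r][p];
            for (int c = r + 1; c < p; c++) s -= a[r][c] * b[c];
            b[r] = s / a[r][r];
        }
        return b;
    }
}

/** Weighted variant: its extra input stays out of the shared interface. */
class NaiveWeightedLs extends NaiveOls {
    private double[] weights; // null means "unweighted"

    void newSampleData(double[] y, double[][] regressors, double[] weights) {
        super.newSampleData(y, regressors);
        this.weights = weights.clone();
    }

    @Override protected double weight(int i) {
        return weights == null ? 1.0 : weights[i];
    }
}

class StatisticsSketchDemo {
    public static void main(String[] args) {
        NaiveOls ols = new NaiveOls();
        ols.newSampleData(new double[] {1, 3, 5, 7}, new double[][] {{0}, {1}, {2}, {3}});
        RegressionStatistics stats = ols; // callers only need the statistics view
        double[] b = stats.estimateRegressionParameters();
        System.out.println("intercept=" + b[0] + " slope=" + b[1]);
    }
}
```

Code that only consumes results can depend on RegressionStatistics alone; 
only the code that loads data needs to know which concrete type it holds.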

I'm all for pluggable/IOC approaches, but I fail to see how this would 
get rid of the interface.
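
To make the question concrete, here is a rough sketch of how constructor 
injection and a statistics interface could coexist.  All names 
(LeastSquaresSolver, GramSchmidtQrSolver, ParameterEstimates, PluggableOls) 
are hypothetical, not anything in commons-math, and the solver uses modified 
Gram-Schmidt QR purely as an illustrative choice; it assumes a full-rank 
design matrix that already includes any intercept column.

```java
// Hypothetical names; a sketch, not the commons-math 2.0 design.

/** Pluggable strategy for solving min ||X b - y||. */
interface LeastSquaresSolver {
    double[] solve(double[][] x, double[] y);
}

/** Thin QR by modified Gram-Schmidt, then back-substitution on R b = Q'y. */
class GramSchmidtQrSolver implements LeastSquaresSolver {
    public double[] solve(double[][] x, double[] y) {
        int n = x.length, p = x[0].length;
        double[][] q = new double[n][];
        for (int i = 0; i < n; i++) q[i] = x[i].clone();
        double[][] r = new double[p][p];
        for (int j = 0; j < p; j++) {
            double norm = 0.0;
            for (int i = 0; i < n; i++) norm += q[i][j] * q[i][j];
            r[j][j] = Math.sqrt(norm); // assumes full column rank (norm > 0)
            for (int i = 0; i < n; i++) q[i][j] /= r[j][j];
            // Remove the q_j component from every remaining column.
            for (int k = j + 1; k < p; k++) {
                double dot = 0.0;
                for (int i = 0; i < n; i++) dot += q[i][j] * q[i][k];
                r[j][k] = dot;
                for (int i = 0; i < n; i++) q[i][k] -= dot * q[i][j];
            }
        }
        double[] b = new double[p];
        for (int j = p - 1; j >= 0; j--) {
            double s = 0.0;
            for (int i = 0; i < n; i++) s += q[i][j] * y[i]; // (Q'y)_j
            for (int k = j + 1; k < p; k++) s -= r[j][k] * b[k];
            b[j] = s / r[j][j];
        }
        return b;
    }
}

/** The statistics-facing interface survives; only the solver is injected. */
interface ParameterEstimates {
    double[] estimateRegressionParameters();
}

class PluggableOls implements ParameterEstimates {
    private final LeastSquaresSolver solver;
    private double[][] x; // full design matrix, intercept column included
    private double[] y;

    PluggableOls(LeastSquaresSolver solver) { this.solver = solver; }

    void newSampleData(double[] y, double[][] x) {
        this.y = y.clone();
        this.x = x;
    }

    public double[] estimateRegressionParameters() {
        return solver.solve(x, y);
    }
}

class PluggableSketchDemo {
    public static void main(String[] args) {
        PluggableOls ols = new PluggableOls(new GramSchmidtQrSolver());
        ols.newSampleData(new double[] {1, 3, 5, 7},
                          new double[][] {{1, 0}, {1, 1}, {1, 2}, {1, 3}});
        double[] b = ols.estimateRegressionParameters();
        System.out.println("intercept=" + b[0] + " slope=" + b[1]);
    }
}
```

In this arrangement the injected solver replaces what an abstract base class 
would otherwise supply, while callers can still program against 
ParameterEstimates.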

