commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gilles <>
Subject Re: [math] Least Squares Outlier Rejection
Date Fri, 12 Sep 2014 12:55:50 GMT

On Fri, 12 Sep 2014 09:16:07 +0200, Luc Maisonobe wrote:
> Le 12/09/2014 01:35, Gilles a écrit :
>> Hello.
>> On Thu, 11 Sep 2014 14:29:49 -0400, Evan Ward wrote:
>>> Hi,
>>> A while ago I had bought up the idea of adding residual editing 
>>> (aka data
>>> editing, outlier rejection, robust regression) to our non-linear 
>>> least
>>> squares implementations.[1] As the name suggests, the idea is to
>>> de-weight
>>> observations that don't match the user's model. There are several 
>>> ways to
>>> this including choosing a fixed cutoff, a fixed standard deviation
>>> cutoff,
>>> or reducing a residual's weight based on its magnitude.[2]
>>> However we add the data editing feature I think it will cause 
>>> backward
>>> incompatibilities with the released API. I've outlined below the 
>>> two
>>> options I see. I'm open to other ideas as well.
>>> 1. Replace edited residuals with 0's in the residual vector and 
>>> Jacobian
>>> (i.e. apply a 0 weight). This has the advantage of being simple to
>>> implement and that our existing optimizers are already able to 
>>> handle it.
>>> The downside is evident when the user tries to obtain the number of
>>> residuals that were edited. It is hard to tell the difference 
>>> between an
>>> edited residual, an apriori zero weight, or a model evaluation 
>>> where the
>>> residual and gradient is, in fact, zero. We can provide easy access 
>>> to
>>> the
>>> number of edited residuals by adding a method to the Evaluation
>>> interface.
>>> (This is what I implemented in the patch in the original thread.) 
>>> Now
>>> that
>>> the code has been released though, this would cause a backward
>>> incompatibility for some advanced users. Most users will likely use 
>>> the
>>> included factory and builder methods to define their
>>> LeastSquaresProblem(LSP) and these users would not be affected by 
>>> the
>>> change. Only the users that provide a custom implementation of
>>> LSP.Evauation would be affected.
>>> 2. Remove edited residuals from the gradient and Jacobian, so that 
>>> the
>>> resulting vector and matrix have fewer rows. The advantage here is
>>> that the
>>> user can compare the length of the residual vector in the Optimum 
>>> to the
>>> number of observations in the LSP to determine the number of edited
>>> residuals. The problem is that returning vectors/matrices with 
>>> different
>>> sizes from LSP.evaluate() would violate the contract. Additionally 
>>> we
>>> would
>>> have to modify our existing optimizers to deal with the variable 
>>> lengths.
>>> For GaussNewton the modification would be small, but for
>>> LevenburgMarquardt
>>> I would likely have to re-write it since I don't understand the 
>>> code (not
>>> for lack of trying :P ). Users that implement 
>>> LeastSquaresOptimizers
>>> would
>>> likely have to modify their code as well.
>>> To summarize, in both cases users that only use the provided [math]
>>> classes
>>> would not have to modify their code, while users that provide 
>>> custom
>>> implementations of [math] interfaces would have to modify their 
>>> code.
>>> I would like to get this feature wrapped in for the next release. 
>>> Please
>>> let me know if you have a preference for either implementation and 
>>> if
>>> there
>>> are any other issues I should consider.
>> Compatibility breaks cannot occur in minor releases.
>> The next major release should not occur before deprecated classes 
>> are all
>> replaced. [I'm thinking about the optimizers, for which the fluent 
>> API
>> should
>> be implemented based on your design of NLLS.]
> I theoretically agree here, but we are in a sort of chicken and egg
> problem. Typically, when I proposed to switch the ODE to the new 
> fluent
> API, we decided we should do it at a major release only. We also said 
> we
> cannot remove deprecation at minor releases. Now we say major release
> should wait for deprecation to be removed.
> So As far as I understand, everything should occur at once for major
> releases : remove deprecated classes, introduce new classes and
> interfaces, but see next remark.

No, there is actually no chicken and egg problem in this case.
The issue is only that we should not delete an API before a replacement
is provided.
1. We want to remove the deprecated optimizers API
    -> must be done in major release (e.g. 4.0)
2. We want to let users upgrade their code
    -> the new API must exist before 4.0

1 and 2 are not contradictory as the new API will be written from
scratch (it is not a modification, say, of existing interfaces).

>> It would be nice to recode the whole "LevenbergMarquardtOptimizer" 
>> in
>> full OO
>> Java. But it should be implemented and tested before any new feature 
>> is
>> added
>> to the mix.
> Here comes another problem: we want to have time to stabilize new 
> API,
> which mean we should have a few minor releases with trial API.
> Unfortunately, once these trial API have been introduced, we are 
> stuck
> with them and we end up piling several API togethers (the optimizers 
> are
> the perfect example).

Piling up, yes, but it is not a problem: all the old APIs are already
deprecated; "only" the replacements are missing. We must provide them
in a minor release before 4.0. [This should not need to delay the next
release (3.4), as we can have as many 3.x as we want...]

> So when do we test new API : at minor releases just before a major
> release, but then we keep all the trials until a big clean up at 
> major
> release ?


>> Do I understand correctly that in the "robust" fit, the weights are
>> modified
>> during the optimization?
>> If so, would the algorithms still be "standard"?
> I think performing some modifications between iterations is still a
> standard way to perform some optimizations: we can detect the 
> outliers
> only once we have computed residuals and some statistics like the
> standard deviation, so this editing operation should occur somewhere
> during the optimization process.

I don't see the that the editing has to occur during the optimization.
It could be independent:
  1. Let the optimization run to completion with all the data
  2. Compute the residuals
  3a. If there are no outliers, stop
  3b. If there are outliers, remove them (or modify their weight)
  4. Run the optimization with the remaining data
  5. Goto 2

The advantage I see here is that the weight modification is a user
decision (and implementation).

However, IIUC the examples from the link provided by Evan, "robust"
optimization (i.e. handling outliers during optimization) could lead
to a "better" solution.

Would it always be "better"? Not sure: significant points could be
mistaken as outliers and be discarded before the optimizer could
figure out the correct solution...
To avoid a "good but wrong" fit, I'm afraid that we'd have to
introduce several parameters (like "do not discard outliers before
iteration n", "do not discard more than m points", etc.)
The tuning of those parameters will probably not be obvious, and
they will surely complicate the code (e.g. input data like the
"target" and "weight" array won't be "final").

> I have already seen this typically in
> orbit determination so I would consider it a classical optional 
> feature.

What works in one domain might not in another.
Thus, the feature should not alter the "standardness", nor decrease
the robustness of the implementation. Caring for special cases
(for which the feature is useful) may be obtained by e.g. use the
standard algorithm as a basic block that is called repeatedly, as
hinted above (and tuning the standard parameters and input data
appropriately for each call).

>> At first sight, I'd avoid modification of the sizes of input data
>> (option 2);
>> from an API usage viewpoint, I imagine that user code will require
>> additional
>> "length" tests.
>> Couldn't the problem you mention in option 1 disappear by having 
>> different
>> methods that return the a-priori weights and the modified weights?
> As we are already much too stringent with our compatibility policy, I
> would allow the case were only *very* advanced users would have
> problems. So it would seem fair to me if we can do some changes were 
> the
> users of the factory are unaffected and expert users who decided the
> factory was not good for them have to put new efforts.

Before embarking on this, I would like to see examples where the 
outlier rejection is leads to a (correct) solution impossible to 
with the approach I've suggested.

> Anyway, I would be OK with both solutions, and also think rewriting
> Levenberg-Marquardt would be a good thing. I was translated from 
> Minpack
> with only a change in the QR decomposition and a change in the
> convergence checker, so basically it is Fortran written with Java 
> syntax.

I just want to remind that quite some effort has been put already in 
modification of the original port (removal of many "work" arrays, 
data). It's not anymore just Fortran in Java. [But the core algorithm 
certainly not obvious, and I'm all for a clean reimplementation (to 
in parallel with the current one, until everybody agrees that nothing 


>>> [1]
>>> [2]

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message