Robust locally weighted regression (Loess / Lowess)

Key: MATH-278
URL: https://issues.apache.org/jira/browse/MATH-278
Project: Commons Math
Issue Type: New Feature
Reporter: Eugene Kirpichov
Attached is a patch, with tests, that implements the robust Loess procedure for smoothing univariate
scatterplots with local linear regression (http://en.wikipedia.org/wiki/Local_regression), as described
by William Cleveland (http://www.math.tau.ac.il/~yekutiel/MA%20seminar/Cleveland%201979.pdf).
(The patch also fixes one missing-javadoc checkstyle warning in the AbstractIntegrator class,
so that the code with my patch applied generates no checkstyle warnings at all.)
I propose including the procedure in commons-math because commons-math currently has no
method for robust smoothing of noisy data: there is interpolation (which is practically
unusable for noisy data) and there is regression, which has quite different goals.
Loess builds a smooth curve, with a controllable degree of smoothness, that approximates
the overall shape of the data.
I tried to follow the code requirements as strictly as possible: the tests cover the code
completely, there are no checkstyle warnings, etc. I wrote the code entirely myself,
from scratch, without borrowing any third-party licensed code.
The method is fairly computationally intensive: 10000 points with a bandwidth of 0.3 and 4
robustness iterations take about 3.7 sec on my machine, and in general the complexity is
O(robustnessIters * n^2 * bandwidth). I don't know how to optimize it further; all implementations
that I have found use exactly the same algorithm as mine for the one-dimensional case.
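
To show where the O(robustnessIters * n^2 * bandwidth) cost comes from, here is a minimal sketch of
robust Loess with local linear fits. It is not the code from the attached patch: the names
LoessSketch, smooth and ACCURACY are made up for illustration, and the 1e-12 cut-off is an assumed
value. The tricube neighbourhood weights and the bisquare robustness weights scaled by 6 times the
median absolute residual follow Cleveland's description.

```java
import java.util.Arrays;

/** Illustrative sketch of robust Loess; NOT the code from the attached patch. */
class LoessSketch {

    /** Assumed cut-off below which a variance or median residual is treated as zero. */
    static final double ACCURACY = 1e-12;

    /** Tricube weight: (1 - |u|^3)^3 for |u| < 1, otherwise 0. */
    static double tricube(double u) {
        double a = Math.abs(u);
        if (a >= 1) {
            return 0;
        }
        double t = 1 - a * a * a;
        return t * t * t;
    }

    /**
     * Returns the smoothed value at every xs[i]. xs must be sorted ascending;
     * bandwidth in (0, 1] is the fraction of points used in each local fit.
     */
    static double[] smooth(double[] xs, double[] ys, double bandwidth, int robustnessIters) {
        int n = xs.length;
        if (n == 0) {
            return new double[0];
        }
        int k = Math.min(n, Math.max(2, (int) Math.ceil(bandwidth * n)));
        double[] fitted = new double[n];
        double[] robustness = new double[n];
        Arrays.fill(robustness, 1.0);

        for (int iter = 0; iter <= robustnessIters; iter++) {   // robustnessIters + 1 passes
            int left = 0;                                       // window is [left, left + k - 1]
            for (int i = 0; i < n; i++) {                       // n local fits per pass
                // Slide the window so that it holds the k nearest neighbours of xs[i].
                while (left + k < n && xs[left + k] - xs[i] < xs[i] - xs[left]) {
                    left++;
                }
                double h = Math.max(xs[i] - xs[left], xs[left + k - 1] - xs[i]);
                double sw = 0, swx = 0, swy = 0, swxx = 0, swxy = 0;
                for (int j = left; j < left + k; j++) {         // about bandwidth * n points per fit
                    double w = robustness[j] * (h > 0 ? tricube((xs[j] - xs[i]) / h) : 1.0);
                    sw += w;
                    swx += w * xs[j];
                    swy += w * ys[j];
                    swxx += w * xs[j] * xs[j];
                    swxy += w * xs[j] * ys[j];
                }
                if (sw == 0) {
                    continue; // every neighbour has zero weight: keep the previous estimate
                }
                // Weighted least-squares line through the window, evaluated at xs[i].
                double meanX = swx / sw;
                double meanY = swy / sw;
                double varX = swxx / sw - meanX * meanX;
                double covXY = swxy / sw - meanX * meanY;
                double slope = varX < ACCURACY ? 0 : covXY / varX;
                fitted[i] = meanY + slope * (xs[i] - meanX);
            }
            if (iter == robustnessIters) {
                break;
            }
            // Bisquare robustness weights from residuals scaled by 6 * median(|residual|).
            double[] absRes = new double[n];
            for (int i = 0; i < n; i++) {
                absRes[i] = Math.abs(ys[i] - fitted[i]);
            }
            double[] sorted = absRes.clone();
            Arrays.sort(sorted);
            double median = (n % 2 == 1) ? sorted[n / 2] : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
            if (median < ACCURACY) {
                break; // the fit is already (numerically) exact for at least half of the points
            }
            for (int i = 0; i < n; i++) {
                double u = absRes[i] / (6 * median);
                robustness[i] = u >= 1 ? 0 : (1 - u * u) * (1 - u * u);
            }
        }
        return fitted;
    }
}
```

The three nested loops correspond directly to the three factors of the complexity estimate: the
number of passes, the n local fits in each pass, and the roughly bandwidth * n points in each fit.
A call such as LoessSketch.smooth(xs, ys, 0.3, 4) would match the parameters of the timing above.
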
Some TODOs, in steeply increasing order of complexity:
 * Make the weight function customizable: according to Cleveland, this is needed only in some
   exotic cases, for example where the desired approximation is non-continuous.
 * Make the degree of the locally fitted polynomial customizable: currently the algorithm
   does only a linear local regression; it might be useful to let it use quadratic regression too.
   Higher degrees are not worth it, according to Cleveland.
 * Generalize the algorithm to the multidimensional case: this will require A LOT of hard
   work.
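
As a rough idea of what the first TODO could look like, the weight function might be passed in as a
plain function object. This is only a hypothetical sketch, not part of the patch: the class name
WeightFunctions, the method localWeights and the use of java.util.function.DoubleUnaryOperator
(which needs Java 8 or later) are all assumptions made for illustration.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch of a pluggable weight function; not part of the patch. */
class WeightFunctions {

    /** Tricube kernel: (1 - |u|^3)^3 for |u| < 1, otherwise 0. */
    static final DoubleUnaryOperator TRICUBE = u -> {
        double a = Math.abs(u);
        if (a >= 1) {
            return 0.0;
        }
        double t = 1 - a * a * a;
        return t * t * t;
    };

    /** Example alternative, Epanechnikov-shaped and unnormalised: 1 - u^2 for |u| < 1, otherwise 0. */
    static final DoubleUnaryOperator EPANECHNIKOV = u -> {
        double a = Math.abs(u);
        return a >= 1 ? 0.0 : 1 - a * a;
    };

    /**
     * Weights of the points xs for a local fit centred at x0 with window
     * half-width h, using whichever kernel the caller supplies.
     */
    static double[] localWeights(double[] xs, double x0, double h, DoubleUnaryOperator kernel) {
        if (h <= 0) {
            throw new IllegalArgumentException("window half-width must be positive");
        }
        double[] w = new double[xs.length];
        for (int j = 0; j < xs.length; j++) {
            w[j] = kernel.applyAsDouble((xs[j] - x0) / h);
        }
        return w;
    }
}
```

With this shape, WeightFunctions.localWeights(xs, x0, h, WeightFunctions.TRICUBE) would give the
weights for one local fit, and a caller with unusual needs could supply any other kernel instead.
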

