commons-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: [Math] Exponential curve fitting problem
Date Thu, 14 Oct 2010 17:07:49 GMT
Using a single error measure for a wide range of functions doesn't work well
(as you have noted).

The fundamental problem is that squared error is a *really* bad metric on
curves like this.  If you plot your original function and the fitted
version, you will see what the issue is.  I took, for instance, data with x
= 1:20.  Your fitted function fits the first 17 data points very well and
fits the last 3 data points progressively more poorly.
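
To put a rough number on that scale mismatch (a back-of-the-envelope check
against the generated data, not something from your output): the squared
magnitude of the last point is about 10^5 times the combined squared
magnitudes of the first 17 points, so a least-squares fit is rewarded almost
exclusively for chasing the tail.  In Java:

    // Scale check for y = 20 + 10*e^(2x), x = 1..20: compare the squared
    // magnitude of the last point with the first 17 combined.  Squared
    // magnitudes stand in for squared-error contributions, since a fixed
    // relative misfit at point i costs roughly (eps * y_i)^2.
    public class ScaleCheck {
        public static void main(String[] args) {
            double sumSqFirst17 = 0.0;
            for (int x = 1; x <= 17; x++) {
                double y = 20 + 10 * Math.exp(2 * x);
                sumSqFirst17 += y * y;
            }
            double yLast = 20 + 10 * Math.exp(2 * 20);
            // Prints a ratio on the order of 1e5: the tail owns the loss.
            System.out.println(yLast * yLast / sumSqFirst17);
        }
    }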

To fix this, the fitting routine has to change the exponent by 0.3 and the
intercept by 10^13.  This means that you are trying to search a *really*
narrow valley in your parameter space, and it isn't surprising that the
optimizer is having some problems.

It is probably much more fruitful to recast your entire framework into
something like generalized linear modeling.  This will give you much
better-conditioned problems that avoid your current issue.

In R, for instance, the following sequence fits your curve very nicely.  The
same algorithm applied in commons math would work just as well.

> data = data.frame(x=1:20, y= ( 20 + 10*exp(2*1:20)))
> plot(y~x, data)
> glm(log(y)~x, data, family=gaussian)

Call:  glm(formula = log(y) ~ x, family = gaussian, data = data, start = c(1, 1))

Coefficients:
 (Intercept)             x
2.3580706246  1.9960549114

Degrees of Freedom: 19 Total (i.e. Null);  18 Residual
Null Deviance:    2649.5608104
Residual Deviance: 0.044396172588 AIC: -59.449144468
> m=glm(log(y)~x, data, family=gaussian, start=c(1,1))
> summary(m)

Call:
glm(formula = log(y) ~ x, family = gaussian, data = data, start = c(1, 1))

Deviance Residuals:
             Min                1Q            Median                3Q              Max
-0.0390344769970  -0.0249105105363  -0.0098817616272   0.0086221578383   0.1880043231948

Coefficients:
                   Estimate      Std. Error    t value   Pr(>|t|)
(Intercept) 2.3580706246025 0.0230702149442  102.21277 < 2.22e-16 ***
x           1.9960549114187 0.0019258643339 1036.44627 < 2.22e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.0024664540326436)

    Null deviance: 2.64956081042e+03  on 19  degrees of freedom
Residual deviance: 4.43961725876e-02  on 18  degrees of freedom
AIC: -59.4491444677

Number of Fisher Scoring iterations: 2

> lines(data$x, exp(predict(m)))
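
For the commons-math side, here is a minimal sketch of the same log-transform
fit (my sketch, using the SimpleRegression class, which lives in
org.apache.commons.math.stat.regression in the 2.x releases):

    import org.apache.commons.math.stat.regression.SimpleRegression;

    // Same idea as the R session above: fit log(y) = a + b*x by ordinary
    // least squares, so that y ~= exp(a) * exp(b*x).
    public class LogLinearFit {
        public static void main(String[] args) {
            SimpleRegression reg = new SimpleRegression();
            for (int x = 1; x <= 20; x++) {
                double y = 20 + 10 * Math.exp(2 * x);
                reg.addData(x, Math.log(y));   // regress in log space
            }
            // Should land near the R answer: intercept ~2.358, slope ~1.996.
            System.out.println("a      = " + reg.getIntercept());
            System.out.println("b      = " + reg.getSlope());
            System.out.println("exp(a) = " + Math.exp(reg.getIntercept()));
        }
    }

Note that, like the R fit, this ignores the additive 20 in your generating
function; if that offset matters you will need a genuinely nonlinear fit (or
to estimate and subtract the offset first).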



On Thu, Oct 14, 2010 at 9:16 AM, Christiaan <christiaan_db@hotmail.com> wrote:

> Hi,
> I am reposting this issue since the original one through the Nabble forums
> somehow didn't get accepted, so I am hoping this one is ;-) I am currently
> evaluating whether Commons Math can be used in our project for regression
> analysis. Based on an earlier thread I've tried to apply natural exponential
> curve fitting:
>
> http://apache-commons.680414.n4.nabble.com/MATH-Need-help-on-math-libraries-for-curve-generation-td1050024.html#a1050024
>
> The results are not really promising.  If I generate y values for this
> function:
> y = 20 + 10*e^(2*x)
>
> the result is:
> y = 1.040425114042751E13 + 21.13 * e^(1.7x)
> RMS:2.895606396069334E25
>
> which isn't a good fit. Any ideas if and how this can be improved?
>
