commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Greg Sterijevski <gsterijev...@gmail.com>
Subject Re: [math] SimpleRegression
Date Sat, 13 Aug 2011 06:19:56 GMT
As soon as you push the other changes, I will update my source and get to
work on this...
I thought SimpleRegression was a toy example, but after thinking about it,
there is much to like about the class.

-Greg

On Sat, Aug 13, 2011 at 1:16 AM, Phil Steitz <phil.steitz@gmail.com> wrote:

> On 8/12/11 10:59 PM, Greg Sterijevski wrote:
> > Also, I was thinking that maybe it might be useful to bring
> SimpleRegression
> > into line with the other regression techniques and give the user the
> ability
> > to constrain the constant to zero?
>
> +1 - just make sure to include clear javadoc and tests.
>
> Phil
> >
> > -Greg
> >
> > On Sat, Aug 13, 2011 at 12:13 AM, Greg Sterijevski
> > <gsterijevski@gmail.com>wrote:
> >
> >> One more thing... (ala Detective Colombo).
> >>
> >> In add and remove observation there is a snippet which looks like:
> >>
> >>             sumXX += dx * dx * (double) n / (n + 1d);
> >>             sumYY += dy * dy * (double) n / (n + 1d);
> >>             sumXY += dx * dy * (double) n / (n + 1d);
> >>             xbar += dx / (n + 1.0);
> >>             ybar += dy / (n + 1.0);
> >>
> >> Would you be against pulling out the divisions by ( n+1.0)?  Maybe
> >> something like:
> >>            final double fact1 = 1.0/((double) n + 1.0);
> >>            final double fact2 = ((double) n) * fact1;
> >>             sumXX += dx * dx * fact2;
> >>             sumYY += dy * dy * fact2;
> >>             sumXY += dx * dy * fact2;
> >>             xbar += dx * fact1;
> >>             ybar += dy * fact1;
> >>
> >>
> >> I realize that most likely the compiler or runtime does this
> optimization,
> >> but just in case...
> >>
> >>
> >> -Greg
> >>
> >>
> >> On Fri, Aug 12, 2011 at 11:40 PM, Greg Sterijevski <
> gsterijevski@gmail.com
> >>> wrote:
> >>> +1 from me, on all counts! -Greg
> >>>
> >>>
> >>> On Fri, Aug 12, 2011 at 11:30 PM, Phil Steitz <phil.steitz@gmail.com
> >wrote:
> >>>
> >>>> On 8/12/11 7:16 PM, Greg Sterijevski wrote:
> >>>>> Hello All,
> >>>>>
> >>>>> Before I chum the water with more JIRA tickets, I thought I would
see
> >>>>> whether people thought this was important.
> >>>>>
> >>>>> In the SimpleRegression you have two methods:
> >>>>>
> >>>>> public void addData(double x, double y) {
> >>>>>   ...some code that is not germane to discussion......
> >>>>>
> >>>>>         if (n > 2) {
> >>>>>             distribution = new TDistributionImpl(n - 2);
> >>>>>         }
> >>>>>     }
> >>>>>
> >>>>>     public void removeData(double x, double y) {
> >>>>>   ...some code that is not germane......
> >>>>>
> >>>>>             if (n > 2) {
> >>>>>                 distribution = new TDistributionImpl(n - 2);
> >>>>>             }
> >>>>>         }
> >>>>>     }
> >>>>>
> >>>>> >From the perspective of a user, you are likely to call add/remove
> >>>> repeatedly
> >>>>> before you ever need to check for statistical significance. Wouldn't
> it
> >>>> be
> >>>>> better to instantiate the TDistribution when it is needed?
> >>>>>
> >>>>> So you would have to make the following two methods a bit more
> >>>> complicated:
> >>>>>  public double getSlopeConfidenceInterval(double alpha)
> >>>>>         throws MathException {
> >>>>>         if (alpha >= 1 || alpha <= 0) {
> >>>>>             throw new
> >>>>> OutOfRangeException(LocalizedFormats.SIGNIFICANCE_LEVEL,
> >>>>>                                           alpha, 0, 1);
> >>>>>         }
> >>>>>         if( distribution == null ||
> distribution.getDegreesOfFreedom()
> >>>> !=
> >>>>> n-2){
> >>>>>           distribution = new TDistributionImpl(n - 2);
> >>>>> }
> >>>>>         return getSlopeStdErr() *
> >>>>>             distribution.inverseCumulativeProbability(1d - alpha
/
> 2d);
> >>>>>     }
> >>>>>
> >>>>> Similarly getSignificance() would have to be modified with the check
> of
> >>>> the
> >>>>> degrees of freedom of the distribution.
> >>>>>
> >>>>> There is nothing wrong with the current code, but making the change
> >>>> means
> >>>>> faster updates.
> >>>> Slightly, yes.  There is not much code at all in the distribution
> >>>> constructor, but you are correct.  Moreover, I can see now that the
> >>>> "immutability-everywhere" changes in trunk have made the code in the
> >>>> class a little funny.  In versions <=2.2, the TDistribution used
by
> >>>> the class was pluggable - i.e., there was a constructor that took at
> >>>> TDistribution as an argument, so if for some reason you wanted to
> >>>> use an impl different from TDistributionImpl, you could do that.
> >>>> There was also a setter for the distribution. In addition,
> >>>> TDistributionImpl itself was mutable, exposing a setDegreesOfFreedom
> >>>> method.  So the distribution member was set at construction time and
> >>>> the data update methods called the setter for DF on the
> >>>> distribution.  We decided to make the distributions immutable in 3.0
> >>>> (search the archives for discussion), so the current mods were done
> >>>> to basically accomplish that.  But the code should be cleaned up.
> >>>> The constructor taking an int is meaningless and should be
> >>>> deprecated or removed (unfortunately, we added that in 2.2 and
> >>>> advertised it as a deprecation replacement for the version that took
> >>>> a distribution as parameter.  We should have realized then that it
> >>>> was meaningless.  My bad for missing that. I would favor dropping it
> >>>> in 3.0, since even if anyone is using it, it isn't doing anything
> >>>> meaningful for them.)  Given that constructing a TDistributionImpl
> >>>> is trivial, we might as well eliminate the member field altogether
> >>>> and just create one when needed.  If you agree and no one else
> >>>> objects, I will make these changes.  Thanks for reviewing the code.
> >>>>
> >>>> Phil
> >>>>> Thoughts?
> >>>>>
> >>>>> -Greg
> >>>>>
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> >>>> For additional commands, e-mail: dev-help@commons.apache.org
> >>>>
> >>>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message