commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Phil Steitz <phil.ste...@gmail.com>
Subject Re: [math] SimpleRegression
Date Sat, 13 Aug 2011 06:16:04 GMT
On 8/12/11 10:59 PM, Greg Sterijevski wrote:
> Also, I was thinking that maybe it might be useful to bring SimpleRegression
> into line with the other regression techniques and give the user the ability
> to constrain the constant to zero?

+1 - just make sure to include clear javadoc and tests.

Phil
>
> -Greg
>
> On Sat, Aug 13, 2011 at 12:13 AM, Greg Sterijevski
> <gsterijevski@gmail.com>wrote:
>
>> One more thing... (ala Detective Colombo).
>>
>> In add and remove observation there is a snippet which looks like:
>>
>>             sumXX += dx * dx * (double) n / (n + 1d);
>>             sumYY += dy * dy * (double) n / (n + 1d);
>>             sumXY += dx * dy * (double) n / (n + 1d);
>>             xbar += dx / (n + 1.0);
>>             ybar += dy / (n + 1.0);
>>
>> Would you be against pulling out the divisions by ( n+1.0)?  Maybe
>> something like:
>>            final double fact1 = 1.0/((double) n + 1.0);
>>            final double fact2 = ((double) n) * fact1;
>>             sumXX += dx * dx * fact2;
>>             sumYY += dy * dy * fact2;
>>             sumXY += dx * dy * fact2;
>>             xbar += dx * fact1;
>>             ybar += dy * fact1;
>>
>>
>> I realize that most likely the compiler or runtime does this optimization,
>> but just in case...
>>
>>
>> -Greg
>>
>>
>> On Fri, Aug 12, 2011 at 11:40 PM, Greg Sterijevski <gsterijevski@gmail.com
>>> wrote:
>>> +1 from me, on all counts! -Greg
>>>
>>>
>>> On Fri, Aug 12, 2011 at 11:30 PM, Phil Steitz <phil.steitz@gmail.com>wrote:
>>>
>>>> On 8/12/11 7:16 PM, Greg Sterijevski wrote:
>>>>> Hello All,
>>>>>
>>>>> Before I chum the water with more JIRA tickets, I thought I would see
>>>>> whether people thought this was important.
>>>>>
>>>>> In the SimpleRegression you have two methods:
>>>>>
>>>>> public void addData(double x, double y) {
>>>>>   ...some code that is not germane to discussion......
>>>>>
>>>>>         if (n > 2) {
>>>>>             distribution = new TDistributionImpl(n - 2);
>>>>>         }
>>>>>     }
>>>>>
>>>>>     public void removeData(double x, double y) {
>>>>>   ...some code that is not germane......
>>>>>
>>>>>             if (n > 2) {
>>>>>                 distribution = new TDistributionImpl(n - 2);
>>>>>             }
>>>>>         }
>>>>>     }
>>>>>
>>>>> >From the perspective of a user, you are likely to call add/remove
>>>> repeatedly
>>>>> before you ever need to check for statistical significance. Wouldn't
it
>>>> be
>>>>> better to instantiate the TDistribution when it is needed?
>>>>>
>>>>> So you would have to make the following two methods a bit more
>>>> complicated:
>>>>>  public double getSlopeConfidenceInterval(double alpha)
>>>>>         throws MathException {
>>>>>         if (alpha >= 1 || alpha <= 0) {
>>>>>             throw new
>>>>> OutOfRangeException(LocalizedFormats.SIGNIFICANCE_LEVEL,
>>>>>                                           alpha, 0, 1);
>>>>>         }
>>>>>         if( distribution == null || distribution.getDegreesOfFreedom()
>>>> !=
>>>>> n-2){
>>>>>           distribution = new TDistributionImpl(n - 2);
>>>>> }
>>>>>         return getSlopeStdErr() *
>>>>>             distribution.inverseCumulativeProbability(1d - alpha / 2d);
>>>>>     }
>>>>>
>>>>> Similarly getSignificance() would have to be modified with the check
of
>>>> the
>>>>> degrees of freedom of the distribution.
>>>>>
>>>>> There is nothing wrong with the current code, but making the change
>>>> means
>>>>> faster updates.
>>>> Slightly, yes.  There is not much code at all in the distribution
>>>> constructor, but you are correct.  Moreover, I can see now that the
>>>> "immutability-everywhere" changes in trunk have made the code in the
>>>> class a little funny.  In versions <=2.2, the TDistribution used by
>>>> the class was pluggable - i.e., there was a constructor that took at
>>>> TDistribution as an argument, so if for some reason you wanted to
>>>> use an impl different from TDistributionImpl, you could do that.
>>>> There was also a setter for the distribution. In addition,
>>>> TDistributionImpl itself was mutable, exposing a setDegreesOfFreedom
>>>> method.  So the distribution member was set at construction time and
>>>> the data update methods called the setter for DF on the
>>>> distribution.  We decided to make the distributions immutable in 3.0
>>>> (search the archives for discussion), so the current mods were done
>>>> to basically accomplish that.  But the code should be cleaned up.
>>>> The constructor taking an int is meaningless and should be
>>>> deprecated or removed (unfortunately, we added that in 2.2 and
>>>> advertised it as a deprecation replacement for the version that took
>>>> a distribution as parameter.  We should have realized then that it
>>>> was meaningless.  My bad for missing that. I would favor dropping it
>>>> in 3.0, since even if anyone is using it, it isn't doing anything
>>>> meaningful for them.)  Given that constructing a TDistributionImpl
>>>> is trivial, we might as well eliminate the member field altogether
>>>> and just create one when needed.  If you agree and no one else
>>>> objects, I will make these changes.  Thanks for reviewing the code.
>>>>
>>>> Phil
>>>>> Thoughts?
>>>>>
>>>>> -Greg
>>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>
>>>>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message