commons-user mailing list archives

From Warren Tang <warren.c.t...@gmail.com>
Subject Re: [math] SummaryStatistics.setVarianceImpl Usage
Date Fri, 21 Oct 2011 09:24:10 GMT
Let's look at your workaround:

    double populationStandardDeviation =
        FastMath.sqrt(stats.getSecondMoment() / stats.getN());

Another way to compute the same standard deviation (with scores held as
a double[], which is what evaluate() accepts) is:

    double samePopulationStandardDeviation =
        FastMath.sqrt(new Variance(false).evaluate(scores));

From these two statements we can infer that in [math]:

    stats.getSecondMoment() / stats.getN()
        = new Variance(false).evaluate(scores)
        = sum((x_i - mean)^2) / n

But this is not consistent with the convention
<http://en.wikipedia.org/wiki/Second_moment#Variance> that the second
central moment is itself equal to the variance, i.e.
"sum((x_i - mean)^2) / n".
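
Written out, and assuming \bar{x} denotes the mean of the n values (the
symbols m_2 and s^2 are introduced here only for illustration), the
convention reads roughly as follows:

```latex
m_2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2
    \quad \text{(second central moment, i.e. population variance)}
\qquad
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
    \quad \text{(bias-corrected sample variance)}
```

For scores = {1, 2, 3, 4} the mean is 2.5 and the sum of squared
deviations is 5, so m_2 = 5/4 = 1.25 and s^2 = 5/3.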

In short, "SummaryStatistics.getSecondMoment" returns only
"sum((x_i - mean)^2)", without dividing by N, so it is not the
*expected value*. The documentation of "getSecondMoment" does state
this:
"Returns a statistic related to the Second Central Moment. Specifically,
what is returned is the sum of squared deviations from the sample mean
among the values that have been added."
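
To make the distinction concrete, here is a minimal plain-Java sketch
that does not depend on [math] at all. The class and method names
(MomentSketch, sumOfSquaredDeviations, and so on) are hypothetical and
exist only for this illustration; they are not part of any library.

```java
// Plain-Java sketch; no Commons Math dependency. All names are hypothetical.
final class MomentSketch {

    /** Sum of squared deviations from the mean: sum((x_i - mean)^2). */
    static double sumOfSquaredDeviations(double[] values) {
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.length;

        double sum = 0.0;
        for (double v : values) {
            double d = v - mean;
            sum += d * d;
        }
        return sum;
    }

    /** Population (non-bias-corrected) variance: divide the sum by n. */
    static double populationVariance(double[] values) {
        return sumOfSquaredDeviations(values) / values.length;
    }

    /** Sample (bias-corrected) variance: divide the sum by n - 1. */
    static double sampleVariance(double[] values) {
        return sumOfSquaredDeviations(values) / (values.length - 1);
    }

    public static void main(String[] args) {
        double[] scores = {1, 2, 3, 4};
        System.out.println(sumOfSquaredDeviations(scores)); // prints 5.0
        System.out.println(populationVariance(scores));     // prints 1.25
        System.out.println(Math.sqrt(populationVariance(scores)));
        System.out.println(sampleVariance(scores));
    }
}
```

The first value is the undivided sum and the second is that sum divided
by n, which is the quantity the workaround takes the square root of.
The sketch does no input checking, so an empty array yields NaN.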

Regards,
Warren Tang <http://blog.tangcs.com>

On 10/21/2011 4:51 PM, Mikkel Meyer Andersen wrote:
> Dear Warren,
>
> As far as I know, in [math] we have adopted the standard naming
> convention (as you seem to use yourself), which is:
> E[X^2]: second moment
> E[(X - E[X])^2]: central second moment
>
> And similarly for higher-order moments.
>
> Cheers, Mikkel.
>
> 2011/10/21 Warren Tang<warren.c.tang@gmail.com>:
>> getSecondMoment does not return the true second central moment, which
>> should be equal to the variance. I think this is confusing and should
>> be stressed in the documentation.
>>
>>
>> On 10/17/2011 1:23 AM, Warren Tang wrote:
>>> Thanks for the workaround. I've reported the bug here:
>>> https://issues.apache.org/jira/browse/MATH-691
>>>
>>> Regards,
>>> Warren Tang<http://blog.tangcs.com>
>>>
>>>
>>> On Sunday, October 16, 2011 11:24:26 PM, Mikkel Meyer Andersen wrote:
>>>> Dear Warren,
>>>>
>>>> This is probably a bug. Sorry for this. Would you be so kind as to
>>>> report it as described on
>>>> http://commons.apache.org/math/issue-tracking.html ?
>>>>
>>>> What you can do instead is this:
>>>> int[] scores = {1, 2, 3, 4};
>>>>
>>>> SummaryStatistics stats = new SummaryStatistics();
>>>> for (int i : scores) {
>>>>     stats.addValue(i);
>>>> }
>>>> double sd = FastMath.sqrt(stats.getSecondMoment() / stats.getN());
>>>>
>>>> System.out.println(sd);
>>>>
>>>> That is, calculate sd as:
>>>> double sd = FastMath.sqrt(stats.getSecondMoment() / stats.getN());
>>>>
>>>> Then there is no need to call stats.setVarianceImpl(new Variance(false)).
>>>>
>>>> Cheers, Mikkel.
>>>>
>>>> 2011/10/16 Warren Tang<warren.c.tang@gmail.com>:
>>>>> Hi, Mikkel
>>>>>
>>>>> I'm using commons-math 2.2. Here is the code to reproduce the issue:
>>>>>
>>>>> import org.apache.commons.math.stat.descriptive.SummaryStatistics;
>>>>> import org.apache.commons.math.stat.descriptive.moment.Variance;
>>>>>
>>>>> @Test public void testStandardDeviation() {
>>>>>     int[] scores = {1, 2, 3, 4};
>>>>>     SummaryStatistics stats = new SummaryStatistics();
>>>>>     stats.setVarianceImpl(new Variance(false)); // use "population variance"
>>>>>     for (int i : scores) {
>>>>>         stats.addValue(i);
>>>>>     }
>>>>>     double sd = stats.getStandardDeviation();
>>>>>     System.out.println(sd);
>>>>> }
>>>>>
>>>>> Regards,
>>>>> Warren Tang<http://blog.tangcs.com>
>>>>>
>>>>> On 10/16/2011 10:43 PM, Mikkel Meyer Andersen wrote:
>>>>>> Dear Warren,
>>>>>>
>>>>>> Could you provide values for the scores-variable in the current
>>>>>> example making it possible to reproduce?
>>>>>>
>>>>>> Are you in fact using version 1.2 as reflected by the link you gave?
>>>>>> Or which version are you using?
>>>>>>
>>>>>> Cheers, Mikkel.
>>>>>>
>>>>>> 2011/10/16 Warren Tang<warren.c.tang@gmail.com>:
>>>>>>> Hello, everyone
>>>>>>>
>>>>>>> I'm trying to get a "population standard deviation"
>>>>>>> <http://commons.apache.org/math/api-1.2/org/apache/commons/math/stat/descriptive/moment/StandardDeviation.html>
>>>>>>> (non-bias-corrected) from SummaryStatistics.
>>>>>>>
>>>>>>> This is what I did:
>>>>>>>
>>>>>>> SummaryStatistics stats = new SummaryStatistics();
>>>>>>> // use "population variance" ( sum((x_i - mean)^2) / n )
>>>>>>> stats.setVarianceImpl(new Variance(false));
>>>>>>> for (int i : scores) {
>>>>>>>     stats.addValue(i);
>>>>>>> }
>>>>>>> double sd = stats.getStandardDeviation();
>>>>>>>
>>>>>>> However, the value of "sd" is "NaN". How can I do it correctly?
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Warren Tang<http://blog.tangcs.com>
>>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
>>>>>> For additional commands, e-mail: user-help@commons.apache.org
>>>>>>
