mahout-dev mailing list archives

From "Ted Dunning (JIRA)" <>
Subject [jira] [Commented] (MAHOUT-1000) Implementation of Single Sample T-Test using Map Reduce/Mahout
Date Fri, 20 Apr 2012 22:19:33 GMT


Ted Dunning commented on MAHOUT-1000:

I am not sure that I see the value here. All you need for this calculation is the mean,
the sum of squared differences from the mean, and the count.

Do we really need this in Mahout when 3 lines of Pig suffice?
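
To make the point above concrete, here is a minimal Python sketch (not Pig, and not anything that exists in Mahout) of how the t statistic falls out of those three quantities. The function names and the sample partitions are hypothetical; the merge step uses the standard pairwise update for combining partial means and squared differences, which is one reasonable choice rather than the only one.

```python
from functools import reduce
from math import sqrt


def summarize(values):
    """Per-partition summary: (count, mean, sum of squared differences from the mean)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values)
    return n, mean, m2


def merge(a, b):
    """Combine two partial summaries without revisiting the raw data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def t_statistic(summary, mu0):
    """Single-sample t statistic against the hypothesized population mean mu0."""
    n, mean, m2 = summary
    std_err = sqrt(m2 / (n - 1) / n)  # standard error of the sample mean
    return (mean - mu0) / std_err


# Hypothetical data, split the way input splits would arrive at mappers.
partitions = [[5.1, 4.9], [5.6, 5.2, 4.7]]
total = reduce(merge, map(summarize, partitions))
t = t_statistic(total, mu0=5.0)
```

Because `merge` is associative, it can serve as both combiner and reducer, so the whole calculation needs a single pass over the data.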
> Implementation of Single Sample T-Test using Map Reduce/Mahout
> --------------------------------------------------------------
>                 Key: MAHOUT-1000
>                 URL:
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>    Affects Versions: Backlog
>         Environment: Linux, Mac OS, Hadoop 0.20.2, Mahout 0.x
>            Reporter: Dev Lakhani
>              Labels: newbie
>             Fix For: Backlog
>   Original Estimate: 672h
>  Remaining Estimate: 672h
> Implement a map/reduce version of the single-sample t-test to test whether a sample of n subjects comes from a population in which the mean equals a particular value.
> For a large dataset, say n in the millions of rows, one can test whether the sample (large as it is) comes from a population with the specified mean.
> Input:
> 1) specified population mean to be tested against
> 2) hypothesis direction : i.e. "two.sided", "less", "greater".
> 3) confidence level or alpha
> 4) flag to indicate paired or not paired
> The procedure is as follows:
> 1. Use Map/Reduce to calculate the mean of the sample.
> 2. Use Map/Reduce to calculate the standard error of the sample mean.
> 3. Use Map/Reduce to calculate the t statistic
> 4. Estimate the degrees of freedom depending on equal sample variances 
> Output
> 1) The value of the t-statistic.
> 2) The p-value for the test.
> 3) Flag that is true if the null hypothesis can be rejected with confidence 1 - alpha; false otherwise.
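
As a rough illustration of the outputs listed in the issue, the sketch below turns a t statistic into a p-value and a reject flag for the three hypothesis directions. It is an approximation under a stated assumption: it uses the standard normal distribution in place of Student's t, which is only reasonable for the large samples the issue has in mind (the Python standard library has no t distribution). The function names are hypothetical.

```python
from statistics import NormalDist


def p_value(t, alternative="two.sided"):
    """Approximate p-value for a t statistic.

    Assumption: n is large, so the t distribution with n - 1 degrees of
    freedom is replaced by the standard normal distribution.
    """
    cdf = NormalDist().cdf(t)
    if alternative == "less":
        return cdf
    if alternative == "greater":
        return 1.0 - cdf
    if alternative == "two.sided":
        return 2.0 * min(cdf, 1.0 - cdf)
    raise ValueError(f"unknown alternative: {alternative!r}")


def reject_null(t, alpha=0.05, alternative="two.sided"):
    """True if the null hypothesis can be rejected at significance level alpha."""
    return p_value(t, alternative) < alpha
```

For small samples the normal approximation understates the p-value, so an exact t distribution from a statistics library would be needed there.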
