+ * Tests can be:
+ * Test statistics are available for all tests. Methods including "Test" in
+ * their names perform tests; all other methods return t-statistics. Among
+ * the "Test" methods,
+ * Input to tests can be either
* The number returned is the smallest significance level
@@ -83,11 +99,10 @@
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
- double pairedTTest(double[] sample1, double[] sample2)
- throws IllegalArgumentException, MathException;
-
+ public abstract double pairedTTest(double[] sample1, double[] sample2)
+ throws IllegalArgumentException, MathException;
/**
- * Performs a paired t-test evaluating the null hypothesis that the
+ * Performs a paired t-test evaluating the null hypothesis that the
* mean of the paired differences between
- * This statistic can be used to perform a two-sample t-test to compare
- * sample means.
+ * This statistic can be used to perform a (homoscedastic) two-sample
+ * t-test to compare sample means.
*
- * If
- * (1)
* where
- * If
- * (2)
+ * The t-statistic is
+ *
+ *
+ * where
* Preconditions:
* This statistic can be used to perform a two-sample t-test to compare
* sample means.
*
- * If
+ *
- * (1)
+ * Preconditions:
+ * This statistic can be used to perform a (homoscedastic) two-sample
+ * t-test to compare sample means.
+ *
+ * The t-statistic returned is
+ *
+ *
* where
*
- * If
- * (2)
* Preconditions:
* The number returned is the smallest significance level
@@ -270,13 +338,12 @@
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
- double tTest(double mu, double[] sample)
- throws IllegalArgumentException, MathException;
-
+ public abstract double tTest(double mu, double[] sample)
+ throws IllegalArgumentException, MathException;
/**
* Performs a
* two-sided t-test evaluating the null hypothesis that the mean of the population from
- * which
* Returns
@@ -327,7 +392,8 @@
* Usage Note:
* Preconditions:
- * Returns
+ * Returns
* Examples:
* The number returned is the smallest significance level
@@ -391,19 +458,50 @@
* equal in favor of the two-sided alternative that they are different.
* For a one-sided test, divide the returned value by 2.
*
- * If the
+ * Usage Note:
+ * Preconditions:
+ * The number returned is the smallest significance level
+ * at which one can reject the null hypothesis that the two means are
+ * equal in favor of the two-sided alternative that they are different.
+ * For a one-sided test, divide the returned value by 2.
*
- * If
* Usage Note:
* Returns
- * If the
+ * Examples:
- * If
+ * Preconditions:
+ * Returns
+ * A pooled variance estimate is used to compute the t-statistic. See
+ * {@link #t(double[], double[])} for the formula. The sum of the sample
+ * sizes minus 2 is used as the degrees of freedom.
*
* Examples:
* Usage Note:
* The number returned is the smallest significance level
* at which one can reject the null hypothesis that the two means are
* equal in favor of the two-sided alternative that they are different.
* For a one-sided test, divide the returned value by 2.
*
- * If the
+ * Usage Note:
+ * Preconditions:
- * If
+ * See {@link #homoscedasticT(double[], double[])} for the formula used to
+ * compute the t-statistic. The sum of the sample sizes minus 2 is used as
+ * the degrees of freedom.
*
* Usage Note:
* Returns
- * If the
- * If
* Examples:
* Usage Note:
+ * Uses commons-math {@link org.apache.commons.math.distribution.TDistribution}
+ * implementation to estimate exact p-values.
*
* @version $Revision$ $Date$
*/
@@ -72,8 +75,7 @@
/**
* Returns the observed significance level, or
- *
- * p-value, associated with a paired, two-sample, two-tailed t-test
+ * p-value, associated with a paired, two-sample, two-tailed t-test
* based on the data in the input arrays.
*
* The number returned is the smallest significance level
@@ -113,7 +115,7 @@
}
/**
- * Performs a paired t-test evaluating the null hypothesis that the
+ * Performs a paired t-test evaluating the null hypothesis that the
* mean of the paired differences between
- * This statistic can be used to perform a two-sample t-test to compare
- * sample means.
+ * This statistic can be used to perform a (homoscedastic) two-sample
+ * t-test to compare sample means.
*
- * If
- * (1)
* where
- * If
+ * This statistic can be used to perform a two-sample t-test to compare
+ * sample means.
+ *
+ * The t-statistic is
*
- * (2)
+ * where
* Preconditions:
* This statistic can be used to perform a two-sample t-test to compare
* sample means.
*
- * If
+ *
+ * where
- * (1)
+ * This statistic can be used to perform a (homoscedastic) two-sample
+ * t-test to compare sample means.
+ *
+ * The t-statistic returned is
+ *
+ *
* where
*
- * If
- * (2)
* Preconditions:
* The number returned is the smallest significance level
@@ -331,13 +409,14 @@
if ((sample == null) || (sample.length < 2)) {
throw new IllegalArgumentException("insufficient data for t statistic");
}
- return tTest( StatUtils.mean(sample), mu, StatUtils.variance(sample), sample.length);
+ return tTest( StatUtils.mean(sample), mu, StatUtils.variance(sample),
+ sample.length);
}
/**
* Performs a
* two-sided t-test evaluating the null hypothesis that the mean of the population from
- * which
* Returns
@@ -393,7 +471,8 @@
* Usage Note:
* Preconditions:
- * Returns
+ * Returns
* Examples:
* The number returned is the smallest significance level
@@ -467,19 +548,59 @@
* equal in favor of the two-sided alternative that they are different.
* For a one-sided test, divide the returned value by 2.
*
- * If the
+ * Usage Note:
+ * Preconditions:
+ * The number returned is the smallest significance level
+ * at which one can reject the null hypothesis that the two means are
+ * equal in favor of the two-sided alternative that they are different.
+ * For a one-sided test, divide the returned value by 2.
*
- * If
* Usage Note:
* Returns
- * If the
+ * Examples:
+ * Usage Note:
+ * Preconditions:
- * If
+ * A pooled variance estimate is used to compute the t-statistic. See
+ * {@link #t(double[], double[])} for the formula. The sum of the sample
+ * sizes minus 2 is used as the degrees of freedom.
*
* Examples:
* Usage Note:
* The number returned is the smallest significance level
* at which one can reject the null hypothesis that the two means are
* equal in favor of the two-sided alternative that they are different.
* For a one-sided test, divide the returned value by 2.
*
- * If the
+ * Usage Note:
+ * Preconditions:
+ * The number returned is the smallest significance level
+ * at which one can reject the null hypothesis that the two means are
+ * equal in favor of the two-sided alternative that they are different.
+ * For a one-sided test, divide the returned value by 2.
*
- * If
* Usage Note:
* Returns
- * If the
- * If
* Examples:
* Usage Note:
+ * Does not assume that subpopulation variances are equal.
*
* @param m1 first sample mean
* @param m2 second sample mean
@@ -747,17 +956,29 @@
* @param v2 second sample variance
* @param n1 first sample n
* @param n2 second sample n
- * @param equalVariances are variances assumed equal?
* @return t test statistic
*/
protected double t(double m1, double m2, double v1, double v2, double n1,
- double n2, boolean equalVariances) {
- if (equalVariances) {
- double pooledVariance = ((n1 - 1) * v1 + (n2 -1) * v2 ) / (n1 + n2 - 2);
- return (m1 - m2) / Math.sqrt(pooledVariance * (1d / n1 + 1d / n2));
- } else {
+ double n2) {
return (m1 - m2) / Math.sqrt((v1 / n1) + (v2 / n2));
- }
+ }
+
+ /**
+ * Computes t test statistic for 2-sample t-test under the hypothesis
+ * of equal subpopulation variances.
+ *
+ * @param m1 first sample mean
+ * @param m2 second sample mean
+ * @param v1 first sample variance
+ * @param v2 second sample variance
+ * @param n1 first sample n
+ * @param n2 second sample n
+ * @return t test statistic
+ */
+ protected double homoscedasticT(double m1, double m2, double v1,
+ double v2, double n1, double n2) {
+ double pooledVariance = ((n1 - 1) * v1 + (n2 -1) * v2 ) / (n1 + n2 - 2);
+ return (m1 - m2) / Math.sqrt(pooledVariance * (1d / n1 + 1d / n2));
}
/**
@@ -780,8 +1001,9 @@
/**
* Computes p-value for 2-sided, 2-sample t-test.
- * If equalVariances is true, the sum of the sample sizes minus 2
- * is used as df; otherwise df is approximated from the data.
+ *
+ * Does not assume subpopulation variances are equal. Degrees of freedom
+ * are estimated from the data.
*
* @param m1 first sample mean
* @param m2 second sample mean
@@ -789,20 +1011,41 @@
* @param v2 second sample variance
* @param n1 first sample n
* @param n2 second sample n
- * @param equalVariances are variances assumed equal?
* @return p-value
* @throws MathException if an error occurs computing the p-value
*/
protected double tTest(double m1, double m2, double v1, double v2,
- double n1, double n2, boolean equalVariances)
+ double n1, double n2)
+ throws MathException {
+ double t = Math.abs(t(m1, m2, v1, v2, n1, n2));
+ double degreesOfFreedom = 0;
+ degreesOfFreedom= df(v1, v2, n1, n2);
+ TDistribution tDistribution =
+ getDistributionFactory().createTDistribution(degreesOfFreedom);
+ return 1.0 - tDistribution.cumulativeProbability(-t, t);
+ }
+
+ /**
+ * Computes p-value for 2-sided, 2-sample t-test, under the assumption
+ * of equal subpopulation variances.
+ *
+ * The sum of the sample sizes minus 2 is used as degrees of freedom.
+ *
+ * @param m1 first sample mean
+ * @param m2 second sample mean
+ * @param v1 first sample variance
+ * @param v2 second sample variance
+ * @param n1 first sample n
+ * @param n2 second sample n
+ * @return p-value
+ * @throws MathException if an error occurs computing the p-value
+ */
+ protected double homoscedasticTTest(double m1, double m2, double v1,
+ double v2, double n1, double n2)
throws MathException {
- double t = Math.abs(t(m1, m2, v1, v2, n1, n2, equalVariances));
+ double t = Math.abs(homoscedasticT(m1, m2, v1, v2, n1, n2));
double degreesOfFreedom = 0;
- if (equalVariances) {
degreesOfFreedom = (double) (n1 + n2 - 2);
- } else {
- degreesOfFreedom= df(v1, v2, n1, n2);
- }
TDistribution tDistribution =
getDistributionFactory().createTDistribution(degreesOfFreedom);
return 1.0 - tDistribution.cumulativeProbability(-t, t);
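The split of t(...) into t(...) and homoscedasticT(...) above can be illustrated with a small standalone sketch. It is not the library code: the class name TStatisticSketch and its method names are hypothetical, the two statistic formulas are copied from the diff, and the degrees-of-freedom helper uses the textbook Welch-Satterthwaite formula, which is assumed (not confirmed by this diff) to match what df(v1, v2, n1, n2) computes.

```java
// Illustrative sketch only; the class and method names are hypothetical
// and are not part of commons-math.
final class TStatisticSketch {

    /** Two-sample t statistic that does not assume equal variances. */
    static double welchT(double m1, double m2, double v1, double v2,
                         double n1, double n2) {
        return (m1 - m2) / Math.sqrt((v1 / n1) + (v2 / n2));
    }

    /** Two-sample t statistic using a pooled variance estimate. */
    static double pooledT(double m1, double m2, double v1, double v2,
                          double n1, double n2) {
        double pooledVariance =
            ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2);
        return (m1 - m2) / Math.sqrt(pooledVariance * (1d / n1 + 1d / n2));
    }

    /**
     * Textbook Welch-Satterthwaite degrees of freedom. Assumption: the
     * library's df(...) helper is not shown in the diff and may differ.
     */
    static double welchSatterthwaiteDf(double v1, double v2,
                                       double n1, double n2) {
        double a = v1 / n1;
        double b = v2 / n2;
        return ((a + b) * (a + b))
            / ((a * a) / (n1 - 1d) + (b * b) / (n2 - 1d));
    }

    public static void main(String[] args) {
        // Made-up summary statistics, chosen only to keep the arithmetic simple.
        double m1 = 5d, m2 = 2d, v1 = 12d, v2 = 9d, n1 = 4d, n2 = 9d;
        System.out.println("t (unequal variances): "
            + welchT(m1, m2, v1, v2, n1, n2));
        System.out.println("t (pooled variance):   "
            + pooledT(m1, m2, v1, v2, n1, n2));
        System.out.println("df (Welch-Satterthwaite): "
            + welchSatterthwaiteDf(v1, v2, n1, n2));
        System.out.println("df (pooled): " + (n1 + n2 - 2));
    }
}
```

With unequal sample sizes the two statistics generally differ, which is why the p-value method for the pooled case has to call the pooled statistic rather than t(...).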
1.6 +24 -24 jakarta-commons/math/src/test/org/apache/commons/math/stat/inference/TTestTest.java
Index: TTestTest.java
===================================================================
RCS file: /home/cvs/jakarta-commons/math/src/test/org/apache/commons/math/stat/inference/TTestTest.java,v
retrieving revision 1.5
retrieving revision 1.6
diff -u -r1.5 -r1.6
--- TTestTest.java 2 Jun 2004 13:08:55 -0000 1.5
+++ TTestTest.java 2 Aug 2004 04:20:09 -0000 1.6
@@ -166,73 +166,73 @@
// Target comparison values computed using R version 1.8.1 (Linux version)
assertEquals("two sample heteroscedastic t stat", 1.603717,
- testStatistic.t(sample1, sample2, false), 1E-6);
+ testStatistic.t(sample1, sample2), 1E-6);
assertEquals("two sample heteroscedastic t stat", 1.603717,
- testStatistic.t(sampleStats1, sampleStats2, false), 1E-6);
+ testStatistic.t(sampleStats1, sampleStats2), 1E-6);
assertEquals("two sample heteroscedastic p value", 0.1288394,
- testStatistic.tTest(sample1, sample2, false), 1E-7);
+ testStatistic.tTest(sample1, sample2), 1E-7);
assertEquals("two sample heteroscedastic p value", 0.1288394,
- testStatistic.tTest(sampleStats1, sampleStats2, false), 1E-7);
+ testStatistic.tTest(sampleStats1, sampleStats2), 1E-7);
assertTrue("two sample heteroscedastic t-test reject",
- testStatistic.tTest(sample1, sample2, 0.2, false));
+ testStatistic.tTest(sample1, sample2, 0.2));
assertTrue("two sample heteroscedastic t-test reject",
- testStatistic.tTest(sampleStats1, sampleStats2, 0.2, false));
+ testStatistic.tTest(sampleStats1, sampleStats2, 0.2));
assertTrue("two sample heteroscedastic t-test accept",
- !testStatistic.tTest(sample1, sample2, 0.1, false));
+ !testStatistic.tTest(sample1, sample2, 0.1));
assertTrue("two sample heteroscedastic t-test accept",
- !testStatistic.tTest(sampleStats1, sampleStats2, 0.1, false));
+ !testStatistic.tTest(sampleStats1, sampleStats2, 0.1));
try {
- testStatistic.tTest(sample1, sample2, .95, false);
+ testStatistic.tTest(sample1, sample2, .95);
fail("alpha out of range, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
- // exptected
+ // expected
}
try {
- testStatistic.tTest(sampleStats1, sampleStats2, .95, false);
+ testStatistic.tTest(sampleStats1, sampleStats2, .95);
fail("alpha out of range, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
// expected
}
try {
- testStatistic.tTest(sample1, tooShortObs, .01, false);
+ testStatistic.tTest(sample1, tooShortObs, .01);
fail("insufficient data, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
// expected
}
try {
- testStatistic.tTest(sampleStats1, tooShortStats, .01, false);
+ testStatistic.tTest(sampleStats1, tooShortStats, .01);
fail("insufficient data, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
// expected
}
try {
- testStatistic.tTest(sample1, tooShortObs, false);
+ testStatistic.tTest(sample1, tooShortObs);
fail("insufficient data, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
// expected
}
try {
- testStatistic.tTest(sampleStats1, tooShortStats, false);
+ testStatistic.tTest(sampleStats1, tooShortStats);
fail("insufficient data, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
// expected
}
try {
- testStatistic.t(sample1, tooShortObs, false);
+ testStatistic.t(sample1, tooShortObs);
fail("insufficient data, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
// expected
}
try {
- testStatistic.t(sampleStats1, tooShortStats, false);
+ testStatistic.t(sampleStats1, tooShortStats);
fail("insufficient data, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
// expected
@@ -252,13 +252,13 @@
// Target comparison values computed using R version 1.8.1 (Linux version)
assertEquals("two sample homoscedastic t stat", -1.120897,
- testStatistic.t(sample1, sample2, true), 10E-6);
+ testStatistic.homoscedasticT(sample1, sample2), 10E-6);
assertEquals("two sample homoscedastic p value", 0.2948490,
- testStatistic.tTest(sampleStats1, sampleStats2, true), 1E-6);
+ testStatistic.homoscedasticTTest(sampleStats1, sampleStats2), 1E-6);
assertTrue("two sample homoscedastic t-test reject",
- testStatistic.tTest(sample1, sample2, 0.3, true));
+ testStatistic.homoscedasticTTest(sample1, sample2, 0.3));
assertTrue("two sample homoscedastic t-test accept",
- !testStatistic.tTest(sample1, sample2, 0.2, true));
+ !testStatistic.homoscedasticTTest(sample1, sample2, 0.2));
}
public void testSmallSamples() throws Exception {
@@ -266,8 +266,8 @@
double[] sample2 = {4d, 5d};
// Target values computed using R, version 1.8.1 (linux version)
- assertEquals(-2.2361, testStatistic.t(sample1, sample2, false), 1E-4);
- assertEquals(0.1987, testStatistic.tTest(sample1, sample2, false), 1E-4);
+ assertEquals(-2.2361, testStatistic.t(sample1, sample2), 1E-4);
+ assertEquals(0.1987, testStatistic.tTest(sample1, sample2), 1E-4);
}
public void testPaired() throws Exception {
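The reject/accept assertions in the tests above follow from comparing a p-value with the significance level. A minimal sketch of that decision, assuming the boolean-valued methods reject exactly when the p-value is below alpha and treat any alpha outside (0, 0.5] as illegal; the class FixedLevelSketch is hypothetical, and only the two p-values are taken from the test targets:

```java
// Illustrative sketch only; FixedLevelSketch is a hypothetical name and the
// decision rule below is an assumption about the boolean-valued test methods.
final class FixedLevelSketch {

    /** Returns true when the null hypothesis is rejected at level alpha. */
    static boolean reject(double pValue, double alpha) {
        if (alpha <= 0d || alpha > 0.5d) {
            throw new IllegalArgumentException(
                "bad significance level: " + alpha);
        }
        return pValue < alpha;
    }

    public static void main(String[] args) {
        double heteroscedasticP = 0.1288394; // target value from the tests
        double homoscedasticP = 0.2948490;   // target value from the tests
        System.out.println(reject(heteroscedasticP, 0.2)); // reject
        System.out.println(reject(heteroscedasticP, 0.1)); // accept
        System.out.println(reject(homoscedasticP, 0.3));   // reject
        System.out.println(reject(homoscedasticP, 0.2));   // accept
    }
}
```

Under this rule an alpha of .95 is rejected as an argument before any comparison is made, which matches the "alpha out of range" cases in the tests.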
1.20 +11 -9 jakarta-commons/math/xdocs/userguide/stat.xml
Index: stat.xml
===================================================================
RCS file: /home/cvs/jakarta-commons/math/xdocs/userguide/stat.xml,v
retrieving revision 1.19
retrieving revision 1.20
diff -u -r1.19 -r1.20
--- stat.xml 23 Jun 2004 16:26:16 -0000 1.19
+++ stat.xml 2 Aug 2004 04:20:09 -0000 1.20
@@ -411,7 +411,10 @@
Welch-Satterthwaite approximation is used to compute the degrees
of freedom. Methods to return t-statistics and p-values are provided in each
case, as well as boolean-valued methods to perform fixed significance
- level tests. See the examples below and the API documentation for
+ level tests. The names of test or test-statistic methods that assume equal
+ subpopulation variances always start with "homoscedastic". Test or
+ test-statistic methods whose names start with just "t" do not assume equal
+ variances. See the examples below and the API documentation for
more details.
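As a worked note on how the two families of methods relate, consider the special case of equal sample sizes. This is a short derivation under the assumption n1 = n2 = n, using the pooled-variance and unequal-variance formulas from the TTest Javadoc; the subscripted symbols are notation introduced here.

```latex
% Assumption: n_1 = n_2 = n. Symbols follow the Javadoc: m_i are sample
% means, var_i are sample variances.
\begin{align*}
\mathrm{var}_{\text{pooled}}
  &= \frac{(n-1)\,\mathrm{var}_1 + (n-1)\,\mathrm{var}_2}{(n-1) + (n-1)}
   = \frac{\mathrm{var}_1 + \mathrm{var}_2}{2}, \\
\mathrm{var}_{\text{pooled}}\left(\frac{1}{n} + \frac{1}{n}\right)
  &= \frac{\mathrm{var}_1}{n} + \frac{\mathrm{var}_2}{n}, \\
t_{\text{homoscedastic}}
  &= \frac{m_1 - m_2}{\sqrt{\mathrm{var}_1/n + \mathrm{var}_2/n}}
   = t_{\text{unequal variances}}.
\end{align*}
```

So with equal sample sizes the two t-statistics coincide, and any difference between the two p-values comes only from the degrees of freedom: 2n - 2 in the homoscedastic case versus the estimate computed from the data otherwise.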
+ *
+ * double-
valued methods return p-values;
+ * boolean-
valued methods perform fixed significance level tests.
+ * Significance levels are always specified as numbers between 0 and 0.5
+ * (e.g. tests at the 95% level use alpha=0.05
).
+ * double[]
arrays or
+ * {@link StatisticalSummary} instances.
+ *
*
* @version $Revision$ $Date$
*/
public interface TTest {
-
-
/**
* Computes a paired, 2-sample t-statistic based on the data in the input
* arrays. The t-statistic returned is equivalent to what would be returned by
@@ -46,13 +64,11 @@
* @throws MathException if the statistic cannot be computed due to a
* convergence or other numerical error.
*/
- double pairedT(double[] sample1, double[] sample2)
- throws IllegalArgumentException, MathException;
-
+ public abstract double pairedT(double[] sample1, double[] sample2)
+ throws IllegalArgumentException, MathException;
/**
* Returns the observed significance level, or
- *
- * p-value, associated with a paired, two-sample, two-tailed t-test
+ * p-value, associated with a paired, two-sample, two-tailed t-test
* based on the data in the input arrays.
* sample1
and
* sample2
is 0 in favor of the two-sided alternative that the
* mean paired difference is not equal to 0, with significance level
@@ -118,9 +133,11 @@
* @throws IllegalArgumentException if the preconditions are not met
* @throws MathException if an error occurs performing the test
*/
- boolean pairedTTest(double[] sample1, double[] sample2, double alpha)
- throws IllegalArgumentException, MathException;
-
+ public abstract boolean pairedTTest(
+ double[] sample1,
+ double[] sample2,
+ double alpha)
+ throws IllegalArgumentException, MathException;
/**
* Computes a
* t statistic given observed values and a comparison constant.
@@ -136,9 +153,8 @@
* @return t statistic
* @throws IllegalArgumentException if input array length is less than 2
*/
- double t(double mu, double[] observed)
- throws IllegalArgumentException;
-
+ public abstract double t(double mu, double[] observed)
+ throws IllegalArgumentException;
/**
* Computes a
* t statistic to use in comparing the mean of the dataset described by
@@ -155,19 +171,19 @@
* @return t statistic
* @throws IllegalArgumentException if the precondition is not met
*/
- double t(double mu, StatisticalSummary sampleStats)
- throws IllegalArgumentException;
-
+ public abstract double t(double mu, StatisticalSummary sampleStats)
+ throws IllegalArgumentException;
/**
- * Computes a
- * 2-sample t statistic.
+ * Computes a 2-sample t statistic, under the hypothesis of equal
+ * subpopulation variances. To compute a t-statistic without the
+ * equal variances hypothesis, use {@link #t(double[], double[])}.
* equalVariances
is true
, the t-statisitc is
+ * The t-statistic is
* t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))
+ * t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))
* n1
is the size of first sample;
* n2
is the size of second sample;
@@ -181,9 +197,35 @@
* with var1
the variance of the first sample and
* var2
the variance of the second sample.
* equalVariances
is false
, the t-statisitc is
+ * Preconditions:
+ *
+ *
+ * @param sample1 array of sample data values
+ * @param sample2 array of sample data values
+ * @return t statistic
+ * @throws IllegalArgumentException if the precondition is not met
+ */
+ public abstract double homoscedasticT(double[] sample1, double[] sample2)
+ throws IllegalArgumentException;
+ /**
+ * Computes a 2-sample t statistic, without the hypothesis of equal
+ * subpopulation variances. To compute a t-statistic assuming equal
+ * variances, use {@link #homoscedasticT(double[], double[])}.
* t = (m1 - m2) / sqrt(var1/n1 + var2/n2)
+ * This statistic can be used to perform a two-sample t-test to compare
+ * sample means.
+ * t = (m1 - m2) / sqrt(var1/n1 + var2/n2)
+ * n1
is the size of the first sample
+ * n2
is the size of the second sample;
+ * m1
is the mean of the first sample;
+ * m2
is the mean of the second sample;
+ * var1
is the variance of the first sample;
+ * var2
is the variance of the second sample;
*
*
+ * equalVariances
is true
, the t-statisitc is
+ * The returned t-statistic is
+ * t = (m1 - m2) / sqrt(var1/n1 + var2/n2)
* t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))
+ * where n1
is the size of the first sample;
+ * n2
is the size of the second sample;
+ * m1
is the mean of the first sample;
+ * m2
is the mean of the second sample
+ * var1
is the variance of the first sample;
+ * var2
is the variance of the second sample
+ *
+ *
+ *
+ * @param sampleStats1 StatisticalSummary describing data from the first sample
+ * @param sampleStats2 StatisticalSummary describing data from the second sample
+ * @return t statistic
+ * @throws IllegalArgumentException if the precondition is not met
+ */
+ public abstract double t(
+ StatisticalSummary sampleStats1,
+ StatisticalSummary sampleStats2)
+ throws IllegalArgumentException;
+ /**
+ * Computes a 2-sample t statistic, comparing the means of the datasets
+ * described by two {@link StatisticalSummary} instances, under the
+ * assumption of equal subpopulation variances. To compute a t-statistic
+ * without the equal variances assumption, use
+ * {@link #t(StatisticalSummary, StatisticalSummary)}.
+ * t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))
* n1
is the size of first sample;
* n2
is the size of second sample;
* m1
is the mean of first sample;
- * m2
is the mean of second sample m2
is the mean of second sample
* and var
is the pooled variance estimate:
* var = ((n1 - 1)var1 + (n2 - 1)var2) / ((n1-1) + (n2-1))
@@ -224,10 +298,6 @@
* with var1
the variance of the first sample and
* var2
the variance of the second sample.
* equalVariances
is false
, the t-statisitc is
- * t = (m1 - m2) / sqrt(var1/n1 + var2/n2)
- *
*
mu
.
* sample
is drawn equals mu
.
+ * which sample
is drawn equals mu
.
* true
iff the null hypothesis can be
* rejected with confidence 1 - alpha
. To
@@ -308,13 +375,11 @@
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
- boolean tTest(double mu, double[] sample, double alpha)
- throws IllegalArgumentException, MathException;
-
+ public abstract boolean tTest(double mu, double[] sample, double alpha)
+ throws IllegalArgumentException, MathException;
/**
* Returns the observed significance level, or
- *
- * p-value, associated with a one-sample, two-tailed t-test
+ * p-value, associated with a one-sample, two-tailed t-test
* comparing the mean of the dataset described by sampleStats
* with the constant mu
.
*
* The validity of the test depends on the assumptions of the parametric
* t-test procedure, as discussed
- * here
+ *
+ * here
*
*
stats
is drawn equals mu
.
- * true
iff the null hypothesis can be
- * rejected with confidence 1 - alpha
. To
- * perform a 1-sided test, use alpha / 2
+ * two-sided t-test evaluating the null hypothesis that the mean of the
+ * population from which the dataset described by stats
is
+ * drawn equals mu
.
+ * true
iff the null hypothesis can be rejected with
+ * confidence 1 - alpha
. To perform a 1-sided test, use
+ * alpha / 2.
*
*
sample mean = mu
at
@@ -377,13 +443,14 @@
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
- boolean tTest(double mu, StatisticalSummary sampleStats, double alpha)
- throws IllegalArgumentException, MathException;
-
+ public abstract boolean tTest(
+ double mu,
+ StatisticalSummary sampleStats,
+ double alpha)
+ throws IllegalArgumentException, MathException;
/**
* Returns the observed significance level, or
- *
- * p-value, associated with a two-sample, two-tailed t-test
+ * p-value, associated with a two-sample, two-tailed t-test
* comparing the means of the input arrays.
* equalVariances
parameter is false,
- * the test does not assume that the underlying popuation variances are
+ * The test does not assume that the underlying population variances are
* equal and it uses approximated degrees of freedom computed from the
- * sample data to compute the p-value. In this case, formula (1) for the
- * {@link #t(double[], double[], boolean)} statistic is used
- * and the Welch-Satterthwaite approximation to the degrees of freedom is used,
+ * sample data to compute the p-value. The t-statistic used is as defined in
+ * {@link #t(double[], double[])} and the Welch-Satterthwaite approximation
+ * to the degrees of freedom is used,
* as described
*
- * here.
+ * here. To perform the test under the assumption of equal subpopulation
+ * variances, use {@link #homoscedasticTTest(double[], double[])}.
+ *
+ * The validity of the p-value depends on the assumptions of the parametric
+ * t-test procedure, as discussed
+ *
+ * here
+ *
+ *
+ *
+ * @param sample1 array of sample data values
+ * @param sample2 array of sample data values
+ * @return p-value for t-test
+ * @throws IllegalArgumentException if the precondition is not met
+ * @throws MathException if an error occurs computing the p-value
+ */
+ public abstract double tTest(double[] sample1, double[] sample2)
+ throws IllegalArgumentException, MathException;
+ /**
+ * Returns the observed significance level, or
+ * p-value, associated with a two-sample, two-tailed t-test
+ * comparing the means of the input arrays, under the assumption that
+ * the two samples are drawn from subpopulations with equal variances.
+ * To perform the test without the equal variances assumption, use
+ * {@link #tTest(double[], double[])}.
+ * equalVariances
is true
, a pooled variance
- * estimate is used to compute the t-statistic (formula (2)) and the sum of the
- * sample sizes minus 2 is used as the degrees of freedom.
+ * A pooled variance estimate is used to compute the t-statistic. See
+ * {@link #homoscedasticT(double[], double[])}. The sum of the sample sizes
+ * minus 2 is used as the degrees of freedom.
*
* The validity of the p-value depends on the assumptions of the parametric
@@ -417,47 +515,99 @@
*
* @param sample1 array of sample data values
* @param sample2 array of sample data values
- * @param equalVariances are sample variances assumed to be equal?
* @return p-value for t-test
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
- double tTest(double[] sample1, double[] sample2, boolean equalVariances)
- throws IllegalArgumentException, MathException;
-
+ public abstract double homoscedasticTTest(
+ double[] sample1,
+ double[] sample2)
+ throws IllegalArgumentException, MathException;
/**
- * Performs a
+ * Performs a
+ *
* two-sided t-test evaluating the null hypothesis that sample1
* and sample2
are drawn from populations with the same mean,
- * with significance level alpha
.
+ * with significance level alpha
. This test does not assume
+ * that the subpopulation variances are equal. To perform the test assuming
+ * equal variances, use
+ * {@link #homoscedasticTTest(double[], double[], double)}.
* true
iff the null hypothesis that the means are
* equal can be rejected with confidence 1 - alpha
. To
* perform a 1-sided test, use alpha / 2
* equalVariances
parameter is false,
- * the test does not assume that the underlying popuation variances are
- * equal and it uses approximated degrees of freedom computed from the
- * sample data to compute the p-value. In this case, formula (1) for the
- * {@link #t(double[], double[], boolean)} statistic is used
- * and the Welch-Satterthwaite approximation to the degrees of freedom is used,
- * as described
+ * See {@link #t(double[], double[])} for the formula used to compute the
+ * t-statistic. Degrees of freedom are approximated using the
*
- * here.
+ * Welch-Satterthwaite approximation.
+
+ *
+ *
* mean 1 = mean 2
at
+ * the 95% level, use
+ * tTest(sample1, sample2, 0.05).
+ * mean 1 < mean 2
,
+ * first verify that the measured mean of sample 1
is less
+ * than the mean of sample 2
and then use
+ * tTest(sample1, sample2, 0.005)
+ * equalVariances
is true
, a pooled variance
- * estimate is used to compute the t-statistic (formula (2)) and the sum of the
- * sample sizes minus 2 is used as the degrees of freedom.
+ * Usage Note:
+ * The validity of the test depends on the assumptions of the parametric
+ * t-test procedure, as discussed
+ *
+ * here
+ *
+ *
+ *
+ * @param sample1 array of sample data values
+ * @param sample2 array of sample data values
+ * @param alpha significance level of the test
+ * @return true if the null hypothesis can be rejected with
+ * confidence 1 - alpha
+ * @throws IllegalArgumentException if the preconditions are not met
+ * @throws MathException if an error occurs performing the test
+ */
+ public abstract boolean tTest(
+ double[] sample1,
+ double[] sample2,
+ double alpha)
+ throws IllegalArgumentException, MathException;
+ /**
+ * Performs a
+ *
+ * two-sided t-test evaluating the null hypothesis that 0 < alpha < 0.5
+ * sample1
+ * and sample2
are drawn from populations with the same mean,
+ * with significance level alpha
, assuming that the
+ * subpopulation variances are equal. Use
+ * {@link #tTest(double[], double[], double)} to perform the test without
+ * the assumption of equal variances.
+ * true
iff the null hypothesis that the means are
+ * equal can be rejected with confidence 1 - alpha
. To
+ * perform a 1-sided test, use alpha / 2.
To perform the test
+ * without the assumption of equal subpopulation variances, use
+ * {@link #tTest(double[], double[], double)}.
+ *
*
* mean 1 = mean 2
at
- * the 95% level, under the assumption of equal subpopulation variances,
- * use tTest(sample1, sample2, 0.05, true)
+ * the 95% level, use tTest(sample1, sample2, 0.05).
* mean 1 < mean 2
- * at the 99% level without assuming equal variances, first verify that the measured
- * mean of sample 1
is less than the mean of sample 2
- * and then use tTest(sample1, sample2, 0.005, false)
+ * mean 1 < mean 2,
+ * at the 99% level, first verify that the measured mean of
+ * sample 1
is less than the mean of sample 2
+ * and then use
+ * tTest(sample1, sample2, 0.005)
*
@@ -475,40 +625,70 @@
* @param sample1 array of sample data values
* @param sample2 array of sample data values
* @param alpha significance level of the test
- * @param equalVariances are sample variances assumed to be equal?
* @return true if the null hypothesis can be rejected with
* confidence 1 - alpha
* @throws IllegalArgumentException if the preconditions are not met
* @throws MathException if an error occurs performing the test
*/
- boolean tTest(double[] sample1, double[] sample2, double alpha,
- boolean equalVariances)
- throws IllegalArgumentException, MathException;
-
+ public abstract boolean homoscedasticTTest(
+ double[] sample1,
+ double[] sample2,
+ double alpha)
+ throws IllegalArgumentException, MathException;
/**
* Returns the observed significance level, or
- *
- * p-value, associated with a two-sample, two-tailed t-test
- * comparing the means of the datasets described by two Univariates.
+ * p-value, associated with a two-sample, two-tailed t-test
+ * comparing the means of the datasets described by two StatisticalSummary
+ * instances.
* equalVariances
parameter is false,
- * the test does not assume that the underlying popuation variances are
+ * The test does not assume that the underlying population variances are
* equal and it uses approximated degrees of freedom computed from the
- * sample data to compute the p-value. In this case, formula (1) for the
- * {@link #t(double[], double[], boolean)} statistic is used
- * and the Welch-Satterthwaite approximation to the degrees of freedom is used,
- * as described
- *
- * here.
+ * sample data to compute the p-value. To perform the test assuming
+ * equal variances, use
+ * {@link #homoscedasticTTest(StatisticalSummary, StatisticalSummary)}.
+ *
+ * The validity of the p-value depends on the assumptions of the parametric
+ * t-test procedure, as discussed
+ *
+ * here
+ *
+ *
+ *
+ * @param sampleStats1 StatisticalSummary describing data from the first sample
+ * @param sampleStats2 StatisticalSummary describing data from the second sample
+ * @return p-value for t-test
+ * @throws IllegalArgumentException if the precondition is not met
+ * @throws MathException if an error occurs computing the p-value
+ */
+ public abstract double tTest(
+ StatisticalSummary sampleStats1,
+ StatisticalSummary sampleStats2)
+ throws IllegalArgumentException, MathException;
+ /**
+ * Returns the observed significance level, or
+ * p-value, associated with a two-sample, two-tailed t-test
+ * comparing the means of the datasets described by two StatisticalSummary
+ * instances, under the hypothesis of equal subpopulation variances. To
+ * perform a test without the equal variances assumption, use
+ * {@link #tTest(StatisticalSummary, StatisticalSummary)}.
* equalVariances
is true
, a pooled variance
- * estimate is used to compute the t-statistic (formula (2)) and the sum of the
- * sample sizes minus 2 is used as the degrees of freedom.
+ * The number returned is the smallest significance level
+ * at which one can reject the null hypothesis that the two means are
+ * equal in favor of the two-sided alternative that they are different.
+ * For a one-sided test, divide the returned value by 2.
+ *
* The validity of the p-value depends on the assumptions of the parametric
@@ -522,49 +702,44 @@
*
* @param sampleStats1 StatisticalSummary describing data from the first sample
* @param sampleStats2 StatisticalSummary describing data from the second sample
- * @param equalVariances are sample variances assumed to be equal?
* @return p-value for t-test
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
- double tTest(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2,
- boolean equalVariances)
- throws IllegalArgumentException, MathException;
-
- /**
- * Performs a
- * two-sided t-test evaluating the null hypothesis that sampleStats1
- * and sampleStats2 describe datasets drawn from populations with the
- * same mean, with significance level alpha.
+ public abstract double homoscedasticTTest(
+ StatisticalSummary sampleStats1,
+ StatisticalSummary sampleStats2)
+ throws IllegalArgumentException, MathException;
+ /**
+ * Performs a
+ *
+ * two-sided t-test evaluating the null hypothesis that
+ * sampleStats1 and sampleStats2 describe
+ * datasets drawn from populations with the same mean, with significance
+ * level alpha. This test does not assume that the
+ * subpopulation variances are equal. To perform the test under the equal
+ * variances assumption, use
+ * {@link #homoscedasticTTest(StatisticalSummary, StatisticalSummary)}.
* true iff the null hypothesis that the means are
* equal can be rejected with confidence 1 - alpha. To
* perform a 1-sided test, use alpha / 2
* equalVariances
parameter is false,
- * the test does not assume that the underlying popuation variances are
- * equal and it uses approximated degrees of freedom computed from the
- * sample data to compute the p-value. In this case, formula (1) for the
- * {@link #t(double[], double[], boolean)} statistic is used
- * and the Welch-Satterthwaite approximation to the degrees of freedom is used,
- * as described
+ * See {@link #t(double[], double[])} for the formula used to compute the
+ * t-statistic. Degrees of freedom are approximated using the
*
- * here.
- * equalVariances
is true
, a pooled variance
- * estimate is used to compute the t-statistic (formula (2)) and the sum of the
- * sample sizes minus 2 is used as the degrees of freedom.
+ * Welch-Satterthwaite approximation.
*
*
* mean 1 = mean 2 at
- * the 95% level under the assumption of equal subpopulation variances, use
- * tTest(sampleStats1, sampleStats2, 0.05, true)
+ * the 95% level, use
+ * tTest(sampleStats1, sampleStats2, 0.05)
* mean 1 < mean 2
- * at the 99% level without assuming that subpopulation variances are equal,
- * first verify that the measured mean of sample 1 is less than
- * the mean of sample 2 and then use
- * tTest(sampleStats1, sampleStats2, 0.005, false)
+ * at the 99% level, first verify that the measured mean of
+ * sample 1 is less than the mean of sample 2
+ * and then use
+ * tTest(sampleStats1, sampleStats2, 0.005)
*
@@ -583,13 +758,14 @@
* @param sampleStats1 StatisticalSummary describing sample data values
* @param sampleStats2 StatisticalSummary describing sample data values
* @param alpha significance level of the test
- * @param equalVariances are sample variances assumed to be equal?
* @return true if the null hypothesis can be rejected with
* confidence 1 - alpha
* @throws IllegalArgumentException if the preconditions are not met
* @throws MathException if an error occurs performing the test
*/
- boolean tTest(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2,
- double alpha, boolean equalVariances)
- throws IllegalArgumentException, MathException;
-}
+ public abstract boolean tTest(
+ StatisticalSummary sampleStats1,
+ StatisticalSummary sampleStats2,
+ double alpha)
+ throws IllegalArgumentException, MathException;
+}
\ No newline at end of file
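
The interface change above replaces the boolean equalVariances flag with separately named methods. As a rough illustration of what the two statistics compute, here is a minimal, self-contained sketch of the two formulas quoted in the Javadoc. It is not the Commons Math implementation: the class name TwoSampleTSketch and its helper methods are hypothetical, and it assumes the sample variance is the unbiased (n - 1) estimate and that the pooled variance is the (n - 1)-weighted average of the two sample variances, which is what the sqrt(var) factor in the homoscedastic formula requires.

```java
/** Illustrative sketch only; names are hypothetical, not the Commons Math API. */
final class TwoSampleTSketch {

    /** Arithmetic mean of the sample. */
    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Unbiased sample variance (assumed here: divides by n - 1). */
    static double variance(double[] x) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) {
            ss += (v - m) * (v - m);
        }
        return ss / (x.length - 1);
    }

    /** t = (m1 - m2) / sqrt(var1/n1 + var2/n2); variances not assumed equal. */
    static double t(double[] s1, double[] s2) {
        check(s1, s2);
        double n1 = s1.length;
        double n2 = s2.length;
        return (mean(s1) - mean(s2))
                / Math.sqrt(variance(s1) / n1 + variance(s2) / n2);
    }

    /** t = (m1 - m2) / (sqrt(1/n1 + 1/n2) sqrt(var)), var the pooled variance. */
    static double homoscedasticT(double[] s1, double[] s2) {
        check(s1, s2);
        double n1 = s1.length;
        double n2 = s2.length;
        double pooledVar = ((n1 - 1) * variance(s1) + (n2 - 1) * variance(s2))
                / (n1 + n2 - 2);
        return (mean(s1) - mean(s2))
                / (Math.sqrt(1 / n1 + 1 / n2) * Math.sqrt(pooledVar));
    }

    /** Same precondition as in the diff: at least two values per sample. */
    private static void check(double[] s1, double[] s2) {
        if (s1 == null || s2 == null || Math.min(s1.length, s2.length) < 2) {
            throw new IllegalArgumentException("insufficient data for t statistic");
        }
    }
}
```

When the two samples have the same size, both methods return the same value; they differ only when the sample sizes differ.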
1.9 +395 -152 jakarta-commons/math/src/java/org/apache/commons/math/stat/inference/TTestImpl.java
Index: TTestImpl.java
===================================================================
RCS file: /home/cvs/jakarta-commons/math/src/java/org/apache/commons/math/stat/inference/TTestImpl.java,v
retrieving revision 1.8
retrieving revision 1.9
diff -u -r1.8 -r1.9
--- TTestImpl.java 23 Jun 2004 16:26:14 -0000 1.8
+++ TTestImpl.java 2 Aug 2004 04:20:08 -0000 1.9
@@ -23,6 +23,9 @@
/**
* Implements t-test statistics defined in the {@link TTest} interface.
+ * sample1
and
* sample2
is 0 in favor of the two-sided alternative that the
* mean paired difference is not equal to 0, with significance level
@@ -172,7 +174,8 @@
if ((observed == null) || (observed.length < 2)) {
throw new IllegalArgumentException("insufficient data for t statistic");
}
- return t(StatUtils.mean(observed), mu, StatUtils.variance(observed), observed.length);
+ return t(StatUtils.mean(observed), mu, StatUtils.variance(observed),
+ observed.length);
}
/**
@@ -196,19 +199,21 @@
if ((sampleStats == null) || (sampleStats.getN() < 2)) {
throw new IllegalArgumentException("insufficient data for t statistic");
}
- return t(sampleStats.getMean(), mu, sampleStats.getVariance(), sampleStats.getN());
+ return t(sampleStats.getMean(), mu, sampleStats.getVariance(),
+ sampleStats.getN());
}
/**
- * Computes a
- * 2-sample t statistic.
+ * Computes a 2-sample t statistic, under the hypothesis of equal
+ * subpopulation variances. To compute a t-statistic without the
+ * equal variances hypothesis, use {@link #t(double[], double[])}.
* equalVariances
is true
, the t-statisitc is
+ * The t-statistic is
* t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))
+ * t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))
* n1 is the size of first sample;
* n2 is the size of second sample;
@@ -222,9 +227,44 @@
* with var1 the variance of the first sample and
* var2 the variance of the second sample.
* equalVariances
is false
, the t-statisitc is
+ * Preconditions:
+ *
+ *
+ * @param sample1 array of sample data values
+ * @param sample2 array of sample data values
+ * @return t statistic
+ * @throws IllegalArgumentException if the precondition is not met
+ */
+ public double homoscedasticT(double[] sample1, double[] sample2)
+ throws IllegalArgumentException {
+ if ((sample1 == null) || (sample2 == null ||
+ Math.min(sample1.length, sample2.length) < 2)) {
+ throw new IllegalArgumentException("insufficient data for t statistic");
+ }
+ return homoscedasticT(StatUtils.mean(sample1), StatUtils.mean(sample2),
+ StatUtils.variance(sample1), StatUtils.variance(sample2),
+ (double) sample1.length, (double) sample2.length);
+ }
+
+ /**
+ * Computes a 2-sample t statistic, without the hypothesis of equal
+ * subpopulation variances. To compute a t-statistic assuming equal
+ * variances, use {@link #homoscedasticT(double[], double[])}.
+ * t = (m1 - m2) / sqrt(var1/n1 + var2/n2)
+ * t = (m1 - m2) / sqrt(var1/n1 + var2/n2)
+ * n1 is the size of the first sample
+ * n2 is the size of the second sample;
+ * m1 is the mean of the first sample;
+ * m2 is the mean of the second sample;
+ * var1 is the variance of the first sample;
+ * var2 is the variance of the second sample;
*
*
+ * equalVariances
is true
, the t-statisitc is
+ * The returned t-statistic is
+ * t = (m1 - m2) / sqrt(var1/n1 + var2/n2)
+ * n1 is the size of the first sample;
+ * n2 is the size of the second sample;
+ * m1 is the mean of the first sample;
+ * m2 is the mean of the second sample
+ * var1 is the variance of the first sample;
+ * var2 is the variance of the second sample
* t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))
+ * Preconditions:
+ *
+ *
+ * @param sampleStats1 StatisticalSummary describing data from the first sample
+ * @param sampleStats2 StatisticalSummary describing data from the second sample
+ * @return t statistic
+ * @throws IllegalArgumentException if the precondition is not met
+ */
+ public double t(StatisticalSummary sampleStats1,
+ StatisticalSummary sampleStats2)
+ throws IllegalArgumentException {
+ if ((sampleStats1 == null) ||
+ (sampleStats2 == null ||
+ Math.min(sampleStats1.getN(), sampleStats2.getN()) < 2)) {
+ throw new IllegalArgumentException("insufficient data for t statistic");
+ }
+ return t(sampleStats1.getMean(), sampleStats2.getMean(),
+ sampleStats1.getVariance(), sampleStats2.getVariance(),
+ (double) sampleStats1.getN(), (double) sampleStats2.getN());
+ }
+
+ /**
+ * Computes a 2-sample t statistic, comparing the means of the datasets
+ * described by two {@link StatisticalSummary} instances, under the
+ * assumption of equal subpopulation variances. To compute a t-statistic
+ * without the equal variances assumption, use
+ * {@link #t(StatisticalSummary, StatisticalSummary)}.
+ * t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))
* n1 is the size of first sample;
* n2 is the size of second sample;
* m1 is the mean of first sample;
- * m2 is the mean of second sample
+ * m2 is the mean of second sample
* and var is the pooled variance estimate:
* var = ((n1 - 1)var1 + (n2 - 1)var2) / ((n1-1) + (n2-1))
@@ -271,10 +355,6 @@
* with var1 the variance of the first sample and
* var2 the variance of the second sample.
* equalVariances
is false
, the t-statisitc is
- * t = (m1 - m2) / sqrt(var1/n1 + var2/n2)
- *
*
mu
.
* sample
is drawn equals mu
.
+ * which sample
is drawn equals mu
.
* true
iff the null hypothesis can be
* rejected with confidence 1 - alpha
. To
@@ -379,8 +458,7 @@
/**
* Returns the observed significance level, or
- *
- * p-value, associated with a one-sample, two-tailed t-test
+ * p-value, associated with a one-sample, two-tailed t-test
* comparing the mean of the dataset described by sampleStats
* with the constant mu
.
*
* The validity of the test depends on the assumptions of the parametric
* t-test procedure, as discussed
- * here
+ *
+ * here
*
*
stats
is drawn equals mu
.
- * true
iff the null hypothesis can be
- * rejected with confidence 1 - alpha
. To
- * perform a 1-sided test, use alpha / 2
+ * two-sided t-test evaluating the null hypothesis that the mean of the
+ * population from which the dataset described by stats
is
+ * drawn equals mu
.
+ * true
iff the null hypothesis can be rejected with
+ * confidence 1 - alpha
. To perform a 1-sided test, use
+ * alpha / 2.
*
*
sample mean = mu
at
@@ -448,7 +529,8 @@
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
- public boolean tTest( double mu, StatisticalSummary sampleStats, double alpha)
+ public boolean tTest( double mu, StatisticalSummary sampleStats,
+ double alpha)
throws IllegalArgumentException, MathException {
if ((alpha <= 0) || (alpha > 0.5)) {
throw new IllegalArgumentException("bad significance level: " + alpha);
@@ -458,8 +540,7 @@
/**
* Returns the observed significance level, or
- *
- * p-value, associated with a two-sample, two-tailed t-test
+ * p-value, associated with a two-sample, two-tailed t-test
* comparing the means of the input arrays.
* equalVariances
parameter is false,
- * the test does not assume that the underlying popuation variances are
+ * The test does not assume that the underlying population variances are
* equal and it uses approximated degrees of freedom computed from the
- * sample data to compute the p-value. In this case, formula (1) for the
- * {@link #t(double[], double[], boolean)} statistic is used
- * and the Welch-Satterthwaite approximation to the degrees of freedom is used,
+ * sample data to compute the p-value. The t-statistic used is as defined in
+ * {@link #t(double[], double[])} and the Welch-Satterthwaite approximation
+ * to the degrees of freedom is used,
* as described
*
- * here.
+ * here. To perform the test under the assumption of equal subpopulation
+ * variances, use {@link #homoscedasticTTest(double[], double[])}.
+ *
+ * The validity of the p-value depends on the assumptions of the parametric
+ * t-test procedure, as discussed
+ *
+ * here
+ *
+ *
+ *
+ * @param sample1 array of sample data values
+ * @param sample2 array of sample data values
+ * @return p-value for t-test
+ * @throws IllegalArgumentException if the precondition is not met
+ * @throws MathException if an error occurs computing the p-value
+ */
+ public double tTest(double[] sample1, double[] sample2)
+ throws IllegalArgumentException, MathException {
+ if ((sample1 == null) || (sample2 == null ||
+ Math.min(sample1.length, sample2.length) < 2)) {
+ throw new IllegalArgumentException("insufficient data");
+ }
+ return tTest(StatUtils.mean(sample1), StatUtils.mean(sample2),
+ StatUtils.variance(sample1), StatUtils.variance(sample2),
+ (double) sample1.length, (double) sample2.length);
+ }
+
+ /**
+ * Returns the observed significance level, or
+ * p-value, associated with a two-sample, two-tailed t-test
+ * comparing the means of the input arrays, under the assumption that
+ * the two samples are drawn from subpopulations with equal variances.
+ * To perform the test without the equal variances assumption, use
+ * {@link #tTest(double[], double[])}.
+ * equalVariances
is true
, a pooled variance
- * estimate is used to compute the t-statistic (formula (2)) and the sum of the
- * sample sizes minus 2 is used as the degrees of freedom.
+ * A pooled variance estimate is used to compute the t-statistic. See
+ * {@link #homoscedasticT(double[], double[])}. The sum of the sample sizes
+ * minus 2 is used as the degrees of freedom.
*
* The validity of the p-value depends on the assumptions of the parametric
@@ -493,55 +614,112 @@
*
* @param sample1 array of sample data values
* @param sample2 array of sample data values
- * @param equalVariances are sample variances assumed to be equal?
* @return p-value for t-test
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
- public double tTest(double[] sample1, double[] sample2, boolean equalVariances)
+ public double homoscedasticTTest(double[] sample1, double[] sample2)
throws IllegalArgumentException, MathException {
if ((sample1 == null) || (sample2 == null ||
Math.min(sample1.length, sample2.length) < 2)) {
throw new IllegalArgumentException("insufficient data");
}
- return tTest(StatUtils.mean(sample1), StatUtils.mean(sample2), StatUtils.variance(sample1),
+ return homoscedasticTTest(StatUtils.mean(sample1),
+ StatUtils.mean(sample2), StatUtils.variance(sample1),
StatUtils.variance(sample2), (double) sample1.length,
- (double) sample2.length, equalVariances);
+ (double) sample2.length);
}
+
/**
- * Performs a
+ * Performs a
+ *
* two-sided t-test evaluating the null hypothesis that sample1
* and sample2 are drawn from populations with the same mean,
- * with significance level alpha.
+ * with significance level alpha. This test does not assume
+ * that the subpopulation variances are equal. To perform the test assuming
+ * equal variances, use
+ * {@link #homoscedasticTTest(double[], double[], double)}.
* true iff the null hypothesis that the means are
* equal can be rejected with confidence 1 - alpha. To
* perform a 1-sided test, use alpha / 2
* equalVariances
parameter is false,
- * the test does not assume that the underlying popuation variances are
- * equal and it uses approximated degrees of freedom computed from the
- * sample data to compute the p-value. In this case, formula (1) for the
- * {@link #t(double[], double[], boolean)} statistic is used
- * and the Welch-Satterthwaite approximation to the degrees of freedom is used,
- * as described
+ * See {@link #t(double[], double[])} for the formula used to compute the
+ * t-statistic. Degrees of freedom are approximated using the
*
- * here.
+ * Welch-Satterthwaite approximation.
+
+ *
+ *
+ * mean 1 = mean 2 at
+ * the 95% level, use
+ * tTest(sample1, sample2, 0.05).
+ * mean 1 < mean 2,
+ * first verify that the measured mean of sample 1 is less
+ * than the mean of sample 2 and then use
+ * tTest(sample1, sample2, 0.005)
+ *
+ * The validity of the test depends on the assumptions of the parametric
+ * t-test procedure, as discussed
+ *
+ * here
+ *
+ *
+ *
+ * @param sample1 array of sample data values
+ * @param sample2 array of sample data values
+ * @param alpha significance level of the test
+ * @return true if the null hypothesis can be rejected with
+ * confidence 1 - alpha
+ * @throws IllegalArgumentException if the preconditions are not met
+ * @throws MathException if an error occurs performing the test
+ */
+ public boolean tTest(double[] sample1, double[] sample2,
+ double alpha)
+ throws IllegalArgumentException, MathException {
+ if ((alpha <= 0) || (alpha > 0.5)) {
+ throw new IllegalArgumentException("bad significance level: " + alpha);
+ }
+ return (tTest(sample1, sample2) < alpha);
+ }
+
+ /**
+ * Performs a
+ *
+ * two-sided t-test evaluating the null hypothesis that 0 < alpha < 0.5
+ * sample1
+ * and sample2 are drawn from populations with the same mean,
+ * with significance level alpha, assuming that the
+ * subpopulation variances are equal. Use
+ * {@link #tTest(double[], double[], double)} to perform the test without
+ * the assumption of equal variances.
* equalVariances
is true
, a pooled variance
- * estimate is used to compute the t-statistic (formula (2)) and the sum of the
- * sample sizes minus 2 is used as the degrees of freedom.
+ * Returns true iff the null hypothesis that the means are
+ * equal can be rejected with confidence 1 - alpha. To
+ * perform a 1-sided test, use alpha / 2. To perform the test
+ * without the assumption of equal subpopulation variances, use
+ * {@link #tTest(double[], double[], double)}.
+ *
*
* mean 1 = mean 2 at
- * the 95% level, under the assumption of equal subpopulation variances,
- * use tTest(sample1, sample2, 0.05, true)
+ * the 95% level, use tTest(sample1, sample2, 0.05).
* mean 1 < mean 2
- * at the 99% level without assuming equal variances, first verify that the measured
- * mean of sample 1 is less than the mean of sample 2
- * and then use tTest(sample1, sample2, 0.005, false)
+ * mean 1 < mean 2,
+ * at the 99% level, first verify that the measured mean of
+ * sample 1 is less than the mean of sample 2
+ * and then use
+ * tTest(sample1, sample2, 0.005)
*
@@ -559,45 +737,81 @@
* @param sample1 array of sample data values
* @param sample2 array of sample data values
* @param alpha significance level of the test
- * @param equalVariances are sample variances assumed to be equal?
* @return true if the null hypothesis can be rejected with
* confidence 1 - alpha
* @throws IllegalArgumentException if the preconditions are not met
* @throws MathException if an error occurs performing the test
*/
- public boolean tTest(double[] sample1, double[] sample2, double alpha,
- boolean equalVariances)
+ public boolean homoscedasticTTest(double[] sample1, double[] sample2,
+ double alpha)
throws IllegalArgumentException, MathException {
if ((alpha <= 0) || (alpha > 0.5)) {
throw new IllegalArgumentException("bad significance level: " + alpha);
}
- return (tTest(sample1, sample2, equalVariances) < alpha);
+ return (homoscedasticTTest(sample1, sample2) < alpha);
}
/**
* Returns the observed significance level, or
- *
- * p-value, associated with a two-sample, two-tailed t-test
- * comparing the means of the datasets described by two Univariates.
+ * p-value, associated with a two-sample, two-tailed t-test
+ * comparing the means of the datasets described by two StatisticalSummary
+ * instances.
* equalVariances
parameter is false,
- * the test does not assume that the underlying popuation variances are
+ * The test does not assume that the underlying population variances are
* equal and it uses approximated degrees of freedom computed from the
- * sample data to compute the p-value. In this case, formula (1) for the
- * {@link #t(double[], double[], boolean)} statistic is used
- * and the Welch-Satterthwaite approximation to the degrees of freedom is used,
- * as described
- *
- * here.
+ * sample data to compute the p-value. To perform the test assuming
+ * equal variances, use
+ * {@link #homoscedasticTTest(StatisticalSummary, StatisticalSummary)}.
+ *
+ * The validity of the p-value depends on the assumptions of the parametric
+ * t-test procedure, as discussed
+ *
+ * here
+ *
+ *
+ *
+ * @param sampleStats1 StatisticalSummary describing data from the first sample
+ * @param sampleStats2 StatisticalSummary describing data from the second sample
+ * @return p-value for t-test
+ * @throws IllegalArgumentException if the precondition is not met
+ * @throws MathException if an error occurs computing the p-value
+ */
+ public double tTest(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2)
+ throws IllegalArgumentException, MathException {
+ if ((sampleStats1 == null) || (sampleStats2 == null ||
+ Math.min(sampleStats1.getN(), sampleStats2.getN()) < 2)) {
+ throw new IllegalArgumentException("insufficient data for t statistic");
+ }
+ return tTest(sampleStats1.getMean(), sampleStats2.getMean(), sampleStats1.getVariance(),
+ sampleStats2.getVariance(), (double) sampleStats1.getN(),
+ (double) sampleStats2.getN());
+ }
+
+ /**
+ * Returns the observed significance level, or
+ * p-value, associated with a two-sample, two-tailed t-test
+ * comparing the means of the datasets described by two StatisticalSummary
+ * instances, under the hypothesis of equal subpopulation variances. To
+ * perform a test without the equal variances assumption, use
+ * {@link #tTest(StatisticalSummary, StatisticalSummary)}.
+ * equalVariances
is true
, a pooled variance
- * estimate is used to compute the t-statistic (formula (2)) and the sum of the
- * sample sizes minus 2 is used as the degrees of freedom.
+ * See {@link #homoscedasticT(double[], double[])} for the formula used to
+ * compute the t-statistic. The sum of the sample sizes minus 2 is used as
+ * the degrees of freedom.
*
* The validity of the p-value depends on the assumptions of the parametric
@@ -611,57 +825,53 @@
*
* @param sampleStats1 StatisticalSummary describing data from the first sample
* @param sampleStats2 StatisticalSummary describing data from the second sample
- * @param equalVariances are sample variances assumed to be equal?
* @return p-value for t-test
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
- public double tTest(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2,
- boolean equalVariances)
+ public double homoscedasticTTest(StatisticalSummary sampleStats1,
+ StatisticalSummary sampleStats2)
throws IllegalArgumentException, MathException {
if ((sampleStats1 == null) || (sampleStats2 == null ||
Math.min(sampleStats1.getN(), sampleStats2.getN()) < 2)) {
throw new IllegalArgumentException("insufficient data for t statistic");
}
- return tTest(sampleStats1.getMean(), sampleStats2.getMean(), sampleStats1.getVariance(),
+ return homoscedasticTTest(sampleStats1.getMean(),
+ sampleStats2.getMean(), sampleStats1.getVariance(),
sampleStats2.getVariance(), (double) sampleStats1.getN(),
- (double) sampleStats2.getN(), equalVariances);
+ (double) sampleStats2.getN());
}
/**
- * Performs a
- * two-sided t-test evaluating the null hypothesis that sampleStats1
- * and sampleStats2 describe datasets drawn from populations with the
- * same mean, with significance level alpha.
+ * Performs a
+ *
+ * two-sided t-test evaluating the null hypothesis that
+ * sampleStats1 and sampleStats2 describe
+ * datasets drawn from populations with the same mean, with significance
+ * level alpha. This test does not assume that the
+ * subpopulation variances are equal. To perform the test under the equal
+ * variances assumption, use
+ * {@link #homoscedasticTTest(StatisticalSummary, StatisticalSummary)}.
* true iff the null hypothesis that the means are
* equal can be rejected with confidence 1 - alpha. To
* perform a 1-sided test, use alpha / 2
* equalVariances
parameter is false,
- * the test does not assume that the underlying popuation variances are
- * equal and it uses approximated degrees of freedom computed from the
- * sample data to compute the p-value. In this case, formula (1) for the
- * {@link #t(double[], double[], boolean)} statistic is used
- * and the Welch-Satterthwaite approximation to the degrees of freedom is used,
- * as described
+ * See {@link #t(double[], double[])} for the formula used to compute the
+ * t-statistic. Degrees of freedom are approximated using the
*
- * here.
- * equalVariances
is true
, a pooled variance
- * estimate is used to compute the t-statistic (formula (2)) and the sum of the
- * sample sizes minus 2 is used as the degrees of freedom.
+ * Welch-Satterthwaite approximation.
*
*
* mean 1 = mean 2 at
- * the 95% level under the assumption of equal subpopulation variances, use
- * tTest(sampleStats1, sampleStats2, 0.05, true)
+ * the 95% level, use
+ * tTest(sampleStats1, sampleStats2, 0.05)
* mean 1 < mean 2
- * at the 99% level without assuming that subpopulation variances are equal,
- * first verify that the measured mean of sample 1 is less than
- * the mean of sample 2 and then use
- * tTest(sampleStats1, sampleStats2, 0.005, false)
+ * at the 99% level, first verify that the measured mean of
+ * sample 1 is less than the mean of sample 2
+ * and then use
+ * tTest(sampleStats1, sampleStats2, 0.005)
*
@@ -680,19 +890,18 @@
* @param sampleStats1 StatisticalSummary describing sample data values
* @param sampleStats2 StatisticalSummary describing sample data values
* @param alpha significance level of the test
- * @param equalVariances are sample variances assumed to be equal?
* @return true if the null hypothesis can be rejected with
* confidence 1 - alpha
* @throws IllegalArgumentException if the preconditions are not met
* @throws MathException if an error occurs performing the test
*/
- public boolean tTest(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2,
- double alpha, boolean equalVariances)
+ public boolean tTest(StatisticalSummary sampleStats1,
+ StatisticalSummary sampleStats2, double alpha)
throws IllegalArgumentException, MathException {
if ((alpha <= 0) || (alpha > 0.5)) {
throw new IllegalArgumentException("bad significance level: " + alpha);
}
- return (tTest(sampleStats1, sampleStats2, equalVariances) < alpha);
+ return (tTest(sampleStats1, sampleStats2) < alpha);
}
//----------------------------------------------- Protected methods
@@ -738,8 +947,8 @@
/**
* Computes t test statistic for 2-sample t-test.
- * If equalVariance is true, the pooled variance
- * estimate is computed and used.
+ *
To compute the (one-sided) p-value:
To perform a fixed significance level test with alpha = .05:
- In each case above, the last (boolean) parameter determines
- whether or not the test should assume that subpopulation variances
- are equal. Replacing this with true
will result in
- homoscedastic (equal variances) tests / test statistics.
+ In each case above, the test does not assume that the subpopulation
+ variances are equal. To perform the tests under this assumption,
+ replace "t" at the beginning of the method name with "homoscedasticT"
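
Besides the statistic itself, the two method families differ in the degrees of freedom used for the p-value: the Javadoc in this commit names the Welch-Satterthwaite approximation for the unequal-variance tests and the sum of the sample sizes minus 2 for the homoscedastic tests. The sketch below shows both, together with the fixed-significance-level decision as it appears in the diff (alpha must satisfy 0 < alpha <= 0.5, and the null hypothesis is rejected when the p-value is below alpha). The class and method names are hypothetical, and the Welch-Satterthwaite expression is the standard textbook form, assumed here rather than copied from the implementation.

```java
/** Illustrative sketch only; names are hypothetical, not the Commons Math API. */
final class TTestDecisionSketch {

    /**
     * Welch-Satterthwaite approximation (standard form, assumed):
     * df = (v1/n1 + v2/n2)^2 / ((v1/n1)^2/(n1-1) + (v2/n2)^2/(n2-1)).
     */
    static double welchSatterthwaiteDf(double var1, double var2,
                                       double n1, double n2) {
        double a = var1 / n1;
        double b = var2 / n2;
        return (a + b) * (a + b)
                / (a * a / (n1 - 1) + b * b / (n2 - 1));
    }

    /** Degrees of freedom under the equal variances assumption. */
    static double homoscedasticDf(double n1, double n2) {
        return n1 + n2 - 2;
    }

    /**
     * Fixed significance level decision: validates alpha the same way the
     * tTest(..., alpha) methods in the diff do, then compares p-value to alpha.
     */
    static boolean reject(double pValue, double alpha) {
        if ((alpha <= 0) || (alpha > 0.5)) {
            throw new IllegalArgumentException("bad significance level: " + alpha);
        }
        return pValue < alpha;
    }
}
```

With equal sample sizes and equal sample variances the two degrees-of-freedom values coincide; otherwise the approximated value is smaller than the pooled one.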
chi-square
test statistics