commons-dev mailing list archives

From pste...@apache.org
Subject cvs commit: jakarta-commons/math/xdocs/userguide stat.xml
Date Mon, 02 Aug 2004 04:20:09 GMT
psteitz     2004/08/01 21:20:09

  Modified:    math/src/java/org/apache/commons/math/stat/inference
                        TTest.java TTestImpl.java
               math/src/test/org/apache/commons/math/stat/inference
                        TTestTest.java
               math/xdocs/userguide stat.xml
  Log:
  Removed boolean equalVariances flag from t-test API.
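
  For callers migrating off the flag, the sketch below illustrates the two
  statistics that the split API now exposes under separate names. It follows
  the formulas quoted in the TTest Javadoc, with the pooled variance taken as
  the usual weighted average of the two sample variances. The class
  TwoSampleTSketch and all of its helpers are hypothetical names used only for
  illustration; this is not the TTestImpl code from this commit.

```java
// Hypothetical sketch: names are illustrative only, not the TTestImpl code.
final class TwoSampleTSketch {

    /** Arithmetic mean of the sample. */
    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Unbiased sample variance (divides by n - 1). */
    static double variance(double[] x) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) {
            ss += (v - m) * (v - m);
        }
        return ss / (x.length - 1);
    }

    /** Enforces the documented precondition: both samples have length >= 2. */
    static void check(double[] s1, double[] s2) {
        if (s1 == null || s2 == null || s1.length < 2 || s2.length < 2) {
            throw new IllegalArgumentException("each sample needs at least 2 values");
        }
    }

    /** Unequal variances: t = (m1 - m2) / sqrt(var1/n1 + var2/n2). */
    static double t(double[] s1, double[] s2) {
        check(s1, s2);
        double se2 = variance(s1) / s1.length + variance(s2) / s2.length;
        return (mean(s1) - mean(s2)) / Math.sqrt(se2);
    }

    /** Equal variances: t = (m1 - m2) / (sqrt(1/n1 + 1/n2) sqrt(var)), var pooled. */
    static double homoscedasticT(double[] s1, double[] s2) {
        check(s1, s2);
        int n1 = s1.length;
        int n2 = s2.length;
        double pooled = ((n1 - 1) * variance(s1) + (n2 - 1) * variance(s2))
                / (n1 + n2 - 2);
        return (mean(s1) - mean(s2))
                / (Math.sqrt(1.0 / n1 + 1.0 / n2) * Math.sqrt(pooled));
    }

    /** Welch-Satterthwaite approximation to the degrees of freedom. */
    static double welchDf(double[] s1, double[] s2) {
        check(s1, s2);
        double a = variance(s1) / s1.length;
        double b = variance(s2) / s2.length;
        return (a + b) * (a + b)
                / (a * a / (s1.length - 1) + b * b / (s2.length - 1));
    }

    public static void main(String[] args) {
        double[] sample1 = {1.0, 2.0, 3.0};
        double[] sample2 = {2.0, 4.0, 6.0, 8.0};
        System.out.println("t (unequal variances): " + t(sample1, sample2));
        System.out.println("t (pooled variance):   " + homoscedasticT(sample1, sample2));
        System.out.println("Welch df:              " + welchDf(sample1, sample2));
    }
}
```

  In terms of the interface in this commit, a call that previously passed
  equalVariances = true corresponds to homoscedasticT / homoscedasticTTest,
  and a call that passed false corresponds to the plain t / tTest methods.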
  
  Revision  Changes    Path
  1.7       +337 -161  jakarta-commons/math/src/java/org/apache/commons/math/stat/inference/TTest.java
  
  Index: TTest.java
  ===================================================================
  RCS file: /home/cvs/jakarta-commons/math/src/java/org/apache/commons/math/stat/inference/TTest.java,v
  retrieving revision 1.6
  retrieving revision 1.7
  diff -u -r1.6 -r1.7
  --- TTest.java	23 Jun 2004 16:26:14 -0000	1.6
  +++ TTest.java	2 Aug 2004 04:20:08 -0000	1.7
  @@ -20,12 +20,30 @@
   
   /**
    * An interface for Student's t-tests.
  + * <p>
  + * Tests can be:<ul>
  + * <li>One-sample or two-sample</li>
  + * <li>One-sided or two-sided</li>
  + * <li>Paired or unpaired (for two-sample tests)</li>
  + * <li>Homoscedastic (equal variance assumption) or heteroscedastic
  + * (for two sample tests)</li>
  + * <li>Fixed significance level (boolean-valued) or returning p-values.
  + * </li></ul>
  + * <p>
  + * Test statistics are available for all tests.  Methods including "Test" in
  + * their names perform tests; all other methods return t-statistics.  Among
  + * the "Test" methods, <code>double-</code>valued methods return p-values;
  + * <code>boolean-</code>valued methods perform fixed significance level tests.
  + * Significance levels are always specified as numbers between 0 and 0.5
  + * (e.g. tests at the 95% level  use <code>alpha=0.05</code>).
  + * <p>
  + * Input to tests can be either <code>double[]</code> arrays or 
  + * {@link StatisticalSummary} instances.
  + * 
    *
    * @version $Revision$ $Date$ 
    */
   public interface TTest {
  -    
  -    
       /**
        * Computes a paired, 2-sample t-statistic based on the data in the input 
        * arrays.  The t-statistic returned is equivalent to what would be returned by
  @@ -46,13 +64,11 @@
        * @throws MathException if the statistic cannot be computed due to a
        *         convergence or other numerical error.
        */
  -    double pairedT(double[] sample1, double[] sample2) 
  -    throws IllegalArgumentException, MathException;
  -    
  +    public abstract double pairedT(double[] sample1, double[] sample2)
  +        throws IllegalArgumentException, MathException;
       /**
        * Returns the <i>observed significance level</i>, or 
  -     * <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
  -     * p-value</a>, associated with a paired, two-sample, two-tailed t-test 
  +     * <i> p-value</i>, associated with a paired, two-sample, two-tailed t-test 
        * based on the data in the input arrays.
        * <p>
        * The number returned is the smallest significance level
  @@ -83,11 +99,10 @@
        * @throws IllegalArgumentException if the precondition is not met
        * @throws MathException if an error occurs computing the p-value
        */
  -    double pairedTTest(double[] sample1, double[] sample2)
  -    throws IllegalArgumentException, MathException;
  -    
  +    public abstract double pairedTTest(double[] sample1, double[] sample2)
  +        throws IllegalArgumentException, MathException;
       /**
  -     * Performs a paired t-test</a> evaluating the null hypothesis that the 
  +     * Performs a paired t-test evaluating the null hypothesis that the 
        * mean of the paired differences between <code>sample1</code> and
        * <code>sample2</code> is 0 in favor of the two-sided alternative that the 
        * mean paired difference is not equal to 0, with significance level 
  @@ -118,9 +133,11 @@
        * @throws IllegalArgumentException if the preconditions are not met
        * @throws MathException if an error occurs performing the test
        */
  -    boolean pairedTTest(double[] sample1, double[] sample2, double alpha)
  -    throws IllegalArgumentException, MathException;
  -    
  +    public abstract boolean pairedTTest(
  +        double[] sample1,
  +        double[] sample2,
  +        double alpha)
  +        throws IllegalArgumentException, MathException;
       /**
        * Computes a <a href="http://www.itl.nist.gov/div898/handbook/prc/section2/prc22.htm#formula"> 
        * t statistic </a> given observed values and a comparison constant.
  @@ -136,9 +153,8 @@
        * @return t statistic
        * @throws IllegalArgumentException if input array length is less than 2
        */
  -    double t(double mu, double[] observed) 
  -    throws IllegalArgumentException;
  -    
  +    public abstract double t(double mu, double[] observed)
  +        throws IllegalArgumentException;
       /**
        * Computes a <a href="http://www.itl.nist.gov/div898/handbook/prc/section2/prc22.htm#formula">
        * t statistic </a> to use in comparing the mean of the dataset described by 
  @@ -155,19 +171,19 @@
        * @return t statistic
        * @throws IllegalArgumentException if the precondition is not met
        */
  -    double t(double mu, StatisticalSummary sampleStats) 
  -    throws IllegalArgumentException;
  -    
  +    public abstract double t(double mu, StatisticalSummary sampleStats)
  +        throws IllegalArgumentException;
       /**
  -     * Computes a <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
  -     * 2-sample t statistic. </a>
  +     * Computes a 2-sample t statistic,  under the hypothesis of equal 
  +     * subpopulation variances.  To compute a t-statistic without the
  +     * equal variances hypothesis, use {@link #t(double[], double[])}.
        * <p>
  -     * This statistic can be used to perform a two-sample t-test to compare
  -     * sample means.
  +     * This statistic can be used to perform a (homoscedastic) two-sample
  +     * t-test to compare sample means.   
        * <p>
  -     * If <code>equalVariances</code> is <code>true</code>,  the t-statisitc is
  +     * The t-statistic is
        * <p>
  -     * (1) &nbsp;&nbsp;<code>  t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
  +     * &nbsp;&nbsp;<code>  t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
        * <p>
        * where <strong><code>n1</code></strong> is the size of first sample; 
        * <strong><code> n2</code></strong> is the size of second sample; 
  @@ -181,9 +197,35 @@
        * with <strong><code>var1</code></strong> the variance of the first sample and
        * <strong><code>var2</code></strong> the variance of the second sample.
        * <p>
  -     * If <code>equalVariances</code> is <code>false</code>,  the t-statisitc is
  +     * <strong>Preconditions</strong>: <ul>
  +     * <li>The observed array lengths must both be at least 2.
  +     * </li></ul>
  +     *
  +     * @param sample1 array of sample data values
  +     * @param sample2 array of sample data values
  +     * @return t statistic
  +     * @throws IllegalArgumentException if the precondition is not met
  +     */
  +    public abstract double homoscedasticT(double[] sample1, double[] sample2)
  +        throws IllegalArgumentException;
  +    /**
  +     * Computes a 2-sample t statistic, without the hypothesis of equal
  +     * subpopulation variances.  To compute a t-statistic assuming equal
  +     * variances, use {@link #homoscedasticT(double[], double[])}.
        * <p>
  -     * (2) &nbsp;&nbsp; <code>  t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
  +     * This statistic can be used to perform a two-sample t-test to compare
  +     * sample means.
  +     * <p>
  +     * The t-statistic is
  +     * <p>
  +     * &nbsp;&nbsp; <code>  t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
  +     * <p>
  +     * where <strong><code>n1</code></strong> is the size of the first sample;
  +     * <strong><code> n2</code></strong> is the size of the second sample;
  +     * <strong><code> m1</code></strong> is the mean of the first sample;
  +     * <strong><code> m2</code></strong> is the mean of the second sample;
  +     * <strong><code> var1</code></strong> is the variance of the first sample;
  +     * <strong><code> var2</code></strong> is the variance of the second sample.
        * <p>
        * <strong>Preconditions</strong>: <ul>
        * <li>The observed array lengths must both be at least 2.
  @@ -191,32 +233,64 @@
        *
        * @param sample1 array of sample data values
        * @param sample2 array of sample data values
  -     * @param equalVariances are the sample variances assumed equal?
        * @return t statistic
        * @throws IllegalArgumentException if the precondition is not met
  -     * @throws MathException if the statistic can not be computed do to a
  -     *         convergence or other numerical error.
        */
  -    double t(double[] sample1, double[] sample2, boolean equalVariances) 
  -    throws IllegalArgumentException, MathException;
  -    
  +    public abstract double t(double[] sample1, double[] sample2)
  +        throws IllegalArgumentException;
       /**
  -     * Computes a <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
  -     * 2-sample t statistic </a>, comparing the means of the datasets described
  -     * by two {@link StatisticalSummary} instances.
  +     * Computes a 2-sample t statistic, comparing the means of the datasets
  +     * described by two {@link StatisticalSummary} instances, without the
  +     * assumption of equal subpopulation variances.  Use 
  +     * {@link #homoscedasticT(StatisticalSummary, StatisticalSummary)} to
  +     * compute a t-statistic under the equal variances assumption.
        * <p>
        * This statistic can be used to perform a two-sample t-test to compare
        * sample means.
        * <p>
  -      * If <code>equalVariances</code> is <code>true</code>,  the t-statisitc is
  +     * The returned t-statistic is
  +     * <p>
  +     * &nbsp;&nbsp; <code>  t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
        * <p>
  -     * (1) &nbsp;&nbsp;<code>  t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
  +     * where <strong><code>n1</code></strong> is the size of the first sample; 
  +     * <strong><code> n2</code></strong> is the size of the second sample; 
  +     * <strong><code> m1</code></strong> is the mean of the first sample;  
  +     * <strong><code> m2</code></strong> is the mean of the second sample;
  +     * <strong><code> var1</code></strong> is the variance of the first sample;
  +     * <strong><code> var2</code></strong> is the variance of the second sample.
  +     * <p>
  +     * <strong>Preconditions</strong>: <ul>
  +     * <li>The datasets described by the two StatisticalSummary instances must each contain
  +     * at least 2 observations.
  +     * </li></ul>
  +     *
  +     * @param sampleStats1 StatisticalSummary describing data from the first sample
  +     * @param sampleStats2 StatisticalSummary describing data from the second sample
  +     * @return t statistic
  +     * @throws IllegalArgumentException if the precondition is not met
  +     */
  +    public abstract double t(
  +        StatisticalSummary sampleStats1,
  +        StatisticalSummary sampleStats2)
  +        throws IllegalArgumentException;
  +    /**
  +     * Computes a 2-sample t statistic, comparing the means of the datasets
  +     * described by two {@link StatisticalSummary} instances, under the
  +     * assumption of equal subpopulation variances.  To compute a t-statistic
  +     * without the equal variances assumption, use 
  +     * {@link #t(StatisticalSummary, StatisticalSummary)}.
  +     * <p>
  +     * This statistic can be used to perform a (homoscedastic) two-sample
  +     * t-test to compare sample means.
  +     * <p>
  +     * The t-statistic returned is
  +     * <p>
  +     * &nbsp;&nbsp;<code>  t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
        * <p>
        * where <strong><code>n1</code></strong> is the size of first sample; 
        * <strong><code> n2</code></strong> is the size of second sample; 
        * <strong><code> m1</code></strong> is the mean of first sample;  
  -     * <strong><code> m2</code></strong> is the mean of second sample</li>
  -     * </ul>
  +     * <strong><code> m2</code></strong> is the mean of second sample
        * and <strong><code>var</code></strong> is the pooled variance estimate:
        * <p>
        * <code>var = ((n1 - 1)var1 + (n2 - 1)var2) / ((n1-1) + (n2-1))</code>
  @@ -224,10 +298,6 @@
        * with <strong><code>var1</code></strong> the variance of the first sample and
        * <strong><code>var2</code></strong> the variance of the second sample.
        * <p>
  -     * If <code>equalVariances</code> is <code>false</code>,  the t-statisitc is
  -     * <p>
  -     * (2) &nbsp;&nbsp; <code>  t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
  -     * <p>
        * <strong>Preconditions</strong>: <ul>
        * <li>The datasets described by the two Univariates must each contain
        * at least 2 observations.
  @@ -235,18 +305,16 @@
        *
        * @param sampleStats1 StatisticalSummary describing data from the first sample
        * @param sampleStats2 StatisticalSummary describing data from the second sample
  -     * @param equalVariances are the sample variances assumed equal?
        * @return t statistic
        * @throws IllegalArgumentException if the precondition is not met
        */
  -    double t(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2,
  -            boolean equalVariances) 
  -    throws IllegalArgumentException;
  -    
  +    public abstract double homoscedasticT(
  +        StatisticalSummary sampleStats1,
  +        StatisticalSummary sampleStats2)
  +        throws IllegalArgumentException;
       /**
        * Returns the <i>observed significance level</i>, or 
  -     * <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
  -     * p-value</a>, associated with a one-sample, two-tailed t-test 
  +     * <i>p-value</i>, associated with a one-sample, two-tailed t-test 
        * comparing the mean of the input array with the constant <code>mu</code>.
        * <p>
        * The number returned is the smallest significance level
  @@ -270,13 +338,12 @@
        * @throws IllegalArgumentException if the precondition is not met
        * @throws MathException if an error occurs computing the p-value
        */
  -    double tTest(double mu, double[] sample)
  -    throws IllegalArgumentException, MathException;
  -    
  +    public abstract double tTest(double mu, double[] sample)
  +        throws IllegalArgumentException, MathException;
       /**
        * Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
        * two-sided t-test</a> evaluating the null hypothesis that the mean of the population from
  -     *  which <code>sample</code> is drawn equals <code>mu</code>.
  +     * which <code>sample</code> is drawn equals <code>mu</code>.
        * <p>
        * Returns <code>true</code> iff the null hypothesis can be 
        * rejected with confidence <code>1 - alpha</code>.  To 
  @@ -308,13 +375,11 @@
        * @throws IllegalArgumentException if the precondition is not met
        * @throws MathException if an error occurs computing the p-value
        */
  -    boolean tTest(double mu, double[] sample, double alpha)
  -    throws IllegalArgumentException, MathException;
  -    
  +    public abstract boolean tTest(double mu, double[] sample, double alpha)
  +        throws IllegalArgumentException, MathException;
       /**
        * Returns the <i>observed significance level</i>, or 
  -     * <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
  -     * p-value</a>, associated with a one-sample, two-tailed t-test 
  +     * <i>p-value</i>, associated with a one-sample, two-tailed t-test 
        * comparing the mean of the dataset described by <code>sampleStats</code>
        * with the constant <code>mu</code>.
        * <p>
  @@ -327,7 +392,8 @@
        * <strong>Usage Note:</strong><br>
        * The validity of the test depends on the assumptions of the parametric
        * t-test procedure, as discussed 
  -     * <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">here</a>
  +     * <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
  +     * here</a>
        * <p>
        * <strong>Preconditions</strong>: <ul>
        * <li>The sample must contain at least 2 observations.
  @@ -339,17 +405,17 @@
        * @throws IllegalArgumentException if the precondition is not met
        * @throws MathException if an error occurs computing the p-value
        */
  -    double tTest(double mu, StatisticalSummary sampleStats)
  -    throws IllegalArgumentException, MathException;
  -    
  +    public abstract double tTest(double mu, StatisticalSummary sampleStats)
  +        throws IllegalArgumentException, MathException;
       /**
        * Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
  -     * two-sided t-test</a> evaluating the null hypothesis that the mean of the population from
  -     * which the dataset described by <code>stats</code> is drawn equals <code>mu</code>.
  -     * <p>
  -     * Returns <code>true</code> iff the null hypothesis can be 
  -     * rejected with confidence <code>1 - alpha</code>.  To 
  -     * perform a 1-sided test, use <code>alpha / 2</code>
  +     * two-sided t-test</a> evaluating the null hypothesis that the mean of the
  +     * population from which the dataset described by <code>sampleStats</code> is
  +     * drawn equals <code>mu</code>.
  +     * <p>
  +     * Returns <code>true</code> iff the null hypothesis can be rejected with
  +     * confidence <code>1 - alpha</code>.  To  perform a 1-sided test, use
  +     * <code>alpha / 2.</code>
        * <p>
        * <strong>Examples:</strong><br><ol>
        * <li>To test the (2-sided) hypothesis <code>sample mean = mu </code> at
  @@ -377,13 +443,14 @@
        * @throws IllegalArgumentException if the precondition is not met
        * @throws MathException if an error occurs computing the p-value
        */
  -    boolean tTest(double mu, StatisticalSummary sampleStats, double alpha)
  -    throws IllegalArgumentException, MathException;
  -    
  +    public abstract boolean tTest(
  +        double mu,
  +        StatisticalSummary sampleStats,
  +        double alpha)
  +        throws IllegalArgumentException, MathException;
       /**
        * Returns the <i>observed significance level</i>, or 
  -     * <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
  -     * p-value</a>, associated with a two-sample, two-tailed t-test 
  +     * <i>p-value</i>, associated with a two-sample, two-tailed t-test 
        * comparing the means of the input arrays.
        * <p>
        * The number returned is the smallest significance level
  @@ -391,19 +458,50 @@
        * equal in favor of the two-sided alternative that they are different. 
        * For a one-sided test, divide the returned value by 2.
        * <p>
  -     * If the <code>equalVariances</code> parameter is <code>false,</code>
  -     * the test does not assume that the underlying popuation variances are
  +     * The test does not assume that the underlying population variances are
        * equal  and it uses approximated degrees of freedom computed from the 
  -     * sample data to compute the p-value.  In this case, formula (1) for the
  -     * {@link #t(double[], double[], boolean)} statistic is used
  -     * and the Welch-Satterthwaite approximation to the degrees of freedom is used, 
  +     * sample data to compute the p-value.  The t-statistic used is as defined in
  +     * {@link #t(double[], double[])} and the Welch-Satterthwaite approximation
  +     * to the degrees of freedom is used, 
        * as described 
        * <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
  -     * here.</a>
  +     * here.</a>  To perform the test under the assumption of equal subpopulation
  +     * variances, use {@link #homoscedasticTTest(double[], double[])}. 
  +     * <p>
  +     * <strong>Usage Note:</strong><br>
  +     * The validity of the p-value depends on the assumptions of the parametric
  +     * t-test procedure, as discussed 
  +     * <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
  +     * here</a>
  +     * <p>
  +     * <strong>Preconditions</strong>: <ul>
  +     * <li>The observed array lengths must both be at least 2.
  +     * </li></ul>
  +     *
  +     * @param sample1 array of sample data values
  +     * @param sample2 array of sample data values
  +     * @return p-value for t-test
  +     * @throws IllegalArgumentException if the precondition is not met
  +     * @throws MathException if an error occurs computing the p-value
  +     */
  +    public abstract double tTest(double[] sample1, double[] sample2)
  +        throws IllegalArgumentException, MathException;
  +    /**
  +     * Returns the <i>observed significance level</i>, or 
  +     * <i>p-value</i>, associated with a two-sample, two-tailed t-test 
  +     * comparing the means of the input arrays, under the assumption that
  +     * the two samples are drawn from subpopulations with equal variances.
  +     * To perform the test without the equal variances assumption, use
  +     * {@link #tTest(double[], double[])}.
  +     * <p>
  +     * The number returned is the smallest significance level
  +     * at which one can reject the null hypothesis that the two means are
  +     * equal in favor of the two-sided alternative that they are different. 
  +     * For a one-sided test, divide the returned value by 2.
        * <p>
  -     * If <code>equalVariances</code> is <code>true</code>, a pooled variance
  -     * estimate is used to compute the t-statistic (formula (2)) and the sum of the 
  -     * sample sizes minus 2 is used as the degrees of freedom.
  +     * A pooled variance estimate is used to compute the t-statistic.  See
  +     * {@link #homoscedasticT(double[], double[])}. The sum of the sample sizes
  +     * minus 2 is used as the degrees of freedom.
        * <p>
        * <strong>Usage Note:</strong><br>
        * The validity of the p-value depends on the assumptions of the parametric
  @@ -417,47 +515,99 @@
        *
        * @param sample1 array of sample data values
        * @param sample2 array of sample data values
  -     * @param equalVariances are sample variances assumed to be equal?
        * @return p-value for t-test
        * @throws IllegalArgumentException if the precondition is not met
        * @throws MathException if an error occurs computing the p-value
        */
  -    double tTest(double[] sample1, double[] sample2, boolean equalVariances)
  -    throws IllegalArgumentException, MathException;
  -    
  +    public abstract double homoscedasticTTest(
  +        double[] sample1,
  +        double[] sample2)
  +        throws IllegalArgumentException, MathException;
       /**
  -     * Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
  +     * Performs a 
  +     * <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
        * two-sided t-test</a> evaluating the null hypothesis that <code>sample1</code> 
        * and <code>sample2</code> are drawn from populations with the same mean, 
  -     * with significance level <code>alpha</code>.
  +     * with significance level <code>alpha</code>.  This test does not assume
  +     * that the subpopulation variances are equal.  To perform the test assuming
  +     * equal variances, use 
  +     * {@link #homoscedasticTTest(double[], double[], double)}.
        * <p>
        * Returns <code>true</code> iff the null hypothesis that the means are
        * equal can be rejected with confidence <code>1 - alpha</code>.  To 
        * perform a 1-sided test, use <code>alpha / 2</code>
        * <p>
  -     * If the <code>equalVariances</code> parameter is <code>false,</code>
  -     * the test does not assume that the underlying popuation variances are
  -     * equal  and it uses approximated degrees of freedom computed from the 
  -     * sample data to compute the p-value.  In this case, formula (1) for the
  -     * {@link #t(double[], double[], boolean)} statistic is used
  -     * and the Welch-Satterthwaite approximation to the degrees of freedom is used, 
  -     * as described 
  +     * See {@link #t(double[], double[])} for the formula used to compute the
  +     * t-statistic.  Degrees of freedom are approximated using the
        * <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
  -     * here.</a>
  +     * Welch-Satterthwaite approximation.</a>
  +     *
  +     * <p>
  +     * <strong>Examples:</strong><br><ol>
  +     * <li>To test the (2-sided) hypothesis <code>mean 1 = mean 2 </code> at
  +     * the 95% level,  use 
  +     * <br><code>tTest(sample1, sample2, 0.05). </code>
  +     * </li>
  +     * <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2 </code>,
  +     * first verify that the measured  mean of <code>sample 1</code> is less
  +     * than the mean of <code>sample 2</code> and then use 
  +     * <br><code>tTest(sample1, sample2, 0.005) </code>
  +     * </li></ol>
        * <p>
  -     * If <code>equalVariances</code> is <code>true</code>, a pooled variance
  -     * estimate is used to compute the t-statistic (formula (2)) and the sum of the 
  -     * sample sizes minus 2 is used as the degrees of freedom.
  +     * <strong>Usage Note:</strong><br>
  +     * The validity of the test depends on the assumptions of the parametric
  +     * t-test procedure, as discussed 
  +     * <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
  +     * here</a>
  +     * <p>
  +     * <strong>Preconditions</strong>: <ul>
  +     * <li>The observed array lengths must both be at least 2.
  +     * </li>
  +     * <li> <code> 0 < alpha < 0.5 </code>
  +     * </li></ul>
  +     *
  +     * @param sample1 array of sample data values
  +     * @param sample2 array of sample data values
  +     * @param alpha significance level of the test
  +     * @return true if the null hypothesis can be rejected with 
  +     * confidence 1 - alpha
  +     * @throws IllegalArgumentException if the preconditions are not met
  +     * @throws MathException if an error occurs performing the test
  +     */
  +    public abstract boolean tTest(
  +        double[] sample1,
  +        double[] sample2,
  +        double alpha)
  +        throws IllegalArgumentException, MathException;
  +    /**
  +     * Performs a 
  +     * <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
  +     * two-sided t-test</a> evaluating the null hypothesis that <code>sample1</code> 
  +     * and <code>sample2</code> are drawn from populations with the same mean, 
  +     * with significance level <code>alpha</code>,  assuming that the
  +     * subpopulation variances are equal.  Use 
  +     * {@link #tTest(double[], double[], double)} to perform the test without
  +     * the assumption of equal variances.
  +     * <p>
  +     * Returns <code>true</code> iff the null hypothesis that the means are
  +     * equal can be rejected with confidence <code>1 - alpha</code>.  To 
  +     * perform a 1-sided test, use <code>alpha / 2.</code>  To perform the test
  +     * without the assumption of equal subpopulation variances, use 
  +     * {@link #tTest(double[], double[], double)}.
  +     * <p>
  +     * A pooled variance estimate is used to compute the t-statistic. See
  +     * {@link #homoscedasticT(double[], double[])} for the formula. The sum of the sample
  +     * sizes minus 2 is used as the degrees of freedom.
        * <p>
        * <strong>Examples:</strong><br><ol>
        * <li>To test the (2-sided) hypothesis <code>mean 1 = mean 2 </code> at
  -     * the 95% level, under the assumption of equal subpopulation variances, 
  -     * use <br><code>tTest(sample1, sample2, 0.05, true) </code>
  +     * the 95% level, use <br><code>homoscedasticTTest(sample1, sample2, 0.05). </code>
        * </li>
  -     * <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2 </code>
  -     * at the 99% level without assuming equal variances, first verify that the measured 
  -     * mean of <code>sample 1</code> is less than the mean of <code>sample 2</code>
  -     * and then use <br><code>tTest(sample1, sample2, 0.005, false) </code>
  +     * <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2, </code>
  +     * at the 99% level, first verify that the measured mean of 
  +     * <code>sample 1</code> is less than the mean of <code>sample 2</code>
  +     * and then use
  +     * <br><code>homoscedasticTTest(sample1, sample2, 0.005) </code>
        * </li></ol>
        * <p>
        * <strong>Usage Note:</strong><br>
  @@ -475,40 +625,70 @@
        * @param sample1 array of sample data values
        * @param sample2 array of sample data values
        * @param alpha significance level of the test
  -     * @param equalVariances are sample variances assumed to be equal?
        * @return true if the null hypothesis can be rejected with 
        * confidence 1 - alpha
        * @throws IllegalArgumentException if the preconditions are not met
        * @throws MathException if an error occurs performing the test
        */
  -    boolean tTest(double[] sample1, double[] sample2, double alpha, 
  -            boolean equalVariances)
  -    throws IllegalArgumentException, MathException;
  -    
  +    public abstract boolean homoscedasticTTest(
  +        double[] sample1,
  +        double[] sample2,
  +        double alpha)
  +        throws IllegalArgumentException, MathException;
       /**
        * Returns the <i>observed significance level</i>, or 
  -     * <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
  -     * p-value</a>, associated with a two-sample, two-tailed t-test 
  -     * comparing the means of the datasets described by two Univariates.
  +     * <i>p-value</i>, associated with a two-sample, two-tailed t-test 
  +     * comparing the means of the datasets described by two StatisticalSummary
  +     * instances.
        * <p>
        * The number returned is the smallest significance level
        * at which one can reject the null hypothesis that the two means are
        * equal in favor of the two-sided alternative that they are different. 
        * For a one-sided test, divide the returned value by 2.
        * <p>
  -     * If the <code>equalVariances</code> parameter is <code>false,</code>
  -     * the test does not assume that the underlying popuation variances are
  +     * The test does not assume that the underlying population variances are
        * equal  and it uses approximated degrees of freedom computed from the 
  -     * sample data to compute the p-value.  In this case, formula (1) for the
  -     * {@link #t(double[], double[], boolean)} statistic is used
  -     * and the Welch-Satterthwaite approximation to the degrees of freedom is used, 
  -     * as described 
  -     * <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
  -     * here.</a>
  +     * sample data to compute the p-value.   To perform the test assuming
  +     * equal variances, use 
  +     * {@link #homoscedasticTTest(StatisticalSummary, StatisticalSummary)}.
  +     * <p>
  +     * <strong>Usage Note:</strong><br>
  +     * The validity of the p-value depends on the assumptions of the parametric
  +     * t-test procedure, as discussed 
  +     * <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
  +     * here</a>
  +     * <p>
  +     * <strong>Preconditions</strong>: <ul>
  +     * <li>The datasets described by the two StatisticalSummary instances must
  +     * each contain at least 2 observations.
  +     * </li></ul>
  +     *
  +     * @param sampleStats1  StatisticalSummary describing data from the first sample
  +     * @param sampleStats2  StatisticalSummary describing data from the second sample
  +     * @return p-value for t-test
  +     * @throws IllegalArgumentException if the precondition is not met
  +     * @throws MathException if an error occurs computing the p-value
  +     */
  +    public abstract double tTest(
  +        StatisticalSummary sampleStats1,
  +        StatisticalSummary sampleStats2)
  +        throws IllegalArgumentException, MathException;
  +    /**
  +     * Returns the <i>observed significance level</i>, or 
  +     * <i>p-value</i>, associated with a two-sample, two-tailed t-test 
  +     * comparing the means of the datasets described by two StatisticalSummary
  +     * instances, under the hypothesis of equal subpopulation variances. To
  +     * perform a test without the equal variances assumption, use
  +     * {@link #tTest(StatisticalSummary, StatisticalSummary)}.
        * <p>
  -     * If <code>equalVariances</code> is <code>true</code>, a pooled variance
  -     * estimate is used to compute the t-statistic (formula (2)) and the sum of the 
  -     * sample sizes minus 2 is used as the degrees of freedom.
  +     * The number returned is the smallest significance level
  +     * at which one can reject the null hypothesis that the two means are
  +     * equal in favor of the two-sided alternative that they are different. 
  +     * For a one-sided test, divide the returned value by 2.
  +     * <p>
  +     * See {@link #homoscedasticT(double[], double[])} for the formula used to
  +     * compute the t-statistic. The sum of the  sample sizes minus 2 is used as
  +     * the degrees of freedom.
        * <p>
        * <strong>Usage Note:</strong><br>
        * The validity of the p-value depends on the assumptions of the parametric
  @@ -522,49 +702,44 @@
        *
        * @param sampleStats1  StatisticalSummary describing data from the first sample
        * @param sampleStats2  StatisticalSummary describing data from the second sample
  -     * @param equalVariances  are sample variances assumed to be equal?
        * @return p-value for t-test
        * @throws IllegalArgumentException if the precondition is not met
        * @throws MathException if an error occurs computing the p-value
        */
  -    double tTest(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2, 
  -            boolean equalVariances)
  -    throws IllegalArgumentException, MathException;
  -    
  -    /**
  -     * Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
  -     * two-sided t-test</a> evaluating the null hypothesis that <code>sampleStats1</code>
  -     * and <code>sampleStats2</code> describe datasets drawn from populations with the 
  -     * same mean, with significance level <code>alpha</code>.
  +    public abstract double homoscedasticTTest(
  +        StatisticalSummary sampleStats1,
  +        StatisticalSummary sampleStats2)
  +        throws IllegalArgumentException, MathException;
  +    /**
  +     * Performs a 
  +     * <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
  +     * two-sided t-test</a> evaluating the null hypothesis that 
  +     * <code>sampleStats1</code> and <code>sampleStats2</code> describe
  +     * datasets drawn from populations with the same mean, with significance
  +     * level <code>alpha</code>.   This test does not assume that the
  +     * subpopulation variances are equal.  To perform the test under the equal
  +     * variances assumption, use
  +     * {@link #homoscedasticTTest(StatisticalSummary, StatisticalSummary)}.
        * <p>
        * Returns <code>true</code> iff the null hypothesis that the means are
        * equal can be rejected with confidence <code>1 - alpha</code>.  To 
        * perform a 1-sided test, use <code>alpha / 2</code>
        * <p>
  -     * If the <code>equalVariances</code> parameter is <code>false,</code>
  -     * the test does not assume that the underlying popuation variances are
  -     * equal  and it uses approximated degrees of freedom computed from the 
  -     * sample data to compute the p-value.  In this case, formula (1) for the
  -     * {@link #t(double[], double[], boolean)} statistic is used
  -     * and the Welch-Satterthwaite approximation to the degrees of freedom is used, 
  -     * as described 
  +     * See {@link #t(double[], double[])} for the formula used to compute the
  +     * t-statistic.  Degrees of freedom are approximated using the
        * <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
  -     * here.</a>
  -     * <p>
  -     * If <code>equalVariances</code> is <code>true</code>, a pooled variance
  -     * estimate is used to compute the t-statistic (formula (2)) and the sum of the 
  -     * sample sizes minus 2 is used as the degrees of freedom.
  +     * Welch-Satterthwaite approximation.</a>
        * <p>
        * <strong>Examples:</strong><br><ol>
        * <li>To test the (2-sided) hypothesis <code>mean 1 = mean 2 </code> at
  -     * the 95% level under the assumption of equal subpopulation variances, use 
  -     * <br><code>tTest(sampleStats1, sampleStats2, 0.05, true) </code>
  +     * the 95% level, use 
  +     * <br><code>tTest(sampleStats1, sampleStats2, 0.05) </code>
        * </li>
        * <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2 </code>
  -     * at the 99% level without assuming that subpopulation variances are equal, 
  -     * first verify that the measured mean of  <code>sample 1</code> is less than 
  -     * the mean of <code>sample 2</code> and then use 
  -     * <br><code>tTest(sampleStats1, sampleStats2, 0.005, false) </code>
  +     * at the 99% level,  first verify that the measured mean of  
  +     * <code>sample 1</code> is less than  the mean of <code>sample 2</code>
  +     * and then use 
  +     * <br><code>tTest(sampleStats1, sampleStats2, 0.005) </code>
        * </li></ol>
        * <p>
        * <strong>Usage Note:</strong><br>
  @@ -583,13 +758,14 @@
        * @param sampleStats1 StatisticalSummary describing sample data values
        * @param sampleStats2 StatisticalSummary describing sample data values
        * @param alpha significance level of the test
  -     * @param equalVariances  are sample variances assumed to be equal?
        * @return true if the null hypothesis can be rejected with 
        * confidence 1 - alpha
        * @throws IllegalArgumentException if the preconditions are not met
        * @throws MathException if an error occurs performing the test
        */
  -    boolean tTest(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2, 
  -            double alpha, boolean equalVariances)
  -    throws IllegalArgumentException, MathException;
  -}
  +    public abstract boolean tTest(
  +        StatisticalSummary sampleStats1,
  +        StatisticalSummary sampleStats2,
  +        double alpha)
  +        throws IllegalArgumentException, MathException;
  +}
  \ No newline at end of file
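
  For readers comparing the two statistics that this commit splits into
  separate methods, here is a minimal standalone sketch of the formulas quoted
  in the Javadoc. It is not the TTestImpl code: the class and method names
  (TwoSampleTSketch, welchT, pooledT, welchDf) are hypothetical, and it stops at
  the t-statistic and degrees of freedom rather than computing a p-value, which
  would need a t-distribution CDF. The pooled variance is assumed to be the
  usual weighted average of the two sample variances.

```java
public class TwoSampleTSketch {

    /** Arithmetic mean of the sample. */
    public static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Unbiased sample variance (divides by n - 1). */
    public static double variance(double[] x) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) {
            ss += (v - m) * (v - m);
        }
        return ss / (x.length - 1);
    }

    /** Heteroscedastic statistic: t = (m1 - m2) / sqrt(var1/n1 + var2/n2). */
    public static double welchT(double m1, double m2, double v1, double v2,
            double n1, double n2) {
        return (m1 - m2) / Math.sqrt(v1 / n1 + v2 / n2);
    }

    /** Homoscedastic statistic using a pooled variance estimate. */
    public static double pooledT(double m1, double m2, double v1, double v2,
            double n1, double n2) {
        double pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2);
        return (m1 - m2) / Math.sqrt(pooled * (1 / n1 + 1 / n2));
    }

    /** Welch-Satterthwaite approximation to the degrees of freedom. */
    public static double welchDf(double v1, double v2, double n1, double n2) {
        double a = v1 / n1;
        double b = v2 / n2;
        return (a + b) * (a + b)
                / (a * a / (n1 - 1) + b * b / (n2 - 1));
    }

    public static void main(String[] args) {
        double[] s1 = {1, 2, 3, 4, 5};
        double[] s2 = {2, 4, 6, 8, 10};
        double m1 = mean(s1), m2 = mean(s2);
        double v1 = variance(s1), v2 = variance(s2);
        double n1 = s1.length, n2 = s2.length;
        System.out.println("Welch t   = " + welchT(m1, m2, v1, v2, n1, n2));
        System.out.println("Welch df  = " + welchDf(v1, v2, n1, n2));
        System.out.println("Pooled t  = " + pooledT(m1, m2, v1, v2, n1, n2));
        System.out.println("Pooled df = " + (n1 + n2 - 2));
    }
}
```

  With equal sample sizes the two t-statistics coincide and only the degrees
  of freedom differ, which is why the choice between tTest and
  homoscedasticTTest matters most when the sample sizes or variances are
  unbalanced.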
  
  
  
  1.9       +395 -152  jakarta-commons/math/src/java/org/apache/commons/math/stat/inference/TTestImpl.java
  
  Index: TTestImpl.java
  ===================================================================
  RCS file: /home/cvs/jakarta-commons/math/src/java/org/apache/commons/math/stat/inference/TTestImpl.java,v
  retrieving revision 1.8
  retrieving revision 1.9
  diff -u -r1.8 -r1.9
  --- TTestImpl.java	23 Jun 2004 16:26:14 -0000	1.8
  +++ TTestImpl.java	2 Aug 2004 04:20:08 -0000	1.9
  @@ -23,6 +23,9 @@
   
   /**
    * Implements t-test statistics defined in the {@link TTest} interface.
  + * <p>
  + * Uses commons-math {@link org.apache.commons.math.distribution.TDistribution}
  + * implementation to estimate exact p-values.
    *
    * @version $Revision$ $Date$
    */
  @@ -72,8 +75,7 @@
   
        /**
        * Returns the <i>observed significance level</i>, or 
  -     * <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
  -     * p-value</a>, associated with a paired, two-sample, two-tailed t-test 
  +     * <i>p-value</i>, associated with a paired, two-sample, two-tailed t-test 
        * based on the data in the input arrays.
        * <p>
        * The number returned is the smallest significance level
  @@ -113,7 +115,7 @@
       }
   
        /**
  -     * Performs a paired t-test</a> evaluating the null hypothesis that the 
  +     * Performs a paired t-test evaluating the null hypothesis that the 
        * mean of the paired differences between <code>sample1</code> and
        * <code>sample2</code> is 0 in favor of the two-sided alternative that the 
        * mean paired difference is not equal to 0, with significance level 
  @@ -172,7 +174,8 @@
           if ((observed == null) || (observed.length < 2)) {
               throw new IllegalArgumentException("insufficient data for t statistic");
           }
  -        return t(StatUtils.mean(observed), mu, StatUtils.variance(observed), observed.length);
  +        return t(StatUtils.mean(observed), mu, StatUtils.variance(observed),
  +                observed.length);
       }
   
       /**
  @@ -196,19 +199,21 @@
           if ((sampleStats == null) || (sampleStats.getN() < 2)) {
               throw new IllegalArgumentException("insufficient data for t statistic");
           }
  -        return t(sampleStats.getMean(), mu, sampleStats.getVariance(), sampleStats.getN());
  +        return t(sampleStats.getMean(), mu, sampleStats.getVariance(),
  +                sampleStats.getN());
       }
   
       /**
  -     * Computes a <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
  -     * 2-sample t statistic. </a>
  +     * Computes a 2-sample t statistic,  under the hypothesis of equal 
  +     * subpopulation variances.  To compute a t-statistic without the
  +     * equal variances hypothesis, use {@link #t(double[], double[])}.
        * <p>
  -     * This statistic can be used to perform a two-sample t-test to compare
  -     * sample means.
  +     * This statistic can be used to perform a (homoscedastic) two-sample
  +     * t-test to compare sample means.   
        * <p>
  -     * If <code>equalVariances</code> is <code>true</code>,  the t-statisitc is
  +     * The t-statistic is
        * <p>
  -     * (1) &nbsp;&nbsp;<code>  t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
  +     * &nbsp;&nbsp;<code>  t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
        * <p>
        * where <strong><code>n1</code></strong> is the size of first sample; 
        * <strong><code> n2</code></strong> is the size of second sample; 
  @@ -222,9 +227,44 @@
        * with <strong><code>var1</code></strong> the variance of the first sample and
        * <strong><code>var2</code></strong> the variance of the second sample.
        * <p>
  -     * If <code>equalVariances</code> is <code>false</code>,  the t-statisitc is
  +     * <strong>Preconditions</strong>: <ul>
  +     * <li>The observed array lengths must both be at least 2.
  +     * </li></ul>
  +     *
  +     * @param sample1 array of sample data values
  +     * @param sample2 array of sample data values
  +     * @return t statistic
  +     * @throws IllegalArgumentException if the precondition is not met
  +     */
  +    public double homoscedasticT(double[] sample1, double[] sample2)
  +    throws IllegalArgumentException {
  +        if ((sample1 == null) || (sample2 == null ||
  +                Math.min(sample1.length, sample2.length) < 2)) {
  +            throw new IllegalArgumentException("insufficient data for t statistic");
  +        }
  +        return homoscedasticT(StatUtils.mean(sample1), StatUtils.mean(sample2),
  +                StatUtils.variance(sample1), StatUtils.variance(sample2),
  +                (double) sample1.length, (double) sample2.length);
  +    }
  +    
  +    /**
  +     * Computes a 2-sample t statistic, without the hypothesis of equal
  +     * subpopulation variances.  To compute a t-statistic assuming equal
  +     * variances, use {@link #homoscedasticT(double[], double[])}.
  +     * <p>
  +     * This statistic can be used to perform a two-sample t-test to compare
  +     * sample means.
  +     * <p>
  +     * The t-statistic is
        * <p>
  -     * (2) &nbsp;&nbsp; <code>  t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
  +     * &nbsp;&nbsp; <code>  t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
  +     * <p>
  +     * where <strong><code>n1</code></strong> is the size of the first sample;
  +     * <strong><code> n2</code></strong> is the size of the second sample; 
  +     * <strong><code> m1</code></strong> is the mean of the first sample;  
  +     * <strong><code> m2</code></strong> is the mean of the second sample;
  +     * <strong><code> var1</code></strong> is the variance of the first sample;
  +     * and <strong><code> var2</code></strong> is the variance of the second sample.
        * <p>
        * <strong>Preconditions</strong>: <ul>
        * <li>The observed array lengths must both be at least 2.
  @@ -232,38 +272,82 @@
        *
        * @param sample1 array of sample data values
        * @param sample2 array of sample data values
  -     * @param equalVariances are the sample variances assumed equal?
        * @return t statistic
        * @throws IllegalArgumentException if the precondition is not met
        */
  -    public double t(double[] sample1, double[] sample2, boolean equalVariances)
  +    public double t(double[] sample1, double[] sample2)
       throws IllegalArgumentException {
           if ((sample1 == null) || (sample2 == null ||
                   Math.min(sample1.length, sample2.length) < 2)) {
               throw new IllegalArgumentException("insufficient data for t statistic");
           }
  -        return t(StatUtils.mean(sample1), StatUtils.mean(sample2), StatUtils.variance(sample1),
  -                StatUtils.variance(sample2),  (double) sample1.length, 
  -                (double) sample2.length, equalVariances);
  +        return t(StatUtils.mean(sample1), StatUtils.mean(sample2),
  +                StatUtils.variance(sample1), StatUtils.variance(sample2),
  +                (double) sample1.length, (double) sample2.length);
       }
   
       /**
  -     * Computes a <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
  -     * 2-sample t statistic </a>, comparing the means of the datasets described
  -     * by two {@link StatisticalSummary} instances.
  +     * Computes a 2-sample t statistic, comparing the means of the datasets
  +     * described by two {@link StatisticalSummary} instances, without the
  +     * assumption of equal subpopulation variances.  Use 
  +     * {@link #homoscedasticT(StatisticalSummary, StatisticalSummary)} to
  +     * compute a t-statistic under the equal variances assumption.
        * <p>
        * This statistic can be used to perform a two-sample t-test to compare
        * sample means.
        * <p>
  -      * If <code>equalVariances</code> is <code>true</code>,  the t-statisitc is
  +     * The returned t-statistic is
  +     * <p>
  +     * &nbsp;&nbsp; <code>  t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
  +     * <p>
  +     * where <strong><code>n1</code></strong> is the size of the first sample; 
  +     * <strong><code> n2</code></strong> is the size of the second sample; 
  +     * <strong><code> m1</code></strong> is the mean of the first sample;  
  +     * <strong><code> m2</code></strong> is the mean of the second sample;
  +     * <strong><code> var1</code></strong> is the variance of the first sample;  
  +     * and <strong><code> var2</code></strong> is the variance of the second sample.
        * <p>
  -     * (1) &nbsp;&nbsp;<code>  t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
  +     * <strong>Preconditions</strong>: <ul>
  +     * <li>The datasets described by the two StatisticalSummary instances must
  +     * each contain at least 2 observations.
  +     * </li></ul>
  +     *
  +     * @param sampleStats1 StatisticalSummary describing data from the first sample
  +     * @param sampleStats2 StatisticalSummary describing data from the second sample
  +     * @return t statistic
  +     * @throws IllegalArgumentException if the precondition is not met
  +     */
  +    public double t(StatisticalSummary sampleStats1, 
  +            StatisticalSummary sampleStats2)
  +    throws IllegalArgumentException {
  +        if ((sampleStats1 == null) ||
  +                (sampleStats2 == null ||
  +                        Math.min(sampleStats1.getN(), sampleStats2.getN()) < 2)) {
  +            throw new IllegalArgumentException("insufficient data for t statistic");
  +        }
  +        return t(sampleStats1.getMean(), sampleStats2.getMean(), 
  +                sampleStats1.getVariance(), sampleStats2.getVariance(),
  +                (double) sampleStats1.getN(), (double) sampleStats2.getN());
  +    }
  +    
  +    /**
  +     * Computes a 2-sample t statistic, comparing the means of the datasets
  +     * described by two {@link StatisticalSummary} instances, under the
  +     * assumption of equal subpopulation variances.  To compute a t-statistic
  +     * without the equal variances assumption, use 
  +     * {@link #t(StatisticalSummary, StatisticalSummary)}.
  +     * <p>
  +     * This statistic can be used to perform a (homoscedastic) two-sample
  +     * t-test to compare sample means.
  +     * <p>
  +     * The t-statistic returned is
  +     * <p>
  +     * &nbsp;&nbsp;<code>  t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
        * <p>
        * where <strong><code>n1</code></strong> is the size of first sample; 
        * <strong><code> n2</code></strong> is the size of second sample; 
        * <strong><code> m1</code></strong> is the mean of first sample;  
  -     * <strong><code> m2</code></strong> is the mean of second sample</li>
  -     * </ul>
  +     * <strong><code> m2</code></strong> is the mean of second sample
        * and <strong><code>var</code></strong> is the pooled variance estimate:
        * <p>
        * <code>var = ((n1 - 1)var1 + (n2 - 1)var2) / ((n1-1) + (n2-1))</code>
  @@ -271,10 +355,6 @@
        * with <strong><code>var1</code></strong> the variance of the first sample and
        * <strong><code>var2</code></strong> the variance of the second sample.
        * <p>
  -     * If <code>equalVariances</code> is <code>false</code>,  the t-statisitc is
  -     * <p>
  -     * (2) &nbsp;&nbsp; <code>  t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
  -     * <p>
        * <strong>Preconditions</strong>: <ul>
        * <li>The datasets described by the two Univariates must each contain
        * at least 2 observations.
  @@ -282,27 +362,25 @@
        *
        * @param sampleStats1 StatisticalSummary describing data from the first sample
        * @param sampleStats2 StatisticalSummary describing data from the second sample
  -     * @param equalVariances are the sample variances assumed equal?
        * @return t statistic
        * @throws IllegalArgumentException if the precondition is not met
        */
  -    public double t(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2, 
  -            boolean equalVariances)
  +    public double homoscedasticT(StatisticalSummary sampleStats1, 
  +            StatisticalSummary sampleStats2)
       throws IllegalArgumentException {
           if ((sampleStats1 == null) ||
                   (sampleStats2 == null ||
                           Math.min(sampleStats1.getN(), sampleStats2.getN()) < 2)) {
               throw new IllegalArgumentException("insufficient data for t statistic");
           }
  -        return t(sampleStats1.getMean(), sampleStats2.getMean(), sampleStats1.getVariance(),
  -                sampleStats2.getVariance(), (double) sampleStats1.getN(), 
  -                (double) sampleStats2.getN(), equalVariances);
  +        return homoscedasticT(sampleStats1.getMean(), sampleStats2.getMean(), 
  +                sampleStats1.getVariance(), sampleStats2.getVariance(), 
  +                (double) sampleStats1.getN(), (double) sampleStats2.getN());
       }
   
        /**
        * Returns the <i>observed significance level</i>, or 
  -     * <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
  -     * p-value</a>, associated with a one-sample, two-tailed t-test 
  +     * <i>p-value</i>, associated with a one-sample, two-tailed t-test 
        * comparing the mean of the input array with the constant <code>mu</code>.
        * <p>
        * The number returned is the smallest significance level
  @@ -331,13 +409,14 @@
           if ((sample == null) || (sample.length < 2)) {
               throw new IllegalArgumentException("insufficient data for t statistic");
           }
  -        return tTest( StatUtils.mean(sample), mu, StatUtils.variance(sample), sample.length);
  +        return tTest( StatUtils.mean(sample), mu, StatUtils.variance(sample),
  +                sample.length);
       }
   
       /**
        * Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
        * two-sided t-test</a> evaluating the null hypothesis that the mean of the population from
  -     *  which <code>sample</code> is drawn equals <code>mu</code>.
  +     * which <code>sample</code> is drawn equals <code>mu</code>.
        * <p>
        * Returns <code>true</code> iff the null hypothesis can be 
        * rejected with confidence <code>1 - alpha</code>.  To 
  @@ -379,8 +458,7 @@
   
       /**
        * Returns the <i>observed significance level</i>, or 
  -     * <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
  -     * p-value</a>, associated with a one-sample, two-tailed t-test 
  +     * <i>p-value</i>, associated with a one-sample, two-tailed t-test 
        * comparing the mean of the dataset described by <code>sampleStats</code>
        * with the constant <code>mu</code>.
        * <p>
  @@ -393,7 +471,8 @@
        * <strong>Usage Note:</strong><br>
        * The validity of the test depends on the assumptions of the parametric
        * t-test procedure, as discussed 
  -     * <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">here</a>
  +     * <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
  +     * here</a>
        * <p>
        * <strong>Preconditions</strong>: <ul>
        * <li>The sample must contain at least 2 observations.
  @@ -410,17 +489,19 @@
           if ((sampleStats == null) || (sampleStats.getN() < 2)) {
               throw new IllegalArgumentException("insufficient data for t statistic");
           }
  -        return tTest(sampleStats.getMean(), mu, sampleStats.getVariance(), sampleStats.getN());
  +        return tTest(sampleStats.getMean(), mu, sampleStats.getVariance(),
  +                sampleStats.getN());
       }
   
        /**
        * Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
  -     * two-sided t-test</a> evaluating the null hypothesis that the mean of the population from
  -     * which the dataset described by <code>stats</code> is drawn equals <code>mu</code>.
  -     * <p>
  -     * Returns <code>true</code> iff the null hypothesis can be 
  -     * rejected with confidence <code>1 - alpha</code>.  To 
  -     * perform a 1-sided test, use <code>alpha / 2</code>
  +     * two-sided t-test</a> evaluating the null hypothesis that the mean of the
  +     * population from which the dataset described by <code>stats</code> is
  +     * drawn equals <code>mu</code>.
  +     * <p>
  +     * Returns <code>true</code> iff the null hypothesis can be rejected with
  +     * confidence <code>1 - alpha</code>.  To  perform a 1-sided test, use
  +     * <code>alpha / 2.</code>
        * <p>
        * <strong>Examples:</strong><br><ol>
        * <li>To test the (2-sided) hypothesis <code>sample mean = mu </code> at
  @@ -448,7 +529,8 @@
        * @throws IllegalArgumentException if the precondition is not met
        * @throws MathException if an error occurs computing the p-value
        */
  -    public boolean tTest( double mu, StatisticalSummary sampleStats, double alpha)
  +    public boolean tTest( double mu, StatisticalSummary sampleStats,
  +            double alpha)
       throws IllegalArgumentException, MathException {
           if ((alpha <= 0) || (alpha > 0.5)) {
               throw new IllegalArgumentException("bad significance level: " + alpha);
  @@ -458,8 +540,7 @@
   
       /**
        * Returns the <i>observed significance level</i>, or 
  -     * <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
  -     * p-value</a>, associated with a two-sample, two-tailed t-test 
  +     * <i>p-value</i>, associated with a two-sample, two-tailed t-test 
        * comparing the means of the input arrays.
        * <p>
        * The number returned is the smallest significance level
  @@ -467,19 +548,59 @@
        * equal in favor of the two-sided alternative that they are different. 
        * For a one-sided test, divide the returned value by 2.
        * <p>
  -     * If the <code>equalVariances</code> parameter is <code>false,</code>
  -     * the test does not assume that the underlying popuation variances are
  +     * The test does not assume that the underlying population variances are
        * equal  and it uses approximated degrees of freedom computed from the 
  -     * sample data to compute the p-value.  In this case, formula (1) for the
  -     * {@link #t(double[], double[], boolean)} statistic is used
  -     * and the Welch-Satterthwaite approximation to the degrees of freedom is used, 
  +     * sample data to compute the p-value.  The t-statistic used is as defined in
  +     * {@link #t(double[], double[])} and the Welch-Satterthwaite approximation
  +     * to the degrees of freedom is used, 
        * as described 
        * <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
  -     * here.</a>
  +     * here.</a>  To perform the test under the assumption of equal subpopulation
  +     * variances, use {@link #homoscedasticTTest(double[], double[])}. 
  +     * <p>
  +     * <strong>Usage Note:</strong><br>
  +     * The validity of the p-value depends on the assumptions of the parametric
  +     * t-test procedure, as discussed 
  +     * <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
  +     * here</a>
  +     * <p>
  +     * <strong>Preconditions</strong>: <ul>
  +     * <li>The observed array lengths must both be at least 2.
  +     * </li></ul>
  +     *
  +     * @param sample1 array of sample data values
  +     * @param sample2 array of sample data values
  +     * @return p-value for t-test
  +     * @throws IllegalArgumentException if the precondition is not met
  +     * @throws MathException if an error occurs computing the p-value
  +     */
  +    public double tTest(double[] sample1, double[] sample2)
  +    throws IllegalArgumentException, MathException {
  +        if ((sample1 == null) || (sample2 == null ||
  +                Math.min(sample1.length, sample2.length) < 2)) {
  +            throw new IllegalArgumentException("insufficient data");
  +        }
  +        return tTest(StatUtils.mean(sample1), StatUtils.mean(sample2),
  +                StatUtils.variance(sample1), StatUtils.variance(sample2),
  +                (double) sample1.length, (double) sample2.length);
  +    }
  +    
  +    /**
  +     * Returns the <i>observed significance level</i>, or 
  +     * <i>p-value</i>, associated with a two-sample, two-tailed t-test 
  +     * comparing the means of the input arrays, under the assumption that
  +     * the two samples are drawn from subpopulations with equal variances.
  +     * To perform the test without the equal variances assumption, use
  +     * {@link #tTest(double[], double[])}.
  +     * <p>
  +     * The number returned is the smallest significance level
  +     * at which one can reject the null hypothesis that the two means are
  +     * equal in favor of the two-sided alternative that they are different. 
  +     * For a one-sided test, divide the returned value by 2.
        * <p>
  -     * If <code>equalVariances</code> is <code>true</code>, a pooled variance
  -     * estimate is used to compute the t-statistic (formula (2)) and the sum of the 
  -     * sample sizes minus 2 is used as the degrees of freedom.
  +     * A pooled variance estimate is used to compute the t-statistic.  See
  +     * {@link #homoscedasticT(double[], double[])}. The sum of the sample sizes
  +     * minus 2 is used as the degrees of freedom.
        * <p>
        * <strong>Usage Note:</strong><br>
        * The validity of the p-value depends on the assumptions of the parametric
  @@ -493,55 +614,112 @@
        *
        * @param sample1 array of sample data values
        * @param sample2 array of sample data values
  -     * @param equalVariances are sample variances assumed to be equal?
        * @return p-value for t-test
        * @throws IllegalArgumentException if the precondition is not met
        * @throws MathException if an error occurs computing the p-value
        */
  -    public double tTest(double[] sample1, double[] sample2, boolean equalVariances)
  +    public double homoscedasticTTest(double[] sample1, double[] sample2)
       throws IllegalArgumentException, MathException {
           if ((sample1 == null) || (sample2 == null ||
                   Math.min(sample1.length, sample2.length) < 2)) {
               throw new IllegalArgumentException("insufficient data");
           }
  -        return tTest(StatUtils.mean(sample1), StatUtils.mean(sample2), StatUtils.variance(sample1),
  +        return homoscedasticTTest(StatUtils.mean(sample1), 
  +                StatUtils.mean(sample2), StatUtils.variance(sample1),
                   StatUtils.variance(sample2), (double) sample1.length, 
  -                (double) sample2.length, equalVariances);
  +                (double) sample2.length);
       }
  +    
   
        /**
  -     * Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
  +     * Performs a 
  +     * <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
        * two-sided t-test</a> evaluating the null hypothesis that <code>sample1</code> 
        * and <code>sample2</code> are drawn from populations with the same mean, 
  -     * with significance level <code>alpha</code>.
  +     * with significance level <code>alpha</code>.  This test does not assume
  +     * that the subpopulation variances are equal.  To perform the test assuming
  +     * equal variances, use 
  +     * {@link #homoscedasticTTest(double[], double[], double)}.
        * <p>
        * Returns <code>true</code> iff the null hypothesis that the means are
        * equal can be rejected with confidence <code>1 - alpha</code>.  To 
        * perform a 1-sided test, use <code>alpha / 2</code>
        * <p>
  -     * If the <code>equalVariances</code> parameter is <code>false,</code>
  -     * the test does not assume that the underlying popuation variances are
  -     * equal  and it uses approximated degrees of freedom computed from the 
  -     * sample data to compute the p-value.  In this case, formula (1) for the
  -     * {@link #t(double[], double[], boolean)} statistic is used
  -     * and the Welch-Satterthwaite approximation to the degrees of freedom is used, 
  -     * as described 
  +     * See {@link #t(double[], double[])} for the formula used to compute the
  +     * t-statistic.  Degrees of freedom are approximated using the
        * <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
  -     * here.</a>
  +     * Welch-Satterthwaite approximation.</a>
  +      
  +     * <p>
  +     * <strong>Examples:</strong><br><ol>
  +     * <li>To test the (2-sided) hypothesis <code>mean 1 = mean 2 </code> at
  +     * the 95% level,  use 
  +     * <br><code>tTest(sample1, sample2, 0.05). </code>
  +     * </li>
  +     * <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2 </code>,
  +     * first verify that the measured  mean of <code>sample 1</code> is less
  +     * than the mean of <code>sample 2</code> and then use 
  +     * <br><code>tTest(sample1, sample2, 0.005) </code>
  +     * </li></ol>
  +     * <p>
  +     * <strong>Usage Note:</strong><br>
  +     * The validity of the test depends on the assumptions of the parametric
  +     * t-test procedure, as discussed 
  +     * <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
  +     * here</a>
  +     * <p>
  +     * <strong>Preconditions</strong>: <ul>
  +     * <li>The observed array lengths must both be at least 2.
  +     * </li>
  +     * <li> <code> 0 < alpha < 0.5 </code>
  +     * </li></ul>
  +     *
  +     * @param sample1 array of sample data values
  +     * @param sample2 array of sample data values
  +     * @param alpha significance level of the test
  +     * @return true if the null hypothesis can be rejected with 
  +     * confidence 1 - alpha
  +     * @throws IllegalArgumentException if the preconditions are not met
  +     * @throws MathException if an error occurs performing the test
  +     */
  +    public boolean tTest(double[] sample1, double[] sample2,
  +            double alpha)
  +    throws IllegalArgumentException, MathException {
  +        if ((alpha <= 0) || (alpha > 0.5)) {
  +            throw new IllegalArgumentException("bad significance level: " + alpha);
  +        }
  +        return (tTest(sample1, sample2) < alpha);
  +    }
  +    
  +    /**
  +     * Performs a 
  +     * <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
  +     * two-sided t-test</a> evaluating the null hypothesis that <code>sample1</code> 
  +     * and <code>sample2</code> are drawn from populations with the same mean, 
  +     * with significance level <code>alpha</code>,  assuming that the
  +     * subpopulation variances are equal.  Use 
  +     * {@link #tTest(double[], double[], double)} to perform the test without
  +     * the assumption of equal variances.
        * <p>
  -     * If <code>equalVariances</code> is <code>true</code>, a pooled variance
  -     * estimate is used to compute the t-statistic (formula (2)) and the sum of the 
  -     * sample sizes minus 2 is used as the degrees of freedom.
  +     * Returns <code>true</code> iff the null hypothesis that the means are
  +     * equal can be rejected with confidence <code>1 - alpha</code>.  To 
  +     * perform a 1-sided test, use <code>alpha / 2.</code>  To perform the test
  +     * without the assumption of equal subpopulation variances, use 
  +     * {@link #tTest(double[], double[], double)}.
  +     * <p>
  +     * A pooled variance estimate is used to compute the t-statistic. See
  +     * {@link #homoscedasticT(double[], double[])} for the formula. The sum of the sample
  +     * sizes minus 2 is used as the degrees of freedom.
        * <p>
        * <strong>Examples:</strong><br><ol>
        * <li>To test the (2-sided) hypothesis <code>mean 1 = mean 2 </code> at
  -     * the 95% level, under the assumption of equal subpopulation variances, 
  -     * use <br><code>tTest(sample1, sample2, 0.05, true) </code>
  +     * the 95% level, use <br><code>homoscedasticTTest(sample1, sample2, 0.05). </code>
        * </li>
  -     * <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2 </code>
  -     * at the 99% level without assuming equal variances, first verify that the measured 
  -     * mean of <code>sample 1</code> is less than the mean of <code>sample 2</code>
  -     * and then use <br><code>tTest(sample1, sample2, 0.005, false) </code>
  +     * <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2, </code>
  +     * at the 99% level, first verify that the measured mean of 
  +     * <code>sample 1</code> is less than the mean of <code>sample 2</code>
  +     * and then use
  +     * <br><code>homoscedasticTTest(sample1, sample2, 0.005) </code>
        * </li></ol>
        * <p>
        * <strong>Usage Note:</strong><br>
  @@ -559,45 +737,81 @@
        * @param sample1 array of sample data values
        * @param sample2 array of sample data values
        * @param alpha significance level of the test
  -     * @param equalVariances are sample variances assumed to be equal?
        * @return true if the null hypothesis can be rejected with 
        * confidence 1 - alpha
        * @throws IllegalArgumentException if the preconditions are not met
        * @throws MathException if an error occurs performing the test
        */
  -    public boolean tTest(double[] sample1, double[] sample2, double alpha, 
  -            boolean equalVariances)
  +    public boolean homoscedasticTTest(double[] sample1, double[] sample2,
  +            double alpha)
       throws IllegalArgumentException, MathException {
           if ((alpha <= 0) || (alpha > 0.5)) {
               throw new IllegalArgumentException("bad significance level: " + alpha);
           }
  -        return (tTest(sample1, sample2, equalVariances) < alpha);
  +        return (homoscedasticTTest(sample1, sample2) < alpha);
       }
   
        /**
        * Returns the <i>observed significance level</i>, or 
  -     * <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
  -     * p-value</a>, associated with a two-sample, two-tailed t-test 
  -     * comparing the means of the datasets described by two Univariates.
  +     * <i>p-value</i>, associated with a two-sample, two-tailed t-test 
  +     * comparing the means of the datasets described by two StatisticalSummary
  +     * instances.
        * <p>
        * The number returned is the smallest significance level
        * at which one can reject the null hypothesis that the two means are
        * equal in favor of the two-sided alternative that they are different. 
        * For a one-sided test, divide the returned value by 2.
        * <p>
  -     * If the <code>equalVariances</code> parameter is <code>false,</code>
  -     * the test does not assume that the underlying popuation variances are
  +     * The test does not assume that the underlying population variances are
        * equal  and it uses approximated degrees of freedom computed from the 
  -     * sample data to compute the p-value.  In this case, formula (1) for the
  -     * {@link #t(double[], double[], boolean)} statistic is used
  -     * and the Welch-Satterthwaite approximation to the degrees of freedom is used, 
  -     * as described 
  -     * <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
  -     * here.</a>
  +     * sample data to compute the p-value.   To perform the test assuming
  +     * equal variances, use 
  +     * {@link #homoscedasticTTest(StatisticalSummary, StatisticalSummary)}.
  +     * <p>
  +     * <strong>Usage Note:</strong><br>
  +     * The validity of the p-value depends on the assumptions of the parametric
  +     * t-test procedure, as discussed 
  +     * <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
  +     * here</a>
  +     * <p>
  +     * <strong>Preconditions</strong>: <ul>
  +     * <li>The datasets described by the two StatisticalSummary instances must
  +     * each contain at least 2 observations.
  +     * </li></ul>
  +     *
  +     * @param sampleStats1  StatisticalSummary describing data from the first sample
  +     * @param sampleStats2  StatisticalSummary describing data from the second sample
  +     * @return p-value for t-test
  +     * @throws IllegalArgumentException if the precondition is not met
  +     * @throws MathException if an error occurs computing the p-value
  +     */
  +    public double tTest(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2)
  +    throws IllegalArgumentException, MathException {
  +        if ((sampleStats1 == null) || (sampleStats2 == null ||
  +                Math.min(sampleStats1.getN(), sampleStats2.getN()) < 2)) {
  +            throw new IllegalArgumentException("insufficient data for t statistic");
  +        }
  +        return tTest(sampleStats1.getMean(), sampleStats2.getMean(), sampleStats1.getVariance(),
  +                sampleStats2.getVariance(), (double) sampleStats1.getN(), 
  +                (double) sampleStats2.getN());
  +    }
  +    
  +    /**
  +     * Returns the <i>observed significance level</i>, or 
  +     * <i>p-value</i>, associated with a two-sample, two-tailed t-test 
  +     * comparing the means of the datasets described by two StatisticalSummary
  +     * instances, under the hypothesis of equal subpopulation variances. To
  +     * perform a test without the equal variances assumption, use
  +     * {@link #tTest(StatisticalSummary, StatisticalSummary)}.
  +     * <p>
  +     * The number returned is the smallest significance level
  +     * at which one can reject the null hypothesis that the two means are
  +     * equal in favor of the two-sided alternative that they are different. 
  +     * For a one-sided test, divide the returned value by 2.
        * <p>
  -     * If <code>equalVariances</code> is <code>true</code>, a pooled variance
  -     * estimate is used to compute the t-statistic (formula (2)) and the sum of the 
  -     * sample sizes minus 2 is used as the degrees of freedom.
  +     * See {@link #homoscedasticT(double[], double[])} for the formula used to
  +     * compute the t-statistic. The sum of the  sample sizes minus 2 is used as
  +     * the degrees of freedom.
        * <p>
        * <strong>Usage Note:</strong><br>
        * The validity of the p-value depends on the assumptions of the parametric
  @@ -611,57 +825,53 @@
        *
        * @param sampleStats1  StatisticalSummary describing data from the first sample
        * @param sampleStats2  StatisticalSummary describing data from the second sample
  -     * @param equalVariances  are sample variances assumed to be equal?
        * @return p-value for t-test
        * @throws IllegalArgumentException if the precondition is not met
        * @throws MathException if an error occurs computing the p-value
        */
  -    public double tTest(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2, 
  -            boolean equalVariances)
  +    public double homoscedasticTTest(StatisticalSummary sampleStats1, 
  +            StatisticalSummary sampleStats2)
       throws IllegalArgumentException, MathException {
           if ((sampleStats1 == null) || (sampleStats2 == null ||
                   Math.min(sampleStats1.getN(), sampleStats2.getN()) < 2)) {
               throw new IllegalArgumentException("insufficient data for t statistic");
           }
  -        return tTest(sampleStats1.getMean(), sampleStats2.getMean(), sampleStats1.getVariance(),
  +        return homoscedasticTTest(sampleStats1.getMean(),
  +                sampleStats2.getMean(), sampleStats1.getVariance(),
                   sampleStats2.getVariance(), (double) sampleStats1.getN(), 
  -                (double) sampleStats2.getN(), equalVariances);
  +                (double) sampleStats2.getN());
       }
   
       /**
  -     * Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
  -     * two-sided t-test</a> evaluating the null hypothesis that <code>sampleStats1</code>
  -     * and <code>sampleStats2</code> describe datasets drawn from populations with the 
  -     * same mean, with significance level <code>alpha</code>.
  +     * Performs a 
  +     * <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
  +     * two-sided t-test</a> evaluating the null hypothesis that 
  +     * <code>sampleStats1</code> and <code>sampleStats2</code> describe
  +     * datasets drawn from populations with the same mean, with significance
  +     * level <code>alpha</code>.   This test does not assume that the
  +     * subpopulation variances are equal.  To perform the test under the equal
  +     * variances assumption, use
  +     * {@link #homoscedasticTTest(StatisticalSummary, StatisticalSummary)}.
        * <p>
        * Returns <code>true</code> iff the null hypothesis that the means are
        * equal can be rejected with confidence <code>1 - alpha</code>.  To 
        * perform a 1-sided test, use <code>alpha / 2</code>
        * <p>
  -     * If the <code>equalVariances</code> parameter is <code>false,</code>
  -     * the test does not assume that the underlying popuation variances are
  -     * equal  and it uses approximated degrees of freedom computed from the 
  -     * sample data to compute the p-value.  In this case, formula (1) for the
  -     * {@link #t(double[], double[], boolean)} statistic is used
  -     * and the Welch-Satterthwaite approximation to the degrees of freedom is used, 
  -     * as described 
  +     * See {@link #t(double[], double[])} for the formula used to compute the
  +     * t-statistic.  Degrees of freedom are approximated using the
        * <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
  -     * here.</a>
  -     * <p>
  -     * If <code>equalVariances</code> is <code>true</code>, a pooled variance
  -     * estimate is used to compute the t-statistic (formula (2)) and the sum of the 
  -     * sample sizes minus 2 is used as the degrees of freedom.
  +     * Welch-Satterthwaite approximation.</a>
        * <p>
        * <strong>Examples:</strong><br><ol>
        * <li>To test the (2-sided) hypothesis <code>mean 1 = mean 2 </code> at
  -     * the 95% level under the assumption of equal subpopulation variances, use 
  -     * <br><code>tTest(sampleStats1, sampleStats2, 0.05, true) </code>
  +     * the 95% level, use 
  +     * <br><code>tTest(sampleStats1, sampleStats2, 0.05) </code>
        * </li>
        * <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2 </code>
  -     * at the 99% level without assuming that subpopulation variances are equal, 
  -     * first verify that the measured mean of  <code>sample 1</code> is less than 
  -     * the mean of <code>sample 2</code> and then use 
  -     * <br><code>tTest(sampleStats1, sampleStats2, 0.005, false) </code>
  +     * at the 99% level,  first verify that the measured mean of  
  +     * <code>sample 1</code> is less than  the mean of <code>sample 2</code>
  +     * and then use 
  +     * <br><code>tTest(sampleStats1, sampleStats2, 0.005) </code>
        * </li></ol>
        * <p>
        * <strong>Usage Note:</strong><br>
  @@ -680,19 +890,18 @@
        * @param sampleStats1 StatisticalSummary describing sample data values
        * @param sampleStats2 StatisticalSummary describing sample data values
        * @param alpha significance level of the test
  -     * @param equalVariances  are sample variances assumed to be equal?
        * @return true if the null hypothesis can be rejected with 
        * confidence 1 - alpha
        * @throws IllegalArgumentException if the preconditions are not met
        * @throws MathException if an error occurs performing the test
        */
  -    public boolean tTest(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2,
  -            double alpha, boolean equalVariances)
  +    public boolean tTest(StatisticalSummary sampleStats1,
  +            StatisticalSummary sampleStats2, double alpha)
       throws IllegalArgumentException, MathException {
           if ((alpha <= 0) || (alpha > 0.5)) {
               throw new IllegalArgumentException("bad significance level: " + alpha);
           }
  -        return (tTest(sampleStats1, sampleStats2, equalVariances) < alpha);
  +        return (tTest(sampleStats1, sampleStats2) < alpha);
       }
       
       //----------------------------------------------- Protected methods 
  @@ -738,8 +947,8 @@
       
       /**
        * Computes t test statistic for 2-sample t-test.
  -     * If equalVariance is true,  the pooled variance
  -     * estimate is computed and used.
  +     * <p>
  +     * Does not assume that subpopulation variances are equal.
        * 
        * @param m1 first sample mean
        * @param m2 second sample mean
  @@ -747,17 +956,29 @@
        * @param v2 second sample variance
        * @param n1 first sample n
        * @param n2 second sample n
  -     * @param equalVariances  are variances assumed equal?
        * @return t test statistic
        */
       protected double t(double m1, double m2,  double v1, double v2, double n1,
  -            double n2, boolean equalVariances)  {
  -        if (equalVariances) {
  -           double pooledVariance = ((n1  - 1) * v1 + (n2 -1) * v2 ) / (n1 + n2 - 2); 
  -           return (m1 - m2) / Math.sqrt(pooledVariance * (1d / n1 + 1d / n2));
  -        } else {
  +            double n2)  {
               return (m1 - m2) / Math.sqrt((v1 / n1) + (v2 / n2));
  -        }
  +    }
  +    
  +    /**
  +     * Computes t test statistic for 2-sample t-test under the hypothesis
  +     * of equal subpopulation variances.
  +     * 
  +     * @param m1 first sample mean
  +     * @param m2 second sample mean
  +     * @param v1 first sample variance
  +     * @param v2 second sample variance
  +     * @param n1 first sample n
  +     * @param n2 second sample n
  +     * @return t test statistic
  +     */
  +    protected double homoscedasticT(double m1, double m2,  double v1,
  +            double v2, double n1, double n2)  {
  +            double pooledVariance = ((n1  - 1) * v1 + (n2 -1) * v2 ) / (n1 + n2 - 2); 
  +            return (m1 - m2) / Math.sqrt(pooledVariance * (1d / n1 + 1d / n2));
       }
       
       /**
  @@ -780,8 +1001,9 @@
   
       /**
        * Computes p-value for 2-sided, 2-sample t-test.
  -     * If equalVariances is true, the sum of the sample sizes minus 2
  -     * is used as df; otherwise df is approximated from the data.
  +     * <p>
  +     * Does not assume subpopulation variances are equal. Degrees of freedom
  +     * are estimated from the data.
        * 
        * @param m1 first sample mean
        * @param m2 second sample mean
  @@ -789,20 +1011,41 @@
        * @param v2 second sample variance
        * @param n1 first sample n
        * @param n2 second sample n
  -     * @param equalVariances  are variances assumed equal?
        * @return p-value
        * @throws MathException if an error occurs computing the p-value
        */
       protected double tTest(double m1, double m2, double v1, double v2, 
  -            double n1, double n2, boolean equalVariances)
  +            double n1, double n2)
  +    throws MathException {
  +        double t = Math.abs(t(m1, m2, v1, v2, n1, n2));
  +        double degreesOfFreedom = 0;
  +        degreesOfFreedom= df(v1, v2, n1, n2);
  +        TDistribution tDistribution =
  +            getDistributionFactory().createTDistribution(degreesOfFreedom);
  +        return 1.0 - tDistribution.cumulativeProbability(-t, t);
  +    }
  +    
  +    /**
  +     * Computes p-value for 2-sided, 2-sample t-test, under the assumption
  +     * of equal subpopulation variances.
  +     * <p>
  +     * The sum of the sample sizes minus 2 is used as degrees of freedom.
  +     * 
  +     * @param m1 first sample mean
  +     * @param m2 second sample mean
  +     * @param v1 first sample variance
  +     * @param v2 second sample variance
  +     * @param n1 first sample n
  +     * @param n2 second sample n
  +     * @return p-value
  +     * @throws MathException if an error occurs computing the p-value
  +     */
  +    protected double homoscedasticTTest(double m1, double m2, double v1,
  +            double v2, double n1, double n2)
       throws MathException {
  -        double t = Math.abs(t(m1, m2, v1, v2, n1, n2, equalVariances));
  +        double t = Math.abs(homoscedasticT(m1, m2, v1, v2, n1, n2));
           double degreesOfFreedom = 0;
  -        if (equalVariances) {
               degreesOfFreedom = (double) (n1 + n2 - 2);
  -        } else {
  -            degreesOfFreedom= df(v1, v2, n1, n2);
  -        }
           TDistribution tDistribution =
               getDistributionFactory().createTDistribution(degreesOfFreedom);
           return 1.0 - tDistribution.cumulativeProbability(-t, t);
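
The split of t(...) and homoscedasticT(...) above can be tried in isolation. What follows is a minimal sketch, not the library code: the class name TwoSampleTSketch and the helper welchDf are made up for illustration, the two statistic methods copy the arithmetic shown in the diff, and welchDf uses the textbook Welch-Satterthwaite formula because the body of df(...) does not appear in this commit. The summary statistics in main are illustrative values only.

```java
/** Illustrative sketch only; mirrors the arithmetic in the diff, not the TTestImpl class. */
public class TwoSampleTSketch {

    /** t-statistic without the equal-variances assumption (same expression as t(...) in the diff). */
    public static double t(double m1, double m2, double v1, double v2, double n1, double n2) {
        return (m1 - m2) / Math.sqrt((v1 / n1) + (v2 / n2));
    }

    /** t-statistic using the pooled variance estimate (same expression as homoscedasticT(...)). */
    public static double homoscedasticT(double m1, double m2, double v1, double v2,
            double n1, double n2) {
        double pooledVariance = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2);
        return (m1 - m2) / Math.sqrt(pooledVariance * (1d / n1 + 1d / n2));
    }

    /**
     * Welch-Satterthwaite degrees of freedom (textbook formula; an assumption here,
     * since the df(...) helper body is not part of this diff).
     */
    public static double welchDf(double v1, double v2, double n1, double n2) {
        double a = v1 / n1;
        double b = v2 / n2;
        return (a + b) * (a + b) / (a * a / (n1 - 1) + b * b / (n2 - 1));
    }

    public static void main(String[] args) {
        // Hypothetical summary statistics: means 2.0 and 4.5, variances 2.0 and 0.5.
        double m1 = 2.0, m2 = 4.5, v1 = 2.0, v2 = 0.5;

        // With equal sample sizes the two statistics coincide; only the df differ.
        System.out.println("n=2,2 welch t  = " + t(m1, m2, v1, v2, 2, 2));
        System.out.println("n=2,2 pooled t = " + homoscedasticT(m1, m2, v1, v2, 2, 2));
        System.out.println("n=2,2 welch df = " + welchDf(v1, v2, 2, 2) + ", pooled df = " + (2 + 2 - 2));

        // With unequal sample sizes the statistics themselves differ.
        System.out.println("n=2,4 welch t  = " + t(m1, m2, v1, v2, 2, 4));
        System.out.println("n=2,4 pooled t = " + homoscedasticT(m1, m2, v1, v2, 2, 4));
    }
}
```

Because the pooled and unpooled statistics agree whenever n1 == n2, a test with equal sample sizes cannot tell which statistic a p-value method used; unequal sizes are needed to exercise the difference.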
  
  
  
  1.6       +24 -24    jakarta-commons/math/src/test/org/apache/commons/math/stat/inference/TTestTest.java
  
  Index: TTestTest.java
  ===================================================================
  RCS file: /home/cvs/jakarta-commons/math/src/test/org/apache/commons/math/stat/inference/TTestTest.java,v
  retrieving revision 1.5
  retrieving revision 1.6
  diff -u -r1.5 -r1.6
  --- TTestTest.java	2 Jun 2004 13:08:55 -0000	1.5
  +++ TTestTest.java	2 Aug 2004 04:20:09 -0000	1.6
  @@ -166,73 +166,73 @@
            
           // Target comparison values computed using R version 1.8.1 (Linux version)
           assertEquals("two sample heteroscedastic t stat", 1.603717, 
  -                testStatistic.t(sample1, sample2, false), 1E-6);
  +                testStatistic.t(sample1, sample2), 1E-6);
           assertEquals("two sample heteroscedastic t stat", 1.603717, 
  -                testStatistic.t(sampleStats1, sampleStats2, false), 1E-6);
  +                testStatistic.t(sampleStats1, sampleStats2), 1E-6);
           assertEquals("two sample heteroscedastic p value", 0.1288394, 
  -                testStatistic.tTest(sample1, sample2, false), 1E-7);
  +                testStatistic.tTest(sample1, sample2), 1E-7);
           assertEquals("two sample heteroscedastic p value", 0.1288394, 
  -                testStatistic.tTest(sampleStats1, sampleStats2, false), 1E-7);     
  +                testStatistic.tTest(sampleStats1, sampleStats2), 1E-7);     
           assertTrue("two sample heteroscedastic t-test reject", 
  -                testStatistic.tTest(sample1, sample2, 0.2, false));
  +                testStatistic.tTest(sample1, sample2, 0.2));
           assertTrue("two sample heteroscedastic t-test reject", 
  -                testStatistic.tTest(sampleStats1, sampleStats2, 0.2, false));
  +                testStatistic.tTest(sampleStats1, sampleStats2, 0.2));
           assertTrue("two sample heteroscedastic t-test accept", 
  -                !testStatistic.tTest(sample1, sample2, 0.1, false));
  +                !testStatistic.tTest(sample1, sample2, 0.1));
           assertTrue("two sample heteroscedastic t-test accept", 
  -                !testStatistic.tTest(sampleStats1, sampleStats2, 0.1, false));
  +                !testStatistic.tTest(sampleStats1, sampleStats2, 0.1));
        
           try {
  -            testStatistic.tTest(sample1, sample2, .95, false);
  +            testStatistic.tTest(sample1, sample2, .95);
               fail("alpha out of range, IllegalArgumentException expected");
           } catch (IllegalArgumentException ex) {
  -            // exptected
  +            // expected
           } 
           
           try {
  -            testStatistic.tTest(sampleStats1, sampleStats2, .95, false);
  +            testStatistic.tTest(sampleStats1, sampleStats2, .95);
               fail("alpha out of range, IllegalArgumentException expected");
           } catch (IllegalArgumentException ex) {
               // expected 
           }  
           
           try {
  -            testStatistic.tTest(sample1, tooShortObs, .01, false);
  +            testStatistic.tTest(sample1, tooShortObs, .01);
               fail("insufficient data, IllegalArgumentException expected");
           } catch (IllegalArgumentException ex) {
               // expected
           }  
           
           try {
  -            testStatistic.tTest(sampleStats1, tooShortStats, .01, false);
  +            testStatistic.tTest(sampleStats1, tooShortStats, .01);
               fail("insufficient data, IllegalArgumentException expected");
           } catch (IllegalArgumentException ex) {
               // expected
           }  
           
           try {
  -            testStatistic.tTest(sample1, tooShortObs, false);
  +            testStatistic.tTest(sample1, tooShortObs);
               fail("insufficient data, IllegalArgumentException expected");
           } catch (IllegalArgumentException ex) {
              // expected
           }  
           
           try {
  -            testStatistic.tTest(sampleStats1, tooShortStats, false);
  +            testStatistic.tTest(sampleStats1, tooShortStats);
               fail("insufficient data, IllegalArgumentException expected");
           } catch (IllegalArgumentException ex) {
               // expected
           }  
           
           try {
  -            testStatistic.t(sample1, tooShortObs, false);
  +            testStatistic.t(sample1, tooShortObs);
               fail("insufficient data, IllegalArgumentException expected");
           } catch (IllegalArgumentException ex) {
               // expected
           }
           
           try {
  -            testStatistic.t(sampleStats1, tooShortStats, false);
  +            testStatistic.t(sampleStats1, tooShortStats);
               fail("insufficient data, IllegalArgumentException expected");
           } catch (IllegalArgumentException ex) {
              // expected
  @@ -252,13 +252,13 @@
           
           // Target comparison values computed using R version 1.8.1 (Linux version)
          assertEquals("two sample homoscedastic t stat", -1.120897, 
  -              testStatistic.t(sample1, sample2, true), 10E-6);
  +              testStatistic.homoscedasticT(sample1, sample2), 10E-6);
           assertEquals("two sample homoscedastic p value", 0.2948490, 
  -                testStatistic.tTest(sampleStats1, sampleStats2, true), 1E-6);     
  +                testStatistic.homoscedasticTTest(sampleStats1, sampleStats2), 1E-6);     
           assertTrue("two sample homoscedastic t-test reject", 
  -                testStatistic.tTest(sample1, sample2, 0.3, true));
  +                testStatistic.homoscedasticTTest(sample1, sample2, 0.3));
           assertTrue("two sample homoscedastic t-test accept", 
  -                !testStatistic.tTest(sample1, sample2, 0.2, true));
  +                !testStatistic.homoscedasticTTest(sample1, sample2, 0.2));
       }
       
       public void testSmallSamples() throws Exception {
  @@ -266,8 +266,8 @@
           double[] sample2 = {4d, 5d};        
           
           // Target values computed using R, version 1.8.1 (linux version)
  -        assertEquals(-2.2361, testStatistic.t(sample1, sample2, false), 1E-4);
  -        assertEquals(0.1987, testStatistic.tTest(sample1, sample2, false), 1E-4);
  +        assertEquals(-2.2361, testStatistic.t(sample1, sample2), 1E-4);
  +        assertEquals(0.1987, testStatistic.tTest(sample1, sample2), 1E-4);
       }
       
       public void testPaired() throws Exception {
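
The accept/reject assertions above all reduce to one comparison: validate alpha, then check whether the p-value is below it. A minimal sketch of that decision rule follows, assuming a hypothetical class FixedLevelSketch and method rejects that are not part of Commons Math. The alpha check and message are the ones used by the boolean tTest methods in TTestImpl, and the p-value 0.1288394 is the heteroscedastic target asserted in the test.

```java
/** Illustrative sketch of the fixed-significance-level decision; not library code. */
public class FixedLevelSketch {

    /**
     * Returns true if a test with the given two-sided p-value rejects the null
     * hypothesis at significance level alpha. The range check matches the one
     * in the boolean-valued tTest methods of the diff.
     */
    public static boolean rejects(double pValue, double alpha) {
        if ((alpha <= 0) || (alpha > 0.5)) {
            throw new IllegalArgumentException("bad significance level: " + alpha);
        }
        return pValue < alpha;
    }

    public static void main(String[] args) {
        double p = 0.1288394; // heteroscedastic p-value asserted in TTestTest
        System.out.println(rejects(p, 0.2)); // prints "true"
        System.out.println(rejects(p, 0.1)); // prints "false"
        try {
            rejects(p, 0.95);
        } catch (IllegalArgumentException ex) {
            // alpha outside (0, 0.5] is refused, as the test expects
            System.out.println(ex.getMessage());
        }
    }
}
```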
  
  
  
  1.20      +11 -9     jakarta-commons/math/xdocs/userguide/stat.xml
  
  Index: stat.xml
  ===================================================================
  RCS file: /home/cvs/jakarta-commons/math/xdocs/userguide/stat.xml,v
  retrieving revision 1.19
  retrieving revision 1.20
  diff -u -r1.19 -r1.20
  --- stat.xml	23 Jun 2004 16:26:16 -0000	1.19
  +++ stat.xml	2 Aug 2004 04:20:09 -0000	1.20
  @@ -411,7 +411,10 @@
             Welch-Satterthwaite approximation</a> is used to compute the degrees 
             of freedom.  Methods to return t-statistics and p-values are provided in each 
             case, as well as boolean-valued methods to perform fixed significance
  -          level tests. See the examples below and the API documentation for 
  +          level tests.  The names of methods that assume equal 
  +          subpopulation variances always start with "homoscedastic."  Test or 
  +          test-statistic methods that just start with "t" do not assume equal
  +          variances. See the examples below and the API documentation for 
             more details.</li>
             <li>The validity of the p-values returned by the t-test depends on the 
             assumptions of the parametric t-test procedure, as discussed 
  @@ -536,26 +539,25 @@
             To compute the t-statistic:
             <source>
   TTestImpl testStatistic = new TTestImpl();
  -testStatistic.t(summary1, summary2, false);  
  +testStatistic.t(summary1, summary2);  
             </source>
              </p>
              <p>
              To compute the (two-sided) p-value:
              <source>
  -testStatistic.tTest(sample1, sample2, false);
  +testStatistic.tTest(sample1, sample2);
              </source> 
              </p>
              <p>
              To perform a fixed significance level test with alpha = .05:
              <source>
  -testStatistic.tTest(sample1, sample2, .05, false);    
  +testStatistic.tTest(sample1, sample2, .05);    
              </source>
              </p> 
              <p>
  -           In each case above, the last (boolean) parameter determines
  -           whether or not the test should assume that subpopulation variances
  -           are equal.  Replacing this with <code>true</code> will result in 
  -           homoscedastic (equal variances) tests / test statistics.
  +           In each case above, the test does not assume that the subpopulation
  +           variances are equal.  To perform the tests under the equal variances
  +           assumption, replace "t" at the beginning of the method name with
  +           "homoscedasticT".
              </p>   
              </dd>     
             <dt>Computing <code>chi-square</code> test statistics</dt>
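
For callers that still carry an equalVariances flag, the renaming described above amounts to a one-line dispatch. The sketch below is only an illustration of that mapping. The interface TwoSampleTTest is a hypothetical stand-in that declares just the two double[] p-value methods and omits the checked MathException of the real interface; legacyTTest is likewise an invented helper, not something this commit provides.

```java
/** Illustrative sketch of mapping the removed boolean flag onto the renamed methods. */
public class FlagMigrationSketch {

    /** Hypothetical stand-in for the two p-value methods of the revised API. */
    public interface TwoSampleTTest {
        double tTest(double[] sample1, double[] sample2);
        double homoscedasticTTest(double[] sample1, double[] sample2);
    }

    /**
     * Emulates the removed tTest(sample1, sample2, equalVariances) signature:
     * true selects the "homoscedastic" method, false the plain "t" method.
     */
    public static double legacyTTest(TwoSampleTTest test, double[] sample1,
            double[] sample2, boolean equalVariances) {
        return equalVariances
                ? test.homoscedasticTTest(sample1, sample2)
                : test.tTest(sample1, sample2);
    }
}
```

A helper like this is mainly useful as a temporary shim during migration; new code can call the method whose name states the assumption directly.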
  
  
  


