commons-issues mailing list archives

From "carl anderson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MATH-175) chiSquare(double[] expected, long[] observed) is returning incorrect test statistic
Date Wed, 05 Dec 2007 17:45:43 GMT

    [ https://issues.apache.org/jira/browse/MATH-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548743 ]

carl anderson commented on MATH-175:
------------------------------------

Hi Phil,

I coded a rescaling, as below, but I have to admit that I spent a long
time puzzling over why the results from Java differed from those from R,
because neither threw an exception or gave any warning that the argument
sums differed. It just didn't occur to me at first that this was an issue.

Carl


package com.archimedesmodel.automation.stats;

import org.apache.commons.math.stat.inference.ChiSquareTestImpl;

public class ArchiChiSquared extends ChiSquareTestImpl {

	/**
	 * Chi-square statistic with the expected counts rescaled so that
	 * sum(expected) == sum(observed), as R does with rescale.p=TRUE.
	 */
	public double chiSquare(double[] expected, long[] observed)
			throws IllegalArgumentException {
		double sumSq = 0.0d;
		double dev = 0.0d;
		if ((expected.length < 2) || (expected.length != observed.length)) {
			throw new IllegalArgumentException(
					"observed, expected array lengths incorrect");
		}

		// Total observed count; reject negative counts.
		double sumObs = 0;
		for (int i = 0; i < observed.length; i++) {
			sumObs += observed[i];
			if (observed[i] < 0) {
				throw new IllegalArgumentException(
						"observed counts must be non-negative");
			}
		}

		// Total expected count; reject non-positive expectations.
		double sumExp = 0;
		for (int i = 0; i < expected.length; i++) {
			sumExp += expected[i];
			if (expected[i] <= 0) {
				throw new IllegalArgumentException(
						"expected counts must be positive");
			}
		}

		// Scale factor that makes the expected counts sum to sumObs.
		double ratio = 1.0;
		if (Double.compare(sumObs, sumExp) != 0) {
			//log some warning?
			ratio = sumObs / sumExp;
		}

		// Pearson statistic against the rescaled expected counts.
		for (int i = 0; i < observed.length; i++) {
			dev = ((double) observed[i] - ratio * expected[i]);
			sumSq += dev * dev / (ratio * expected[i]);
		}
		return sumSq;
	}

}
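For anyone who wants to reproduce the discrepancy without pulling in Commons Math, here is a minimal standalone sketch that computes both the unscaled statistic (the loop quoted from 1.1 in the issue below) and the rescaled one, using the unit-test data from that issue. The class name RescaledChiSquareSketch is hypothetical and is not part of Commons Math; the rescaling mirrors ArchiChiSquared above, except that the ratio is always applied (it is exactly 1.0 when the sums already match).

```java
/**
 * Hypothetical standalone sketch (not part of Commons Math): compares the
 * unscaled Pearson statistic with one whose expected counts are rescaled
 * by sum(observed) / sum(expected).
 */
public class RescaledChiSquareSketch {

    /** Unscaled statistic: assumes sum(observed) == sum(expected). */
    public static double chiSquareUnscaled(double[] expected, long[] observed) {
        double sumSq = 0.0;
        for (int i = 0; i < observed.length; i++) {
            double dev = observed[i] - expected[i];
            sumSq += dev * dev / expected[i];
        }
        return sumSq;
    }

    /** Statistic after rescaling the expected counts to the observed total. */
    public static double chiSquareRescaled(double[] expected, long[] observed) {
        if (expected.length < 2 || expected.length != observed.length) {
            throw new IllegalArgumentException("observed, expected array lengths incorrect");
        }
        double sumObs = 0.0;
        double sumExp = 0.0;
        for (int i = 0; i < observed.length; i++) {
            if (observed[i] < 0) {
                throw new IllegalArgumentException("observed counts must be non-negative");
            }
            if (expected[i] <= 0) {
                throw new IllegalArgumentException("expected counts must be positive");
            }
            sumObs += observed[i];
            sumExp += expected[i];
        }
        // Note: if every observed count is 0, ratio is 0 and the result is NaN.
        double ratio = sumObs / sumExp;
        double sumSq = 0.0;
        for (int i = 0; i < observed.length; i++) {
            double e = ratio * expected[i]; // rescaled expected count
            double dev = observed[i] - e;
            sumSq += dev * dev / e;
        }
        return sumSq;
    }

    public static void main(String[] args) {
        // Data from ChiSquareTestTest, as quoted in MATH-175.
        long[] observed1 = {500, 623, 72, 70, 31};
        double[] expected1 = {485, 541, 82, 61, 37};
        System.out.printf("unscaled: %.10f%n", chiSquareUnscaled(expected1, observed1));
        System.out.printf("rescaled: %.4f%n", chiSquareRescaled(expected1, observed1));
    }
}
```

Run as-is, the unscaled value should agree with the unit test's 16.4131070362 and the rescaled value with R's X-squared = 9.0233.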









> chiSquare(double[] expected, long[] observed) is returning incorrect test statistic
> -----------------------------------------------------------------------------------
>
>                 Key: MATH-175
>                 URL: https://issues.apache.org/jira/browse/MATH-175
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 1.1
>         Environment: windows xp
>            Reporter: carl anderson
>         Attachments: chi.xls
>
>
> ChiSquareTestImpl is returning an incorrect chi-squared value. An implicit assumption of
> public double chiSquare(double[] expected, long[] observed) is that the sums of expected and
> observed are equal. That is, in the code:
> for (int i = 0; i < observed.length; i++) {
>             dev = ((double) observed[i] - expected[i]);
>             sumSq += dev * dev / expected[i];
>         }
> this calculation is only correct if sum(observed)==sum(expected). When they are not equal,
> one must rescale the expected values by sum(observed) / sum(expected) so that they are.
> Ironically, it is an example in the unit test ChiSquareTestTest that highlights the error:
>         long[] observed1 = { 500, 623, 72, 70, 31 };
>         double[] expected1 = { 485, 541, 82, 61, 37 };
>         assertEquals("chi-square test statistic", 16.4131070362, testStatistic.chiSquare(expected1, observed1), 1E-10);
>         assertEquals("chi-square p-value", 0.002512096, testStatistic.chiSquareTest(expected1, observed1), 1E-9);
> 16.413 is not correct because the expected values do not make sense; they should be
> 521.19403 581.37313 88.11940 65.55224 39.76119, so that the sum of expected equals 1296,
> which is the sum of observed.
> Here is some R code (r-project.org) that demonstrates it:
> > o1
> [1] 500 623  72  70  31
> > e1
> [1] 485 541  82  61  37
> > chisq.test(o1,p=e1,rescale.p=TRUE)
>         Chi-squared test for given probabilities
> data:  o1 
> X-squared = 9.0233, df = 4, p-value = 0.06052
> > chisq.test(o1,p=e1,rescale.p=TRUE)$observed
> [1] 500 623  72  70  31
> > chisq.test(o1,p=e1,rescale.p=TRUE)$expected
> [1] 521.19403 581.37313  88.11940  65.55224  39.76119
>  
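The same numbers can be checked by hand. A worked version of the rescaling, using only the counts quoted in the issue (o_i observed, e_i expected, N the observed total, r the scale factor), with the step that uses the sum of the rescaled expectations equaling N:

```latex
N = \sum_i o_i = 1296, \qquad \sum_i e_i = 1206, \qquad
r = \frac{N}{\sum_i e_i} = \frac{1296}{1206} \approx 1.07463

X^2 = \sum_i \frac{(o_i - r e_i)^2}{r e_i}
    = \frac{1}{r} \sum_i \frac{o_i^2}{e_i} - 2N + \sum_i r e_i
    = \frac{1}{r} \sum_i \frac{o_i^2}{e_i} - N

\sum_i \frac{o_i^2}{e_i} \approx 1402.4131
\quad\Rightarrow\quad
X^2 \approx \frac{1206}{1296}\,(1402.4131) - 1296 \approx 9.0233

\text{unscaled: } \sum_i \frac{(o_i - e_i)^2}{e_i}
    = \sum_i \frac{o_i^2}{e_i} - 2N + \sum_i e_i
    \approx 1402.4131 - 2592 + 1206 = 16.4131
```

The first result matches R's X-squared = 9.0233 and the second matches the 16.4131070362 asserted in ChiSquareTestTest, so the two values differ only in whether the expected counts are rescaled.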

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

