Message-ID: <46F8223D.7040807@weru.ksu.edu>
Date: Mon, 24 Sep 2007 15:46:53 -0500
From: Bill Rust
Organization: USDA-ARS Wind Erosion Research Unit
To: Phil Steitz
CC: Jakarta Commons Users List
Subject: Re: normal deviates don't pass t test

Thanks for the reply.
First, the reason for using NormalDistributionImpl is that I'm translating from FORTRAN and I wasn't thinking a whole lot. Using Random makes sense.

Second, I don't really care, at least not yet, about how normal my output samples are. What I do really care about is that the means of my generated samples match the observed means. For example, if I start with an observed max temperature of 30, I want to determine that 30 is within the confidence interval of the sample at a 90% level. What I think I am getting, using the same technique, is that half the time I hit my goal and the other half my samples stink. If my understanding is correct, the observed mean should fall inside the 90% confidence interval about 9 times out of 10, which clearly isn't happening.

wjr

Phil Steitz wrote:
> On 9/17/07, William J Rust wrote:
>> I'm working on a climate simulation program that takes monthly averages
>> and generates daily readings that are assumed to be normally
>> distributed. The following program creates 10 sets of 100,000 random
>> deviates with mean 10 and SD 5. It then applies a t test (results below)
>> to ensure that the generated numbers are good enough. As the results
>> show, they aren't. I'm wondering a) am I doing something wrong or b) is
>> there something wrong with the stats routines?
>
> There are a couple of problems here. First, while your inversion
> method should generate approximately normally distributed values, it
> is better to use the JDK-supplied method for this (much faster and a
> better algorithm). There is a wrapped version of this provided in
> org.apache.commons.math.random.RandomDataImpl. To use that:
>
> import org.apache.commons.math.random.RandomData;
> import org.apache.commons.math.random.RandomDataImpl;
> RandomData randomData = new RandomDataImpl();
> ...
> arry[idx] = randomData.nextGaussian(10, 5);
>
> Second, I don't understand what you are expecting from the t-test.
> TestUtils.tTest(mu, array) returns the p-value associated with a
> two-tailed test with the null hypothesis that the values in the array
> come from a distribution with mean = mu. So small p-values, say less
> than .01, would indicate that the mean appears to differ significantly
> from 10. This should happen roughly one in every 100 times.
> Differences as large as what you observed on your first run should
> happen about 34 out of every 100 times, etc. The values reported
> below do not look surprising to me. They do not support rejecting the
> null hypothesis that the mean is what it is supposed to be, which is a
> good thing.
>
> To test normality of the deviates, you should apply a normality test
> to the deviates themselves, e.g. a Kolmogorov-Smirnov test. Commons
> Math does not currently include normality tests (patches welcome :).
> To do this, you would need to dump the generated arrays to a file and
> then do the test with R or some other package that includes normality
> tests.
>
> Unless I am missing something, I don't think a t-test is going to give
> you the information that you need to verify that the generated values
> are normally distributed. Another thing that you could do is to
> examine the empirical distribution of the generated values - lay a
> grid over the range and count how many fall into each range and
> compare these counts to what you would expect under the hypothesis of
> normality (essentially what the K-S test does). You can use
> org.apache.commons.math.random.EmpiricalDistribution to bin the generated
> data and get bin counts.
>
> If you do find that normality tests fail on the generated values using
> either your inversion method or the RandomDataImpl.nextGaussian
> method, please open a Jira ticket
> (http://commons.apache.org/math/issue-tracking.html) including the R
> script or output from the package that you used for testing. Thanks!
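
The grid-and-count idea described above can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use EmpiricalDistribution, the class and method names (BinCountSketch, binCounts, chiSquare, generate) are hypothetical, the deviates come from java.util.Random.nextGaussian, and the comparison is a chi-square-style statistic over six bins rather than a K-S test. The bin probabilities are the standard normal probabilities for edges at -2, -1, 0, 1 and 2 standard deviations.

```java
import java.util.Random;

public class BinCountSketch {
    // Standard normal probabilities of the six bins
    // (-inf,-2], (-2,-1], (-1,0], (0,1], (1,2], (2,inf) in units of SD.
    static final double[] EXPECTED_PROB = {
        0.0227501, 0.1359052, 0.3413447, 0.3413447, 0.1359052, 0.0227501
    };

    // Maps a standardized value z to the index of the bin that contains it.
    static int binIndex(double z) {
        if (z <= -2) return 0;
        if (z <= -1) return 1;
        if (z <= 0) return 2;
        if (z <= 1) return 3;
        if (z <= 2) return 4;
        return 5;
    }

    // Standardizes each value with the hypothesized mean and SD and counts per bin.
    static long[] binCounts(double[] values, double mean, double sd) {
        long[] counts = new long[EXPECTED_PROB.length];
        for (double v : values) {
            counts[binIndex((v - mean) / sd)]++;
        }
        return counts;
    }

    // Sum over bins of (observed - expected)^2 / expected.
    static double chiSquare(long[] counts, int n) {
        double stat = 0.0;
        for (int i = 0; i < counts.length; i++) {
            double expected = n * EXPECTED_PROB[i];
            double diff = counts[i] - expected;
            stat += diff * diff / expected;
        }
        return stat;
    }

    // Hypothetical helper: n normal deviates with the given mean and SD.
    static double[] generate(int n, double mean, double sd, long seed) {
        Random rng = new Random(seed);
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = mean + sd * rng.nextGaussian();
        }
        return out;
    }

    public static void main(String[] args) {
        int n = 100000;
        double[] data = generate(n, 10.0, 5.0, 1L);
        long[] counts = binCounts(data, 10.0, 5.0);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("bin " + i + ": observed " + counts[i]
                    + ", expected " + n * EXPECTED_PROB[i]);
        }
        System.out.println("chi-square statistic " + chiSquare(counts, n));
    }
}
```

With six bins and fully specified parameters, the statistic would be compared against a chi-square distribution with 5 degrees of freedom; large values would suggest the counts do not match the normal hypothesis.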
>
> hth,
>
> Phil
>
>
>> Thanks,
>>
>> wjr
>>
>> package usda.weru.cligen2;
>>
>> import org.apache.commons.math.MathException;
>>
>> /**
>>  *
>>  * @author wjr
>>  */
>> public class TestNormal {
>>
>>     static org.apache.commons.math.distribution.NormalDistributionImpl nd =
>>             new org.apache.commons.math.distribution.NormalDistributionImpl(10, 5);
>>
>>     public static void main(String[] args) {
>>         double[] arry = new double[100000];
>>         java.util.Random ran = new java.util.Random(1L);
>>
>>         for (int jdx = 0; jdx < 10; jdx++) {
>>             for (int idx = 0; idx < arry.length; idx++) {
>>                 try {
>>                     arry[idx] = nd.inverseCumulativeProbability(ran.nextDouble());
>>                 } catch (MathException ex) {
>>                     ex.printStackTrace();
>>                 }
>>             }
>>             try {
>>                 System.out.println("ttest " +
>>                         org.apache.commons.math.stat.inference.TestUtils.tTest(10, arry));
>>             } catch (IllegalArgumentException ex) {
>>                 ex.printStackTrace();
>>             } catch (MathException ex) {
>>                 ex.printStackTrace();
>>             }
>>         }
>>     }
>> }
>>
>> Output:
>>
>>> run-single:
>>> ttest 0.3433300114960922
>>> ttest 0.1431930575825282
>>> ttest 0.12336027805916228
>>> ttest 0.49478850669361796
>>> ttest 0.9216887341410063
>>> ttest 0.9937228334312525
>>> ttest 0.13669784550400177
>>> ttest 0.9646134537758599
>>> ttest 0.9965741269090211
>>> ttest 0.03815948891784959
>>> BUILD SUCCESSFUL (total time: 20 seconds)
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
>> For additional commands, e-mail: user-help@commons.apache.org
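
The "9 times out of 10" expectation in this thread can be checked directly with a small simulation. The sketch below is only an illustration under stated assumptions: the class and method names (CoverageSketch, coverage) are hypothetical, deviates come from java.util.Random.nextGaussian rather than Commons Math, and the interval uses the normal quantile 1.645 as an approximation to the t quantile, which is reasonable only because each sample is large. For a two-tailed t test, a hypothesized mean lies inside the 90% confidence interval exactly when the p-value is above 0.10, so the fraction computed here corresponds to the fraction of runs with p > 0.10.

```java
import java.util.Random;

public class CoverageSketch {
    // Two-sided 90% standard normal quantile (assumption: n is large enough
    // that the t quantile is practically the same).
    static final double Z_90 = 1.6448536269514722;

    /**
     * Draws `trials` samples of size n from N(trueMean, sd^2) and returns the
     * fraction whose 90% confidence interval for the mean contains trueMean.
     */
    static double coverage(int trials, int n, double trueMean, double sd, long seed) {
        Random rng = new Random(seed);
        double[] sample = new double[n];
        int hits = 0;
        for (int t = 0; t < trials; t++) {
            // Generate one sample and its mean.
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                sample[i] = trueMean + sd * rng.nextGaussian();
                sum += sample[i];
            }
            double mean = sum / n;

            // Unbiased sample variance (second pass for numerical stability).
            double ss = 0.0;
            for (int i = 0; i < n; i++) {
                double d = sample[i] - mean;
                ss += d * d;
            }
            double variance = ss / (n - 1);

            // The interval is mean +/- halfWidth; count a hit if it covers trueMean.
            double halfWidth = Z_90 * Math.sqrt(variance / n);
            if (Math.abs(mean - trueMean) <= halfWidth) {
                hits++;
            }
        }
        return (double) hits / trials;
    }

    public static void main(String[] args) {
        System.out.println("fraction of intervals covering the mean: "
                + coverage(2000, 1000, 10.0, 5.0, 1L));
    }
}
```

A result near 0.9 would indicate the generator and the interval behave as expected; with only ten runs, as in the quoted output, the observed fraction can easily differ from 0.9 by chance.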