Message-ID: <46F8223D.7040807@weru.ksu.edu>
Date: Mon, 24 Sep 2007 15:46:53 -0500
From: Bill Rust <wjr@weru.ksu.edu>
Organization: USDA-ARS Wind Erosion Research Unit
User-Agent: Thunderbird 2.0.0.6 (Windows/20070728)
MIME-Version: 1.0
To: Phil Steitz <phil.steitz@gmail.com>
CC: Jakarta Commons Users List <user@commons.apache.org>
Subject: Re: normal deviates don't pass t test
References: <46EEF371.3060106@weru.ksu.edu>
 <8a81b4af0709220023k251ed5fdpd337fdca2de0d3cf@mail.gmail.com>
In-Reply-To: <8a81b4af0709220023k251ed5fdpd337fdca2de0d3cf@mail.gmail.com>

Thanks for the reply.

First, the reason for using NormalDistributionImpl is that I'm 
translating from FORTRAN and I wasn't thinking a whole lot. Using Random 
makes sense.

Second, I don't really care, at least not yet, about how normal my
output samples are. What I do care about is that the means of my
generated samples match the observed means. For example, if I start with
an observed max temperature of 30, I want to check that 30 lies within
the 90% confidence interval for the mean of the generated sample. What I
think I am getting, using the same technique each time, is that half the
time I hit my goal and the other half my samples stink. If my
understanding is correct, the interval should contain the observed mean
about 9 times out of 10, more or less, which clearly isn't happening.
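
To make the check I have in mind concrete, here is a minimal sketch using only the JDK. The class and method names, the sample size of 1,000, and the SD of 5 are made up for illustration, and the interval uses the normal critical value 1.6449 in place of the exact t value, which should be close enough for samples this large. It builds a 90% interval around each generated sample's mean and counts how often the target mean falls inside.

```java
import java.util.Random;

public class CoverageCheck {

    // Two-sided 90% critical value from the normal distribution
    // (an approximation to the t critical value for large samples).
    static final double Z_90 = 1.6449;

    /** True if target lies inside the 90% confidence interval for the sample mean. */
    static boolean covers(double[] sample, double target) {
        int n = sample.length;
        double sum = 0.0;
        for (double x : sample) {
            sum += x;
        }
        double mean = sum / n;
        double ss = 0.0;
        for (double x : sample) {
            ss += (x - mean) * (x - mean);
        }
        // Standard error of the mean: sqrt(sample variance / n)
        double stdErr = Math.sqrt(ss / (n - 1) / n);
        return Math.abs(mean - target) <= Z_90 * stdErr;
    }

    /** Fraction of simulated samples whose interval contains the true mean mu. */
    static double coverage(int trials, int n, double mu, double sigma, long seed) {
        Random ran = new Random(seed);
        double[] sample = new double[n];
        int hits = 0;
        for (int t = 0; t < trials; t++) {
            for (int i = 0; i < n; i++) {
                sample[i] = mu + sigma * ran.nextGaussian();
            }
            if (covers(sample, mu)) {
                hits++;
            }
        }
        return (double) hits / trials;
    }

    public static void main(String[] args) {
        // Hypothetical settings: target mean 30, SD 5, 1,000 values per sample.
        System.out.println("coverage " + coverage(2000, 1000, 30.0, 5.0, 1L));
    }
}
```

If the generator and the interval are both behaving, the printed fraction should land near 0.9. A fraction near 0.5 would point to a problem in one of them.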

wjr

Phil Steitz wrote:
> On 9/17/07, William J Rust <wjr@weru.ksu.edu> wrote:
>> I'm working on a climate simulation program that takes monthly averages
>> and generates daily readings that are assumed to be normally
>> distributed. The following program creates 10 sets of 100,000 random
>> deviates with mean 10 and SD 5. It then applies a t test (results below)
>> to ensure that the generated numbers are good enough. As the results
>> show, they aren't. I'm wondering whether a) I am doing something wrong
>> or b) there is something wrong with the stats routines?
> 
> There are a couple of problems here.  First, while your inversion
> method should generate approximately normally distributed values, it
> is better to use the JDK-supplied method for this (much faster and a
> better algorithm).  There is a wrapped version of this provided in
> org.apache.commons.math.random.RandomDataImpl. To use that:
> 
> import org.apache.commons.math.random.RandomData;
> import org.apache.commons.math.random.RandomDataImpl;
> RandomData randomData = new RandomDataImpl();
> ...
> arry[idx] = randomData.nextGaussian(10, 5);
> 
> Second, I don't understand what you are expecting from the t-test.
> TestUtils.tTest(mu, array) returns the p-value associated with a
> two-tailed test with the null hypothesis that the values in the array
> come from a distribution with mean = mu.  So small p-values, say less
> than .01, would indicate that the mean appears to differ significantly
> from 10. This should happen roughly one in every 100 times.
> Differences as large as what you observed on your first run should
> happen about 34 out of every 100 times, etc.  The values reported
> below do not look surprising to me. They do not support rejecting the
> null hypothesis that the mean is what it is supposed to be, which is a
> good thing.
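
The point about how often small p-values should turn up can be checked by simulation. The sketch below is JDK-only and is not the Commons Math implementation: the class and method names are hypothetical, the t statistic is treated as standard normal (reasonable for large samples), and the two-sided p-value comes from the Abramowitz and Stegun 7.1.26 approximation to erfc. It generates many samples whose true mean is 10 and reports the fraction with p below 0.01 and below 0.3433.

```java
import java.util.Random;

public class PValueCheck {

    /** Two-sided p-value for a z statistic, erfc(|z| / sqrt(2)), via A&S 7.1.26. */
    static double twoSidedP(double z) {
        double x = Math.abs(z) / Math.sqrt(2.0);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736
                + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
        return poly * Math.exp(-x * x);
    }

    /** Approximate p-value for the null hypothesis that the sample mean equals mu. */
    static double pValue(double mu, double[] sample) {
        int n = sample.length;
        double sum = 0.0;
        for (double x : sample) {
            sum += x;
        }
        double mean = sum / n;
        double ss = 0.0;
        for (double x : sample) {
            ss += (x - mean) * (x - mean);
        }
        double stdErr = Math.sqrt(ss / (n - 1) / n);
        return twoSidedP((mean - mu) / stdErr);
    }

    /** Fraction of simulated samples (true mean 10, SD 5) with p-value below alpha. */
    static double fractionBelow(double alpha, int trials, int n, long seed) {
        Random ran = new Random(seed);
        double[] sample = new double[n];
        int below = 0;
        for (int t = 0; t < trials; t++) {
            for (int i = 0; i < n; i++) {
                sample[i] = 10.0 + 5.0 * ran.nextGaussian();
            }
            if (pValue(10.0, sample) < alpha) {
                below++;
            }
        }
        return (double) below / trials;
    }

    public static void main(String[] args) {
        System.out.println("p < 0.01   : " + fractionBelow(0.01, 5000, 500, 1L));
        System.out.println("p < 0.3433 : " + fractionBelow(0.3433, 5000, 500, 1L));
    }
}
```

When the null hypothesis is true, p-values are roughly uniform on [0, 1], so the two fractions should come out near 0.01 and 0.34.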
> 
> To test normality of the deviates, you should apply a normality test
> to the deviates themselves, e.g. a Kolmogorov-Smirnov test.  Commons
> math does not currently include normality tests  (patches welcome :).
> To do this, you would need to dump the generated arrays to a file and
> then do the test with R or some other package that includes normality
> tests.
> 
> Unless I am missing something, I don't think a t-test is going to give
> you the information that you need to verify that the generated values
> are normally distributed.  Another thing that you could do is to
> examine the empirical distribution of the generated values - lay a
> grid over the range, count how many values fall into each bin, and
> compare these counts to what you would expect under the hypothesis of
> normality (essentially a chi-square goodness-of-fit test; the K-S test
> makes a similar comparison using the empirical CDF).  You can use
> org.apache.commons.math.random.EmpiricalDistribution to bin the
> generated data and get bin counts.
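
A rough, JDK-only version of the binning idea is sketched below. It does not use EmpiricalDistribution: the class name and the choice of three bands (within 1 SD, between 1 and 2 SD, beyond 2 SD of the mean) are assumptions made for illustration, picked because the normal probabilities for those bands are well known (about 68.27%, 27.18%, and 4.55%).

```java
import java.util.Random;

public class BandCounts {

    // Normal probabilities of falling within 1 SD, between 1 and 2 SD,
    // and beyond 2 SD of the mean.
    static final double[] EXPECTED = {0.6827, 0.2718, 0.0455};

    /** Counts values by distance from mu in units of sigma: [0,1), [1,2), [2,inf). */
    static int[] bandCounts(double[] values, double mu, double sigma) {
        int[] counts = new int[3];
        for (double v : values) {
            double d = Math.abs(v - mu) / sigma;
            int band = d < 1.0 ? 0 : (d < 2.0 ? 1 : 2);
            counts[band]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        Random ran = new Random(1L);
        double[] arry = new double[100000];
        for (int idx = 0; idx < arry.length; idx++) {
            arry[idx] = 10.0 + 5.0 * ran.nextGaussian();
        }
        int[] counts = bandCounts(arry, 10.0, 5.0);
        for (int b = 0; b < counts.length; b++) {
            System.out.println("band " + b + " observed " + counts[b]
                    + " expected " + Math.round(EXPECTED[b] * arry.length));
        }
    }
}
```

Three bands make for a very coarse check. A finer grid, with a chi-square statistic computed from the observed and expected counts, would be more sensitive to departures from normality.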
> 
> If you do find that normality tests fail on the generated values using
> either your inversion method or the RandomDataImpl.nextGaussian
> method, please open a Jira ticket
> (http://commons.apache.org/math/issue-tracking.html) including the R
> script or output from the package that you used for testing.  Thanks!
> 
> hth,
> 
> Phil
> 
> 
>> Thanks,
>>
>> wjr
>>
>> package usda.weru.cligen2;
>>
>> import java.util.Random;
>>
>> import org.apache.commons.math.MathException;
>> import org.apache.commons.math.distribution.NormalDistributionImpl;
>> import org.apache.commons.math.stat.inference.TestUtils;
>>
>> /**
>>  * Generates 10 sets of 100,000 normal deviates (mean 10, SD 5) by
>>  * inversion and prints the one-sample t-test p-value for each set.
>>  *
>>  * @author wjr
>>  */
>> public class TestNormal {
>>
>>     static NormalDistributionImpl nd = new NormalDistributionImpl(10, 5);
>>
>>     public static void main(String[] args) {
>>         double[] arry = new double[100000];
>>         Random ran = new Random(1L);
>>
>>         for (int jdx = 0; jdx < 10; jdx++) {
>>             // Inversion: map uniform draws through the inverse normal CDF.
>>             for (int idx = 0; idx < arry.length; idx++) {
>>                 try {
>>                     arry[idx] =
>>                         nd.inverseCumulativeProbability(ran.nextDouble());
>>                 } catch (MathException ex) {
>>                     ex.printStackTrace();
>>                 }
>>             }
>>             try {
>>                 // p-value for the null hypothesis that the mean is 10
>>                 System.out.println("ttest " + TestUtils.tTest(10, arry));
>>             } catch (IllegalArgumentException ex) {
>>                 ex.printStackTrace();
>>             } catch (MathException ex) {
>>                 ex.printStackTrace();
>>             }
>>         }
>>     }
>> }
>>
>> Output:
>>
>>> run-single:
>>> ttest 0.3433300114960922
>>> ttest 0.1431930575825282
>>> ttest 0.12336027805916228
>>> ttest 0.49478850669361796
>>> ttest 0.9216887341410063
>>> ttest 0.9937228334312525
>>> ttest 0.13669784550400177
>>> ttest 0.9646134537758599
>>> ttest 0.9965741269090211
>>> ttest 0.03815948891784959
>>> BUILD SUCCESSFUL (total time: 20 seconds)
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
>> For additional commands, e-mail: user-help@commons.apache.org
>>
>>

