Thanks for the reply.
First, the reason for using NormalDistributionImpl is that I'm
translating from FORTRAN and wasn't thinking very hard about it. Using
Random makes sense.
Second, I don't really care, at least not yet, about how normal my
output samples are. What I do care about is that the means of my
generated samples match the observed means. For example, if I start with
an observed max temperature of 30, I want to determine that 30 falls
within the 90% confidence interval of the sample mean. What I think I am
getting, using the same technique each time, is that half the time I hit
my goal and the other half my samples stink. If my understanding is
correct, 30 should fall inside the 90% interval about 9 times out of 10,
more or less, which clearly isn't happening.
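To make the check concrete, here is a minimal sketch of what I mean,
using only the JDK. The class name, the seed, the SD of 5 and the sample
size of 1000 are all made up for illustration, and the interval uses the
normal critical value 1.6449 in place of the Student t value, which I am
assuming is close enough for samples this large.

```java
import java.util.Random;

/** Sketch: does a target mean fall inside the 90% interval of a sample mean? */
final class MeanIntervalCheck {

    // Two-sided 90% critical value of the standard normal (assumption: the
    // sample is large enough that the t critical value is practically equal).
    static final double Z_90 = 1.6449;

    /** Returns {lower, upper} of the approximate 90% interval for the mean. */
    static double[] interval90(double[] sample) {
        int n = sample.length;
        double sum = 0.0;
        for (double v : sample) sum += v;
        double mean = sum / n;
        double ss = 0.0;
        for (double v : sample) ss += (v - mean) * (v - mean);
        // Standard error of the mean from the sample standard deviation.
        double stdErr = Math.sqrt(ss / (n - 1)) / Math.sqrt(n);
        return new double[] { mean - Z_90 * stdErr, mean + Z_90 * stdErr };
    }

    /** True when the target lies inside the 90% interval of the sample. */
    static boolean covers(double[] sample, double target) {
        double[] ci = interval90(sample);
        return ci[0] <= target && target <= ci[1];
    }

    /** Counts how many of the independent samples give an interval covering the mean. */
    static int coverageCount(int trials, int n, double mean, double sd, long seed) {
        Random random = new Random(seed);
        double[] sample = new double[n];
        int hits = 0;
        for (int t = 0; t < trials; t++) {
            for (int i = 0; i < n; i++) {
                sample[i] = mean + sd * random.nextGaussian();
            }
            if (covers(sample, mean)) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        int hits = coverageCount(1000, 1000, 30.0, 5.0, 42L);
        System.out.println("intervals covering 30: " + hits + " of 1000");
    }
}
```

If the generator is behaving, the count should come out somewhere near
900 of 1000, not near 500.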
wjr
Phil Steitz wrote:
> On 9/17/07, William J Rust <wjr@weru.ksu.edu> wrote:
>> I'm working on a climate simulation program that takes monthly averages
>> and generates daily readings that are assumed to be normally
>> distributed. The following program creates 10 sets of 100,000 random
>> deviates with mean 10 and SD 5. It then applies a t test (results below)
>> to ensure that the generated numbers are good enough. As the results
>> show, they aren't. I'm wondering a) I am doing something wrong or b) is
>> there something wrong with the stats routines?
>
> There are a couple of problems here. First, while your inversion
> method should generate approximately normally distributed values, it
> is better to use the JDK-supplied method for this (much faster and a
> better algorithm). There is a wrapped version of this provided in
> org.apache.commons.math.random.RandomDataImpl. To use that:
>
> import org.apache.commons.math.random.RandomData;
> import org.apache.commons.math.random.RandomDataImpl;
> RandomData randomData = new RandomDataImpl();
> ...
> arry[idx] = randomData.nextGaussian(10, 5);
>
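For anyone following along without Commons Math on the classpath, a
rough JDK-only equivalent of the call above can be sketched by scaling
and shifting java.util.Random.nextGaussian(). I am assuming this gives
the same kind of deviate as nextGaussian(10, 5); the class and method
names below are hypothetical, not part of any library.

```java
import java.util.Random;

/** Sketch: normal deviates with a given mean and SD from java.util.Random. */
final class GaussianSample {

    /** Draws n values as mean + sd * (standard normal deviate). */
    static double[] draw(int n, double mean, double sd, long seed) {
        Random random = new Random(seed);
        double[] values = new double[n];
        for (int i = 0; i < n; i++) {
            values[i] = mean + sd * random.nextGaussian();
        }
        return values;
    }

    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) sum += v;
        return sum / values.length;
    }

    /** Sample standard deviation (n - 1 in the denominator). */
    static double standardDeviation(double[] values) {
        double m = mean(values);
        double ss = 0.0;
        for (double v : values) ss += (v - m) * (v - m);
        return Math.sqrt(ss / (values.length - 1));
    }

    public static void main(String[] args) {
        double[] arry = draw(100000, 10.0, 5.0, 1L);
        System.out.println("mean " + mean(arry));
        System.out.println("sd   " + standardDeviation(arry));
    }
}
```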
> Second, I don't understand what you are expecting from the t-test.
> TestUtils.tTest(mu, array) returns the p-value associated with a
> two-tailed test with the null hypothesis that the values in the array
> come from a distribution with mean = mu. So small p-values, say less
> than .01, would indicate that the mean appears to differ significantly
> from 10. This should happen roughly one in every 100 times.
> Differences as large as what you observed on your first run should
> happen about 34 out of every 100 times, etc. The values reported
> below do not look surprising to me. They do not support rejecting the
> null hypothesis that the mean is what it is supposed to be, which is a
> good thing.
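The "one in every 100" and "34 out of every 100" figures can be checked
with a small simulation: when the null hypothesis is true, the fraction
of p-values below any threshold should be roughly that threshold. The
sketch below is JDK-only and is not TestUtils.tTest; it assumes a normal
approximation to the t distribution (reasonable for samples of several
hundred) and uses the Abramowitz-Stegun 7.1.26 approximation for the
normal tail. Class name, seed and sample sizes are hypothetical.

```java
import java.util.Random;

/** Sketch: how often do p-values fall below a threshold when the mean is right? */
final class PValueFrequency {

    /** Two-sided normal tail P(|Z| > |z|), Abramowitz-Stegun 7.1.26 erfc approximation. */
    static double twoSidedP(double z) {
        double x = Math.abs(z) / Math.sqrt(2.0);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return poly * Math.exp(-x * x);
    }

    /** Approximate two-sided p-value for H0: the population mean equals mu. */
    static double pValueForMean(double[] sample, double mu) {
        int n = sample.length;
        double sum = 0.0;
        for (double v : sample) sum += v;
        double mean = sum / n;
        double ss = 0.0;
        for (double v : sample) ss += (v - mean) * (v - mean);
        double stdErr = Math.sqrt(ss / (n - 1)) / Math.sqrt(n);
        return twoSidedP((mean - mu) / stdErr);
    }

    /** Fraction of simulated samples whose p-value is below alpha. */
    static double fractionBelow(int trials, int n, double mean, double sd,
                                double alpha, long seed) {
        Random random = new Random(seed);
        double[] sample = new double[n];
        int below = 0;
        for (int t = 0; t < trials; t++) {
            for (int i = 0; i < n; i++) {
                sample[i] = mean + sd * random.nextGaussian();
            }
            if (pValueForMean(sample, mean) < alpha) below++;
        }
        return (double) below / trials;
    }

    public static void main(String[] args) {
        for (double alpha : new double[] {0.01, 0.10, 0.34}) {
            System.out.println("p < " + alpha + ": "
                    + fractionBelow(2000, 500, 10.0, 5.0, alpha, 7L));
        }
    }
}
```

Seen this way, a single p-value of 0.038 among ten runs is not unusual.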
>
> To test normality of the deviates, you should apply a normality test
> to the deviates themselves, e.g. a Kolmogorov-Smirnov test. Commons
> Math does not currently include normality tests (patches welcome :).
> To do this, you would need to dump the generated arrays to a file and
> then do the test with R or some other package that includes normality
> tests.
>
> Unless I am missing something, I don't think a t-test is going to give
> you the information that you need to verify that the generated values
> are normally distributed. Another thing that you could do is to
> examine the empirical distribution of the generated values - lay a
> grid over the range, count how many values fall into each cell, and
> compare these counts to what you would expect under the hypothesis of
> normality (essentially what the KS test does). You can use
> org.apache.commons.math.random.EmpiricalDistribution to bin the
> generated data and get bin counts.
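A very coarse version of the grid idea can be sketched without any
library: count the share of values within 1, 2 and 3 standard deviations
of the mean and compare with the normal figures of about 68.27%, 95.45%
and 99.73%. This is only an eyeball check under those assumed bands, not
a formal normality test and not the EmpiricalDistribution API; the class
and method names are hypothetical.

```java
import java.util.Random;

/** Sketch: share of values within 1, 2 and 3 SDs of the mean. */
final class SigmaBandCounts {

    // Probability that a normal value lies within 1, 2, 3 SDs of its mean.
    static final double[] EXPECTED = {0.6827, 0.9545, 0.9973};

    /** Draws n deviates as mean + sd * (standard normal deviate). */
    static double[] draw(int n, double mean, double sd, long seed) {
        Random random = new Random(seed);
        double[] values = new double[n];
        for (int i = 0; i < n; i++) {
            values[i] = mean + sd * random.nextGaussian();
        }
        return values;
    }

    /** Returns the observed fractions within 1, 2 and 3 SDs of the given mean. */
    static double[] fractionsWithin(double[] values, double mean, double sd) {
        int[] counts = new int[3];
        for (double v : values) {
            double dist = Math.abs(v - mean) / sd;
            for (int k = 0; k < 3; k++) {
                if (dist <= k + 1) counts[k]++;
            }
        }
        double[] fractions = new double[3];
        for (int k = 0; k < 3; k++) {
            fractions[k] = (double) counts[k] / values.length;
        }
        return fractions;
    }

    public static void main(String[] args) {
        double[] observed = fractionsWithin(draw(100000, 10.0, 5.0, 1L), 10.0, 5.0);
        for (int k = 0; k < 3; k++) {
            System.out.println("within " + (k + 1) + " SD: observed "
                    + observed[k] + ", expected about " + EXPECTED[k]);
        }
    }
}
```

A finer grid, or a proper KS test in R, would be needed before filing
anything as a bug.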
>
> If you do find that normality tests fail on the generated values using
> either your inversion method or the RandomDataImpl.nextGaussian
> method, please open a Jira ticket
> (http://commons.apache.org/math/issue-tracking.html) including the R
> script or output from the package that you used for testing. Thanks!
>
> hth,
>
> Phil
>
>
>> Thanks,
>>
>> wjr
>>
>> package usda.weru.cligen2;
>>
>> import org.apache.commons.math.MathException;
>>
>> /**
>>  *
>>  * @author wjr
>>  */
>> public class TestNormal {
>>
>>     static org.apache.commons.math.distribution.NormalDistributionImpl nd =
>>             new org.apache.commons.math.distribution.NormalDistributionImpl(10, 5);
>>
>>     public static void main(String[] args) {
>>         double[] arry = new double[100000];
>>         java.util.Random ran = new java.util.Random(1L);
>>
>>         for (int jdx = 0; jdx < 10; jdx++) {
>>             for (int idx = 0; idx < arry.length; idx++) {
>>                 try {
>>                     arry[idx] =
>>                             nd.inverseCumulativeProbability(ran.nextDouble());
>>                 } catch (MathException ex) {
>>                     ex.printStackTrace();
>>                 }
>>             }
>>             try {
>>                 System.out.println("ttest " +
>>                         org.apache.commons.math.stat.inference.TestUtils.tTest(10, arry));
>>             } catch (IllegalArgumentException ex) {
>>                 ex.printStackTrace();
>>             } catch (MathException ex) {
>>                 ex.printStackTrace();
>>             }
>>         }
>>     }
>> }
>>
>> Output:
>>
>>> runsingle:
>>> ttest 0.3433300114960922
>>> ttest 0.1431930575825282
>>> ttest 0.12336027805916228
>>> ttest 0.49478850669361796
>>> ttest 0.9216887341410063
>>> ttest 0.9937228334312525
>>> ttest 0.13669784550400177
>>> ttest 0.9646134537758599
>>> ttest 0.9965741269090211
>>> ttest 0.03815948891784959
>>> BUILD SUCCESSFUL (total time: 20 seconds)
>>
>>
>> 
>> To unsubscribe, email: user-unsubscribe@commons.apache.org
>> For additional commands, email: user-help@commons.apache.org
>>
>>

