[ https://issues.apache.org/jira/browse/MAHOUT753?page=com.atlassian.jira.plugin.system.issuetabpanels:commenttabpanel&focusedCommentId=13069863#comment13069863
]
Ted Dunning commented on MAHOUT-753:

MurmurHash should also work well if fed from a sequence counter multiplied by a large prime.
The counter-times-prime input is itself a weak form of congruential generator.
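A minimal sketch of that idea, not the attached MurmurHashRandom.java: a counter is multiplied by a prime and the low 32 bits are hashed. Everything named here is an assumption for illustration: the class name CounterHashSketch, the choice of 2^31 - 1 as the "large prime", and the use of the 32-bit MurmurHash3 mix for a single int key (the patch may well use a different MurmurHash variant).

```java
public class CounterHashSketch {
    // Assumed for illustration: 2^31 - 1 is prime; any large odd prime would do.
    public static final long LARGE_PRIME = 2_147_483_647L;

    private final int seed;
    private long counter = 0;

    public CounterHashSketch(int seed) {
        this.seed = seed;
    }

    /** 32-bit MurmurHash3 applied to one 4-byte key (int arithmetic wraps on purpose). */
    public static int murmur3Int(int key, int seed) {
        int k1 = key;
        k1 *= 0xcc9e2d51;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= 0x1b873593;

        int h1 = seed ^ k1;
        h1 = Integer.rotateLeft(h1, 13);
        h1 = h1 * 5 + 0xe6546b64;

        h1 ^= 4; // key length in bytes

        // Finalization mix.
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    /** Hashes the low 32 bits of counter * prime, then advances the counter. */
    public int nextInt() {
        int key = (int) (counter++ * LARGE_PRIME);
        return murmur3Int(key, seed);
    }

    /** Uses the top 24 bits of the hash to form a value in [0, 1). */
    public double nextDouble() {
        return (nextInt() >>> 8) / (double) (1 << 24);
    }
}
```

Because the multiplier is odd, distinct counter values give distinct 32-bit keys until the counter wraps past 2^32, so the hash never sees a repeated input within that period.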
Lance, the best test for distribution would be a one-sided KS test. I don't think that we
have one handy, but it is very easy to build one from spare parts. See http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
The basic idea is that you find the largest positive and negative differences between the empirical
cumulative distribution function and the theoretically desired cumulative distribution function.
The size of these differences is a good measure of how different the two distributions are.
At 1 megasample, this difference should be less than about 0.002. Commons Math has a way
to compute the test statistic distribution if we really care about the details.
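The "spare parts" version of the statistic described above can be sketched as follows, assuming the target distribution is uniform on [0, 1) so that the theoretical CDF is simply F(x) = x. The class name KsSketch and the method ksUniform are hypothetical, and java.util.Random stands in for whatever generator is under test.

```java
import java.util.Arrays;
import java.util.Random;

public class KsSketch {
    /**
     * Returns {dPlus, dMinus}: the largest amount by which the empirical CDF
     * rises above, and falls below, the Uniform[0,1) CDF F(x) = x.
     */
    public static double[] ksUniform(double[] samples) {
        double[] x = samples.clone();
        Arrays.sort(x);
        int n = x.length;
        double dPlus = 0.0;
        double dMinus = 0.0;
        for (int i = 0; i < n; i++) {
            double cdf = x[i]; // theoretical CDF at the i-th order statistic
            // Empirical CDF is (i + 1) / n just after x[i] and i / n just before it.
            dPlus = Math.max(dPlus, (i + 1.0) / n - cdf);
            dMinus = Math.max(dMinus, cdf - (double) i / n);
        }
        return new double[] {dPlus, dMinus};
    }

    public static void main(String[] args) {
        int n = 1_000_000; // one megasample, as in the comment above
        Random rng = new Random(42); // stand-in generator, not MurmurHashRandom
        double[] samples = new double[n];
        for (int i = 0; i < n; i++) {
            samples[i] = rng.nextDouble();
        }
        double[] d = ksUniform(samples);
        System.out.printf("D+ = %.6f, D- = %.6f%n", d[0], d[1]);
    }
}
```

For a rough sense of scale, the asymptotic one-sided tail probability is about exp(-2 n d^2), which for n = 1,000,000 and d = 0.002 is exp(-8), roughly 3e-4, so a well-behaved generator should exceed 0.002 only rarely.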
> MurmurHashRandom class: subclass of java.util.Random that uses MurmurHash
> 
>
> Key: MAHOUT-753
> URL: https://issues.apache.org/jira/browse/MAHOUT753
> Project: Mahout
> Issue Type: New Feature
> Components: Math
> Reporter: Lance Norskog
> Assignee: Sean Owen
> Priority: Minor
> Attachments: MurmurBench.java, MurmurHashRandom.java
>
>

