mahout-dev mailing list archives

From "Ted Dunning (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-753) MurmurHashRandom class: subclass of java.util.Random that uses MurmurHash
Date Sat, 23 Jul 2011 00:39:09 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069863#comment-13069863 ]

Ted Dunning commented on MAHOUT-753:
------------------------------------

MurmurHash should also be good if fed from a sequence counter multiplied by a large prime.
This is a weak form of a congruential generator.
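
As a rough illustration of the counter-times-prime idea, here is a minimal sketch. It is not the attached MurmurHashRandom.java. The class name CounterHashSketch, the choice of 2^61 - 1 as the prime, and the use of only the MurmurHash3 64-bit finalizer (rather than a full MurmurHash over bytes) are all assumptions made for the example.

```java
class CounterHashSketch {
    // Assumption: 2^61 - 1 (a Mersenne prime) stands in for "a large prime".
    static final long LARGE_PRIME = (1L << 61) - 1;

    private long counter;

    CounterHashSketch(long start) {
        this.counter = start;
    }

    // 64-bit finalization mix from MurmurHash3, used here as a stand-in
    // for hashing the input with a full MurmurHash.
    static long mix(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // counter * prime (mod 2^64) is the weak congruential sequence;
    // the hash scrambles it into the output value.
    long nextLong() {
        return mix(counter++ * LARGE_PRIME);
    }

    // Use the top 53 bits to build a double in [0, 1).
    double nextDouble() {
        return (nextLong() >>> 11) * 0x1.0p-53;
    }

    public static void main(String[] args) {
        CounterHashSketch rng = new CounterHashSketch(1L);
        for (int i = 0; i < 5; i++) {
            System.out.println(rng.nextDouble());
        }
    }
}
```

Note that this mix maps 0 to 0, so a counter that starts at 0 yields 0 as its first value; the usage above starts at 1 for that reason.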

Lance, the best test for distribution would be a one-sided KS test.  I don't think that we
have one handy, but it is very easy to build one from spare parts.  See http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test

The basic idea is that you find the largest positive and negative differences between the empirical
cumulative distribution function and the theoretically desired cumulative distribution function.
The size of these differences is a good measure of how different the distributions are.
At 1 mega-sample (10^6 samples), this difference should be less than about 0.002.  Commons Math has a way
to compute the distribution of the test statistic if we really care about the details.
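
To make the "spare parts" concrete, here is a minimal sketch of the two one-sided statistics against a uniform distribution on [0, 1), where the theoretical CDF is simply F(x) = x. The class and method names (KsUniformSketch, ksStatistics) are made up for this example, and java.util.Random is used only as a placeholder for whichever generator is under test.

```java
import java.util.Arrays;
import java.util.Random;

class KsUniformSketch {
    /**
     * Returns {dPlus, dMinus}: the largest amount by which the empirical CDF
     * rises above, and falls below, the uniform CDF F(x) = x on [0, 1).
     */
    static double[] ksStatistics(double[] samples) {
        double[] x = samples.clone();
        Arrays.sort(x);
        int n = x.length;
        double dPlus = 0.0;
        double dMinus = 0.0;
        for (int i = 0; i < n; i++) {
            double cdf = x[i]; // theoretical CDF value at the i-th order statistic
            // The empirical CDF jumps from i/n to (i+1)/n at x[i].
            dPlus = Math.max(dPlus, (i + 1.0) / n - cdf);
            dMinus = Math.max(dMinus, cdf - (double) i / n);
        }
        return new double[] {dPlus, dMinus};
    }

    public static void main(String[] args) {
        int n = 1_000_000; // the "1 mega-sample" case
        Random rng = new Random(42L); // placeholder generator
        double[] samples = new double[n];
        for (int i = 0; i < n; i++) {
            samples[i] = rng.nextDouble();
        }
        double[] d = ksStatistics(samples);
        System.out.printf("D+ = %.6f, D- = %.6f%n", d[0], d[1]);
    }
}
```

For a generator that is close to uniform, both values at this sample size would be expected to fall under the rough 0.002 bound mentioned above; turning the statistic into a p-value is left to a library.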

> MurmurHashRandom class: subclass of java.util.Random that uses MurmurHash
> -------------------------------------------------------------------------
>
>                 Key: MAHOUT-753
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-753
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>            Reporter: Lance Norskog
>            Assignee: Sean Owen
>            Priority: Minor
>         Attachments: MurmurBench.java, MurmurHashRandom.java
>
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
