mahout-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-1440) Add option to set the RNG seed for initial cluster generation in Kmeans/fKmeans
Date Fri, 18 Apr 2014 11:39:15 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973994#comment-13973994 ]

Hudson commented on MAHOUT-1440:
--------------------------------

SUCCESS: Integrated in Mahout-Quality #2576 (See [https://builds.apache.org/job/Mahout-Quality/2576/])
MAHOUT-1440 Add option to set the RNG seed for initial cluster generation in Kmeans/fKmeans
(ssc: rev 1588439)
* /mahout/trunk/CHANGELOG
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/FuzzyKMeansDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/RandomSeedGenerator.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/common/commandline/DefaultOptionCreator.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/kmeans/TestRandomSeedGenerator.java
* /mahout/trunk/src/conf/fkmeans.props
* /mahout/trunk/src/conf/kmeans.props


> Add option to set the RNG seed for initial cluster generation in Kmeans/fKmeans
> -------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1440
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1440
>             Project: Mahout
>          Issue Type: Improvement
>          Components: CLI, Clustering
>    Affects Versions: 1.0
>            Reporter: Andrew Palumbo
>            Assignee: Sebastian Schelter
>            Priority: Minor
>              Labels: reproducibility
>             Fix For: 1.0
>
>         Attachments: MAHOUT-1440.patch
>
>
> It was noted recently that there should be a way to set a static seed for the initial
> clusters of Kmeans. In the interests of reproducibility and benchmarking, this patch adds
> an option to set the seed of the RNG used in the RandomSeedGenerator.buildRandom() method
> called from KMeansDriver and FuzzyKMeansDriver.
> I've added a CLI option, -setRandomSeed, that, when set to the same value across runs
> (with the -k option set), produces reproducible results from kmeans and fkmeans.
> This patch allows the user to set a value. It may make more sense to just have an option
> that sets a flag to use the STANDARD_SEED from RandomWrapper.
> I am still feeling my way around the codebase, so if this will be useful and any changes
> are needed, let me know.
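
For readers unfamiliar with why a fixed seed gives reproducible initial clusters, here is a
minimal, self-contained Java sketch of the idea. It is not Mahout's RandomSeedGenerator code:
the class SeededInitialCenters, the method pickInitialCenters, and the use of java.util.Random
with reservoir sampling are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration; not the Mahout implementation.
public class SeededInitialCenters {

    // Picks k point indices out of numPoints using reservoir sampling.
    // If seed is null, the RNG is seeded by the JVM and results vary per run;
    // if seed is given, the same indices are chosen on every run.
    static List<Integer> pickInitialCenters(int numPoints, int k, Long seed) {
        if (k < 0 || k > numPoints) {
            throw new IllegalArgumentException("k must be between 0 and numPoints");
        }
        Random rng = (seed == null) ? new Random() : new Random(seed);
        List<Integer> chosen = new ArrayList<>();
        for (int i = 0; i < numPoints; i++) {
            if (chosen.size() < k) {
                // Fill the reservoir with the first k indices.
                chosen.add(i);
            } else {
                // Replace a random slot with probability k / (i + 1).
                int j = rng.nextInt(i + 1);
                if (j < k) {
                    chosen.set(j, i);
                }
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Integer> first = pickInitialCenters(100, 3, 42L);
        List<Integer> second = pickInitialCenters(100, 3, 42L);
        // Same seed, same input size, same k: identical initial centers.
        System.out.println(first.equals(second)); // prints "true"
    }
}
```

With the seed left unset (null here), two runs would generally pick different initial centers,
which is the behaviour the proposed option is meant to let users avoid when benchmarking.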



--
This message was sent by Atlassian JIRA
(v6.2#6252)
