Message-ID: <833688337.1260210558085.JavaMail.jira@brutus>
Date: Mon, 7 Dec 2009 18:29:18 +0000 (UTC)
From: "Sean Owen (JIRA)"
To: mahout-dev@lucene.apache.org
Subject: [jira] Commented: (MAHOUT-212) Need random sampler for use in reducers
In-Reply-To: <1825818266.1260163458097.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/MAHOUT-212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787033#action_12787033 ]

Sean Owen commented on MAHOUT-212:
----------------------------------

Yeah, test injection was the idea behind using RandomUtils, since it returns a generator that uses the same seed every time when set in test mode. The unit tests do (or should) set it globally as such, to make sure the results are deterministic.

Yes, the returned generator is a MersenneTwisterRNG, which just extends Random.

Yes, anything for testing should probably be package-private. (I'd also suggest making the instance fields private here? I'm not sure there's a big case for extension, at least not one that isn't better answered with explicit getters.)

I don't care about the test naming convention.

Once this is in place I'll put my similar Iterator next to it.

> Need random sampler for use in reducers
> ---------------------------------------
>
>                 Key: MAHOUT-212
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-212
>             Project: Mahout
>          Issue Type: Bug
>          Components: Utils
>    Affects Versions: 0.2
>            Reporter: Ted Dunning
>            Assignee: Sean Owen
>             Fix For: 0.3
>
>         Attachments: MAHOUT-212.patch
>
>
> For a variety of mining algorithms, it helps to have a uniform way to only process a sub-set of the records in a reducer.
> As such, I have written a simple generic sampler that filters an Iterator returning a fair sample of at most a specified size.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
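
For readers of the archive, here is a rough sketch of the kind of sampler the issue describes, with the random generator injected so a test can fix the seed. This is not the code in MAHOUT-212.patch. The class name ReservoirSampling and the method sample are hypothetical, the technique shown is plain reservoir sampling (the patch may well do something different), and a seeded java.util.Random stands in for whatever RandomUtils returns in test mode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; not the sampler attached to MAHOUT-212.
final class ReservoirSampling {

  private ReservoirSampling() {
  }

  /**
   * Returns a uniform random sample of at most maxSize items from source.
   * The Random is passed in so that tests can supply a fixed seed.
   */
  static <T> List<T> sample(Iterator<T> source, int maxSize, Random random) {
    if (maxSize < 0) {
      throw new IllegalArgumentException("maxSize must be non-negative");
    }
    List<T> reservoir = new ArrayList<>();
    long seen = 0;
    while (source.hasNext()) {
      T item = source.next();
      seen++;
      if (reservoir.size() < maxSize) {
        // Fill phase: the first maxSize items are always kept.
        reservoir.add(item);
      } else {
        // Pick a slot in [0, seen); the item is kept only if the slot
        // falls inside the reservoir, i.e. with probability maxSize / seen.
        long slot = (long) (random.nextDouble() * seen);
        if (slot < maxSize) {
          reservoir.set((int) slot, item);
        }
      }
    }
    return reservoir;
  }

  public static void main(String[] args) {
    List<Integer> records = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
    // A fixed seed plays the role of "test mode": repeated runs agree.
    System.out.println(sample(records.iterator(), 3, new Random(42L)));
  }
}
```

Passing the generator as a parameter is one way to get the deterministic tests discussed above; a global test-mode switch, as RandomUtils is described as providing, gets the same effect without changing method signatures.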