From: Jeff Eastman
Date: Sun, 23 May 2010 07:47:23 -0700
To: user@mahout.apache.org
Subject: Re: Mahout LDA Parameter: maxIter
Message-ID: <4BF93FFB.8030502@windwardsolutions.com>
References: <4BF7FB4C.1050003@windwardsolutions.com>

Yes, but it takes another 80 iterations to get there and the results, on
Reuters at least, don't seem to improve that much.

On 5/22/10 5:01 PM, Robin Anil wrote:
> David's rule of thumb was to let the iterations go until the relative
> change in LL becomes around 10^-4
>
> Robin
>
> On Sat, May 22, 2010 at 9:12 PM, Jeff Eastman wrote:
>
>> I suggest you try running with a trunk checkout and upgrading to Hadoop
>> 0.20.2. Mahout is still in motion and I've run LDA on Reuters on trunk in
>> the last few days. The maxIter parameter should not be an issue; you could
>> try removing it entirely and LDA will default to running to convergence
>> (about 100 iterations, which can take some time). I've found the Reuters
>> results don't change too much after 20. Even with a clean trunk checkout
>> Reuters will only use a single node and the iterations should take about 5
>> mins each. If you want to run on a multi-node cluster, install the patch in
>> MAHOUT-397 (
>> https://issues.apache.org/jira/browse/MAHOUT-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel)
>> and use the same arguments as in examples/bin/build-reuters.sh. Even on a
>> 3-node cluster this brings the iteration time down to about a minute and a
>> half, which is worth doing.
>>
>> Hope this helps,
>> Jeff
>>
>> http://www.windwardsolutions.com
>>
>> On 5/22/10 5:40 AM, 杨杰 wrote:
>>
>>> Hi, everyone
>>>
>>> I'm trying Mahout now. When running LDA on the Reuters corpus
>>> (
>>> http://lucene.grantingersoll.com/2010/02/16/trijug-intro-to-mahout-slides-and-demo-examples/
>>> ),
>>> a parameter refuses to work. This parameter is "maxIter", without
>>> which I cannot decide how many iterations to run.
>>>
>>> My CMD is:
>>> bin/mahout.hadoop lda --input mahout/seq-sparse-tf/vectors --output
>>> mahout/seq-sparse-tf/lda-out5 --numWords 34000 --numTopics 20
>>> --maxIter 1
>>>
>>> But I got an exception:
>>> 10/05/22 20:32:11 ERROR lda.LDADriver: Exception
>>> org.apache.commons.cli2.OptionException: Unexpected 2 while processing
>>> Options
>>>     at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:100)
>>>     at org.apache.mahout.clustering.lda.LDADriver.main(LDADriver.java:115)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:172)
>>> ...
>>>
>>> What's the problem? I'm using Mahout version 0.3 and Hadoop 0.20.0.
>>>
>>> Thank you!
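
For anyone who wants to experiment with the stopping rule discussed in this thread (iterate until the relative change in log-likelihood falls to around 10^-4, optionally capped by a maximum iteration count), here is a minimal Python sketch. It is not Mahout's code: the names `relative_change`, `run_until_converged`, `step` and `toy_log_likelihood` are hypothetical, and the log-likelihood curve is made up purely so the loop has something to run on.

```python
from typing import Callable, Optional, Tuple


def relative_change(previous: float, current: float) -> float:
    """Relative change between two successive log-likelihood values."""
    if previous == 0.0:
        # Avoid division by zero; treat any move away from 0 as "not converged".
        return 0.0 if current == 0.0 else float("inf")
    return abs(current - previous) / abs(previous)


def run_until_converged(
    step: Callable[[int], float],
    tol: float = 1e-4,
    max_iter: Optional[int] = None,
) -> Tuple[int, float]:
    """Call step(i) for i = 1, 2, ... and stop once the relative change in the
    returned log-likelihood drops below tol, or once max_iter is reached.

    Returns (iterations_run, last_log_likelihood).
    """
    previous = step(1)
    iteration = 1
    while max_iter is None or iteration < max_iter:
        iteration += 1
        current = step(iteration)
        converged = relative_change(previous, current) < tol
        previous = current
        if converged:
            break
    return iteration, previous


def toy_log_likelihood(iteration: int) -> float:
    """Made-up curve that rises toward -1000; stands in for one LDA iteration."""
    return -1000.0 - 500.0 * 0.5 ** iteration


if __name__ == "__main__":
    # No cap: run until the relative change falls below the tolerance.
    print(run_until_converged(toy_log_likelihood, tol=1e-4))
    # Capped: stop early, in the spirit of a maxIter-style limit.
    print(run_until_converged(toy_log_likelihood, tol=1e-4, max_iter=5))
```

Leaving `max_iter` as `None` corresponds to running to convergence; passing a small cap trades some model quality for time, which matches the observation above that the Reuters results change little after about 20 iterations.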