mahout-user mailing list archives

From Jeff Eastman <j...@windwardsolutions.com>
Subject Re: Mahout LDA Parameter: maxIter
Date Sun, 23 May 2010 14:47:23 GMT
Yes, but it takes another 80 iterations to get there and the results, on 
Reuters at least, don't seem to improve that much.

On 5/22/10 5:01 PM, Robin Anil wrote:
> David's rule of thumb was to let the iterations go until the relative change
> in log-likelihood (LL) becomes around 10^-4
>
> Robin
>
> On Sat, May 22, 2010 at 9:12 PM, Jeff Eastman <jdog@windwardsolutions.com> wrote:
>
>> I suggest you try running with a trunk checkout and upgrading to Hadoop
>> 0.20.2. Mahout is still in motion and I've run LDA on Reuters on trunk in
>> the last few days. The maxIter parameter should not be an issue; you could
>> try removing it entirely and LDA will default to running to convergence
>> (about 100 iterations which can take some time). I've found the Reuters
>> results don't change too much after 20. Even with a clean trunk checkout
>> Reuters will only use a single node and the iterations should take about 5
>> mins each. If you want to run on a multi-node cluster, install the patch in
>> MAHOUT-397
>> (https://issues.apache.org/jira/browse/MAHOUT-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel)
>> and use the same arguments as in examples/bin/build-reuters.sh. Even on a
>> 3-node cluster this brings the iteration time down to about a minute and a
>> half which is worth doing.
>>
>> Hope this helps,
>> Jeff
>>
>> http://www.windwardsolutions.com
>>
>> On 5/22/10 5:40 AM, 杨杰 wrote:
>>
>>      
>>> Hi, everyone
>>>
>>> I'm trying Mahout now. When running LDA on the Reuters corpus
>>> (http://lucene.grantingersoll.com/2010/02/16/trijug-intro-to-mahout-slides-and-demo-examples/),
>>> one parameter refuses to work: "maxIter". Without it, I cannot
>>> control how many iterations are run.
>>>
>>> My command is:
>>> bin/mahout.hadoop lda --input mahout/seq-sparse-tf/vectors --output
>>> mahout/seq-sparse-tf/lda-out5 --numWords 34000 --numTopics 20
>>> --maxIter 1
>>>
>>> But I got an exception:
>>> 10/05/22 20:32:11 ERROR lda.LDADriver: Exception
>>> org.apache.commons.cli2.OptionException: Unexpected 2 while processing Options
>>>         at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:100)
>>>         at org.apache.mahout.clustering.lda.LDADriver.main(LDADriver.java:115)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:172)
>>> ...
>>>
>>> What's the problem? I'm using Mahout 0.3 and Hadoop 0.20.0.
>>>
>>> Thank you!
>    
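
The stopping rule mentioned in the thread (iterate until the relative change in log-likelihood is around 10^-4, or until an iteration cap is hit) can be sketched as below. This is only an illustration of the rule of thumb, not Mahout's LDADriver code: the function names, the `tol` and `max_iter` parameters, and the sample log-likelihood values are all hypothetical.

```python
def relative_change(prev_ll, curr_ll):
    """Relative change between two successive log-likelihood values."""
    if prev_ll == 0:
        # Avoid dividing by zero; treat any move away from 0 as "not converged".
        return float("inf") if curr_ll != 0 else 0.0
    return abs(curr_ll - prev_ll) / abs(prev_ll)


def stopping_iteration(log_likelihoods, tol=1e-4, max_iter=None):
    """Return the 1-based iteration at which training would stop.

    log_likelihoods: per-iteration LL values, in order (hypothetical input).
    tol:             stop once the relative change in LL drops below this.
    max_iter:        optional hard cap, analogous in spirit to a maxIter option.
    """
    prev = None
    for i, ll in enumerate(log_likelihoods, start=1):
        if prev is not None and relative_change(prev, ll) < tol:
            return i  # converged by the relative-change rule
        if max_iter is not None and i >= max_iter:
            return i  # hit the iteration cap first
        prev = ll
    return len(log_likelihoods)  # ran out of values without converging


if __name__ == "__main__":
    # Made-up LL trace: large early gains, then a plateau.
    trace = [-1000.0, -900.0, -890.0, -889.95]
    print(stopping_iteration(trace))              # stops by tolerance
    print(stopping_iteration(trace, max_iter=2))  # stops by the cap
```

With a cap in place, the run ends at whichever condition fires first, which matches the trade-off described above: a small cap saves time when later iterations barely change the result.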

