mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: LDA runtimes
Date Wed, 23 Sep 2009 11:01:11 GMT

On Sep 23, 2009, at 6:05 AM, Levy, Mark wrote:

> I've started to experiment with LDA and am finding that it creates only
> a single long-running map task for each iteration, which doesn't scale
> well.  The map is taking 20mins for 10k of my input SparseVectors, and 5
> hours for 100k (the vocabulary size also grows when there are more
> vectors).
>
> Is this expected or am I doing something wrong?  Are there any existing
> performance benchmarks?
>

That's pretty new code, so I doubt there are many benchmarks yet.  If
you can share your vectors (the serialized ones, not the originals
with text), then we can profile and look into it a bit more.

Also, you may want to look at MAHOUT-165 in JIRA, as it has some
performance improvements for sparse vectors using primitives.


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search

