mahout-user mailing list archives

From Chris Lu <>
Subject LDA on single node is much faster than 20 nodes
Date Tue, 06 Sep 2011 07:35:27 GMT

I am running LDA on 18k documents, each with 5k terms, 300k terms in 
total. The number of topics is set to 100.

Running LDA on a single-node Hadoop configuration takes about 5 hours per 
stage, so 20 stages would take about 100 hours.

However, running on Amazon EMR with 20 machines is actually much 
slower: it takes about 1000 minutes per stage (about 10 minutes for each 
1% of map progress). The reduce phase is much faster; it takes only 
seconds, which is almost negligible.
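
To put the two sets of numbers side by side, here is a rough back-of-the-envelope sketch. It uses only the figures quoted above, ignores the reduce phase (seconds), and assumes map progress is linear; the function and constant names are made up for illustration.

```python
# Figures quoted in this message; everything else is derived arithmetic.
SINGLE_NODE_MIN_PER_STAGE = 5 * 60   # about 5 hours per stage on one node
EMR_MIN_PER_PERCENT_MAP = 10         # about 10 minutes per 1% map progress
STAGES = 20


def emr_minutes_per_stage(min_per_percent=EMR_MIN_PER_PERCENT_MAP):
    """Estimate one stage on EMR, assuming linear map progress.

    Reduce time is ignored because it is only seconds.
    """
    return min_per_percent * 100


def total_hours(min_per_stage, stages=STAGES):
    """Total wall-clock hours for the given number of stages."""
    return min_per_stage * stages / 60


def slowdown():
    """How many times slower the 20-node run is per stage."""
    return emr_minutes_per_stage() / SINGLE_NODE_MIN_PER_STAGE


if __name__ == "__main__":
    print("single node, total hours:", total_hours(SINGLE_NODE_MIN_PER_STAGE))
    print("EMR, minutes per stage:  ", emr_minutes_per_stage())
    print("EMR, total hours:        ", round(total_hours(emr_minutes_per_stage()), 1))
    print("per-stage slowdown:      ", round(slowdown(), 2))
```

By this estimate, the 20-node cluster is a bit more than 3 times slower per stage than the single node, and 20 stages would take roughly 333 hours instead of 100.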

Has anyone had a similar experience, or is my setup wrong?

