mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Lanczos SVD scalability
Date Tue, 05 Jul 2011 08:11:54 GMT
Lanczos is probably dominated by overhead and startup costs on such a small
matrix.  You only have 100,000 non-zero elements, which is a truly tiny
problem.  Stochastic projection SVD, for instance, would compute the answer
for such a problem in a few milliseconds.

You need a much larger problem to show parallel gain.  Try 100 x 10^6
(100 million) non-zeros or more.
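
To make the size gap concrete, here is a rough back-of-the-envelope sketch in
Python using only the figures from this thread. The 12 bytes per stored entry
(a 4-byte index plus an 8-byte double) is my own assumption for illustration,
not a statement about how Mahout lays out its vectors.

```python
# Figures taken from the quoted message: 10000x1000 matrix, 1% non-zeros.
rows, cols, density = 10_000, 1_000, 0.01
nnz = round(rows * cols * density)

# Problem size suggested in the reply: 100 x 10^6 non-zeros.
suggested_nnz = 100 * 10**6
factor = suggested_nnz // nnz

# ASSUMPTION (illustrative only): about 12 bytes per stored non-zero,
# i.e. a 4-byte int index plus an 8-byte double value.
bytes_per_entry = 12
approx_mb = nnz * bytes_per_entry / 1e6

print(f"non-zeros in the test matrix: {nnz:,}")
print(f"suggested problem size:       {suggested_nnz:,} ({factor}x larger)")
print(f"approx. raw data size:        {approx_mb:.1f} MB")
```

Under that assumption the whole matrix is on the order of a megabyte, so
adding nodes gives each one very little actual work to do.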

On Mon, Jul 4, 2011 at 11:27 PM, agnonchik <gluhoff@inm.ras.ru> wrote:

> What could be the reason for poor Lanczos SVD scalability on a cluster? I
> don't observe any speed-up at all when increasing the number of nodes. What
> am I doing wrong?
>
> I'm processing a 10000x1000 matrix with 1% non-zeros. The elapsed CPU time
> scales like this:
> 1 slave node - 89m39.399s
> 2 slave nodes - 93m47.435s
> 8 slave nodes - 89m20.821s
>
> I checked the output, cleanEigenvectors - they are mathematically correct.
>
> Cluster specs:
> Intel Core2 Duo E7200 @ 2.53 GHz CPUs
> Gigabit Ethernet
> each node has 80GB hard drive
>
> I saved the matrix in the sequential format to HDFS. Should I save it in
> another format to be processed in parallel?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Lanczos-SVD-scalability-tp3139790p3139790.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
>
