mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: Mahout SSVD is too slow for highly dimensional data
Date Mon, 10 Jun 2013 18:28:34 GMT
What is the requested rank? SSVD will not scale w.r.t. rank, only w.r.t.
input size. Realistically you don't need k > 100 or p > 15.

What is the input size (size of A in GB)?
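
To illustrate why the rank matters, here is a rough sketch of how the projection width k + p drives the size of the intermediate matrices in a generic randomized SVD, where Y = A * Omega has one row per input row and k + p columns. It assumes dense 8-byte values and the textbook formulation. The function name and the dictionary keys are made up for illustration and are not part of Mahout, and Mahout's actual memory use may differ.

```python
def ssvd_projection_sizes(m, n, k, p, bytes_per_value=8):
    """Estimate dense sizes (in bytes) of the random projection matrices.

    m, n: rows and columns of the input matrix A
    k:    requested rank
    p:    oversampling parameter
    Assumes the generic randomized SVD scheme with Y = A * Omega,
    where Omega is n x (k + p) and Y is m x (k + p).
    """
    width = k + p
    return {
        "width": width,
        "omega_bytes": n * width * bytes_per_value,
        "y_bytes": m * width * bytes_per_value,
    }


# Dimensions quoted in the original question, with k = 100 and p = 15.
sizes = ssvd_projection_sizes(m=8_000_000, n=100_000, k=100, p=15)
print(sizes["width"])        # projection width k + p
print(sizes["y_bytes"])      # estimated size of Y in bytes
print(sizes["omega_bytes"])  # estimated size of Omega in bytes
```

Under these assumptions the size of Y grows linearly in k + p, so doubling the rank doubles the data written by the first phase.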


On Mon, Jun 10, 2013 at 5:31 AM, Yahia Zakaria <yahiawestlife@gmail.com> wrote:

> Hi All
>
> I am running Mahout SSVD (trunk version) using pca option on Bag of Words
> dataset (http://archive.ics.uci.edu/ml/datasets/Bag+of+Words). This
> dataset
> has 8000000 instances (rows) and 100000 attributes (columns). Mahout SSVD
> is too slow, it may take days to finish the first phase of SSVD (Q-Job) . I
> am running the code on a cluster of 16 machines, each one is 8 cores and 32
> GB memory. Moreover, the CPU and memory of the workers are not utilized at
> all. When running Mahout SSVD on a smaller dataset (12500 rows and 5000
> columns), it runs very fast; the job finished in 2 minutes. Do you have
> any idea why Mahout SSVD is so slow for high-dimensional data? And to
> what extent can SSVD work efficiently (with respect to the number of
> rows and columns of the input matrix)?
>
> Thanks
> Yehia
>
