mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: SVD usage
Date Mon, 28 Jun 2010 20:39:55 GMT
Hey Avishay,

  Attachments are stripped from Apache mailing list postings, so I didn't
see your CSVtoSeq.java. Given the error, though, I'll bet you a million to
one that when you construct your SequentialAccessSparseVector instances from
the CSV data, you use the constructor which does not specify the cardinality
of the vector.  That causes the default cardinality (Integer.MAX_VALUE =
2^31 - 1 = 2147483647, the number in your exception) to be used.

  Make sure you know what cardinality your vectors should have at
construction time, and use the constructor which sets this value properly.
Alternately, copy the values from a vector with default cardinality into a
new vector with the correct cardinality once you know it.  This latter idea
can be very helpful if you want to build up your vector as a
RandomAccessSparseVector (these are fast to mutate, as they are map-based)
and then "seal" it into an immutable SequentialAccessSparseVector instance
at the end.  The problem with this latter approach is that we don't
currently have a "copy" constructor which takes both a specified cardinality
and another vector.  It's about a 2 line patch to add this, and it's a good
idea to do so, for exactly this kind of case.
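
  To make the failure mode and the fix concrete, here is a minimal,
self-contained sketch.  It uses a hypothetical ToySparseVector class as a
stand-in, not Mahout's actual classes, so that it runs without Mahout on the
classpath.  The default-cardinality behaviour and the error text are modelled
on what is described above, and the two-argument "resizing copy" constructor
is only an illustration of what such a patch might look like, not an existing
Mahout API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a sparse vector; NOT one of Mahout's classes.
final class ToySparseVector {
    private final int cardinality;
    private final Map<Integer, Double> values = new TreeMap<>();

    // No explicit size: falls back to Integer.MAX_VALUE, as described above.
    ToySparseVector() {
        this(Integer.MAX_VALUE);
    }

    // Preferred: state the cardinality up front.
    ToySparseVector(int cardinality) {
        this.cardinality = cardinality;
    }

    // Assumed "resizing copy": copies the entries of another vector into a
    // new vector with the given cardinality.
    ToySparseVector(int cardinality, ToySparseVector other) {
        this(cardinality);
        for (Map.Entry<Integer, Double> e : other.values.entrySet()) {
            set(e.getKey(), e.getValue());
        }
    }

    int size() {
        return cardinality;
    }

    void set(int index, double value) {
        if (index < 0 || index >= cardinality) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        values.put(index, value);
    }

    // Dot product that refuses to combine vectors of different cardinality.
    double dot(ToySparseVector other) {
        if (cardinality != other.cardinality) {
            throw new IllegalArgumentException("My cardinality is: "
                + cardinality + ", but the other is: " + other.cardinality);
        }
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : values.entrySet()) {
            Double theirs = other.values.get(e.getKey());
            if (theirs != null) {
                sum += e.getValue() * theirs;
            }
        }
        return sum;
    }
}

class CardinalityDemo {
    public static void main(String[] args) {
        // Built without a size, as a CSV converter might do by accident.
        ToySparseVector row = new ToySparseVector();
        row.set(0, 2.0);
        row.set(5, 3.0);

        // A vector sized to the real number of columns.
        ToySparseVector basis = new ToySparseVector(3282);
        basis.set(5, 4.0);

        try {
            row.dot(basis);
        } catch (IllegalArgumentException e) {
            System.out.println("mismatch: " + e.getMessage());
        }

        // Fix: copy the entries into a vector with the right cardinality.
        ToySparseVector fixed = new ToySparseVector(3282, row);
        System.out.println("dot after resizing: " + fixed.dot(basis));
    }
}
```

  The point of the sketch is only that the size has to agree on both sides
of the dot product; in a real converter, passing the known column count to
the vector constructor avoids the extra copy altogether.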

  Let me know whether or not this turns out to be the problem.

  -jake

On Mon, Jun 28, 2010 at 10:54 AM, Avishay Livne1 <AVISHAYL@il.ibm.com> wrote:

>
>
> Hi,
>
> I'm trying to use Mahout's SVD with no success so far.
> I converted my input  from CSV format using the attached class.
> Then I run the following command
> hadoop jar mahout-examples-0.3.job
> org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver
> -i /hdfs/data/svd/user_doc_score -o  /hdfs/data/svd/svd-output -r 10 -nr
> 6040 -nc 3282 -sym 0
>
> and get this error:
> org.apache.mahout.math.CardinalityException: My cardinality is: 2147483647, but the other is: 3282
>        at org.apache.mahout.math.RandomAccessSparseVector.dot(RandomAccessSparseVector.java:275)
>        at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.scale(TimesSquaredJob.java:200)
>        at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:191)
>        at org.apache.mahout.math.hadoop.TimesSquaredJob$TimesSquaredMapper.map(TimesSquaredJob.java:147)
>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> Any ideas/suggestions?
>
> Thanks,
> Avishay
>
> (See attached file: CSVtoSeq.java)
