mahout-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Singular Value Decomposition does not return correct eigenvalues and -vectors
Date Fri, 23 Sep 2011 23:03:03 GMT
Markus-

Probably the best approach is to cross-check your results against the R
statistical system, using real data of various sizes.  (You will often get
eigenvectors with opposite signs; a vector and its negation are equally valid.)
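
As a rough sketch of that kind of cross-check, written in plain Python instead of R (an assumption made only for illustration), the snippet below tests whether a vector from the dump is an eigenvector of A. The matrix A and the vector are copied from the quoted mail below; the helper names `matvec`, `dot`, `rayleigh_quotient` and `residual` are hypothetical and not part of Mahout or R.

```python
import math

# Matrix A from the quoted mail (symmetric 3x3).
A = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 5.0],
     [3.0, 5.0, 6.0]]


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rayleigh_quotient(m, v):
    """Eigenvalue estimate (v . Mv) / (v . v); unchanged if v is scaled or negated."""
    return dot(v, matvec(m, v)) / dot(v, v)


def residual(m, v, lam):
    """Euclidean norm of Mv - lam*v; close to zero only for an eigenpair."""
    mv = matvec(m, v)
    return math.sqrt(sum((y - lam * x) ** 2 for y, x in zip(mv, v)))


# Vector printed as eigenVector3 in run 4 of the quoted mail.
v = [-0.3279852776056819, -0.5910090485061036, -0.7369762290995783]
flipped = [-x for x in v]  # same direction, opposite sign

lam = rayleigh_quotient(A, v)
print("Rayleigh quotient:", lam)
print("residual:", residual(A, v, lam))
print("residual with flipped sign:", residual(A, flipped, lam))
```

The residual should be tiny for both `v` and `flipped`, which is why a sign difference between two tools is not an error in itself. For this vector the Rayleigh quotient should come out close to the 11.3448 listed as λ1 in the quoted mail.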

Lance

On Fri, Sep 23, 2011 at 3:42 PM, Markus Holtermann <info@markusholtermann.eu> wrote:

> Thank you for all your responses.
>
> ref. Dan Brickley:
> ------------------
> hopefully you did dream ;-)
>
> ref. Dmitriy Lyubimov:
> ----------------------
> When I run `mahout ssvd -i A.seq -o A-ssvd/ -k 3 -p 0` I get an
> IllegalArgumentException. You can find the traceback at
> http://paste.pocoo.org/show/481168/ .
>
> ref. Ted Dunning:
> -----------------
> I am running the M/R version of SVD in local mode. I didn't install
> Hadoop beyond what `mvn install` pulls in.
> If I understand the code correctly, the `--inMemory` argument is only
> relevant for the "EigenVerificationJob", which I didn't run.
>
> Here are the latest results for the calculations as described in my
> previous mail:
>
> For 1:
> Key class: class org.apache.hadoop.io.IntWritable
> Value Class: class org.apache.mahout.math.VectorWritable
> Key: 0: Value: eigenVector0, eigenvalue = 11.344411508600611:
> {0:0.8940505788976013,1:0.05761556873901637,2:-0.44424543735613486}
> Key: 1: Value: eigenVector1, eigenvalue = 0.0:
> {0:-0.3030457633656634,1:0.8081220356417685,2:-0.5050762722761053}
> Key: 2: Value: eigenVector2, eigenvalue = -0.4362482432944815:
> {0:0.3299042704770375,1:0.5861904313011974,2:0.7399621277956934}
> Count: 3
>
> For 2:
> Key class: class org.apache.hadoop.io.IntWritable
> Value Class: class org.apache.mahout.math.VectorWritable
> Key: 0: Value: eigenVector0, eigenvalue = 11.344814282762082:
> {0:0.7369762290995766,1:0.3279852776056837,2:-0.5910090485061045}
> Key: 1: Value: eigenVector1, eigenvalue = 0.17091518882717976:
> {0:0.9225878132457447,1:0.3812202473600341,2:0.05918487858557608}
> Key: 2: Value: eigenVector2, eigenvalue = 0.0:
> {0:-0.5910090485061055,1:0.7369762290995774,2:-0.3279852776056802}
> Key: 3: Value: eigenVector3, eigenvalue = -0.5157294715892533:
> {0:-0.32798527760568197,1:-0.5910090485061036,2:-0.7369762290995783}
> Count: 4
>
> For 3:
> Key class: class org.apache.hadoop.io.IntWritable
> Value Class: class org.apache.mahout.math.VectorWritable
> Key: 0: Value: eigenVector0, eigenvalue = 11.344814080004587:
> {0:0.2870124314018251,1:-0.8054865010309287,2:0.5184740696291035}
> Key: 1: Value: eigenVector1, eigenvalue = 0.4852290375835231:
> {0:0.9000472484774761,1:0.041469409433508436,2:-0.4338147514658307}
> Key: 2: Value: eigenVector2, eigenvalue = 0.0:
> {0:0.3279311127797073,1:0.5911613863727806,2:0.7368781449689461}
> Count: 3
>
> For 4:
> Key class: class org.apache.hadoop.io.IntWritable
> Value Class: class org.apache.mahout.math.VectorWritable
> Key: 0: Value: eigenVector0, eigenvalue = 11.34481428276208:
> {0:0.788451139115581,1:0.5058848349238699,2:0.3498933194866569}
> Key: 1: Value: eigenVector1, eigenvalue = 0.5157294715892401:
> {0:-0.5910090485061453,1:0.7369762290995597,2:-0.32798527760564816}
> Key: 2: Value: eigenVector2, eigenvalue = 0.1709151888272022:
> {0:-0.7369762290995447,1:-0.3279852776057236,2:0.5910090485061223}
> Key: 3: Value: eigenVector3, eigenvalue = 0.0:
> {0:-0.3279852776056819,1:-0.5910090485061036,2:-0.7369762290995783}
> Count: 4
>
> For 5:
> Key class: class org.apache.hadoop.io.IntWritable
> Value Class: class org.apache.mahout.math.VectorWritable
> Key: 0: Value: eigenVector0, eigenvalue = 7.7949818262315:
> {0:-0.3998289016610171,1:0.3486764982772797,2:0.8476800982361441}
> Key: 1: Value: eigenVector1, eigenvalue = 0.0:
> {0:0.3244428422615253,1:-0.8111071056538125,2:0.4866642633922878}
> Key: 2: Value: eigenVector2, eigenvalue = -2.2686660367578133:
> {0:0.8572477421969729,1:0.4696061783100697,2:0.21117846905213422}
> Count: 3
>
> For 6:
> Key class: class org.apache.hadoop.io.IntWritable
> Value Class: class org.apache.mahout.math.VectorWritable
> Key: 0: Value: eigenVector0, eigenvalue = 9.903422603237882:
> {0:-0.305869782876591,1:-0.012493432384138303,2:0.9519913813004245}
> Key: 1: Value: eigenVector1, eigenvalue = 6.002722238353203:
> {0:-0.7781330995244824,1:0.06366543541563939,2:0.624864458709054}
> Key: 2: Value: eigenVector2, eigenvalue = 0.0:
> {0:0.2988138112963618,1:0.9481291552697455,2:0.10845003967736172}
> Key: 3: Value: eigenVector3, eigenvalue = -3.906144841591079:
> {0:0.9039656974142156,1:-0.3176397630567398,2:0.2862708487144453}
> Count: 4
>
> For 7:
> Key class: class org.apache.hadoop.io.IntWritable
> Value Class: class org.apache.mahout.math.VectorWritable
> Key: 0: Value: eigenVector0, eigenvalue = 7.04924152040162:
> {0:-0.4082482904638631,1:0.8164965809277261,2:-0.4082482904638631}
> Key: 1: Value: eigenVector1, eigenvalue = 3.782617346103868:
> {0:0.7808892910047764,1:0.08072916428282848,2:-0.6194309624391194}
> Key: 2: Value: eigenVector2, eigenvalue = 0.0:
> {0:0.47280571964327067,1:0.5716783495703939,2:0.6705509794975171}
> Count: 3
>
> For 8:
> Key class: class org.apache.hadoop.io.IntWritable
> Value Class: class org.apache.mahout.math.VectorWritable
> Key: 0: Value: eigenVector0, eigenvalue = 7.964450219004663:
> {0:NaN,1:NaN,2:NaN}
> Key: 1: Value: eigenVector1, eigenvalue = 7.000000000000002:
> {0:NaN,1:NaN,2:NaN}
> Key: 2: Value: eigenVector2, eigenvalue = 0.753347668076679:
> {0:NaN,1:NaN,2:NaN}
> Key: 3: Value: eigenVector3, eigenvalue = 0.0:
> {0:NaN,1:NaN,2:NaN}
> Count: 4
>
>
> ref. Danny Bickson:
> -------------------
> Thanks for your confirmation on how to use the rank.
> Regarding the scale factor and orthogonalization: Yes, I take them into
> account. I'm running SVD from trunk without any changes. And even after
> commenting out those parts of the code, the results are still wrong in
> cases 1, 2, 3, 7 and 8.
>
> Thank you for your help.
>
> Markus
>
>
> > On 22 Sep 2011, at 18:37, Markus Holtermann
> > <info@markusholtermann.eu> wrote:
> >
> >> Hello there,
> >>
> >> I'm trying to run Mahout's Singular Value Decomposition but
> >> realized that the resulting eigenvalues are wrong in most cases.
> >> So I took two small 3x3 matrices, calculated their eigenvalues
> >> and eigenvectors by hand, and compared the results to Mahout's.
> >>
> >> In only one of eight cases did Mahout's results match my pen &
> >> paper results.
> >>
> >> Let's take A = {{1,2,3},{2,4,5},{3,5,6}} and B =
> >> {{5,2,4},{-3,6,2},{3,-3,1}}
> >>
> >> As you can see, A is symmetric, B is not.
> >>
> >> I ran `mahout svd --output out/ --numRows 3 --numCols 3` eight
> >> times with different arguments:
> >>
> >> 1) --input A --rank 3 --symmetric true    result is wrong
> >> 2) --input A --rank 4 --symmetric true    result is wrong
> >> 3) --input A --rank 3 --symmetric false   result is wrong
> >> 4) --input A --rank 4 --symmetric false   result is CORRECT
> >>
> >> 5) --input B --rank 3 --symmetric true    result is wrong
> >> 6) --input B --rank 4 --symmetric true    result is wrong
> >> 7) --input B --rank 3 --symmetric false   result is wrong
> >> 8) --input B --rank 4 --symmetric false   result is wrong
> >>
> >> To verify that my input data is correct, this is the result of
> >> `mahout seqdumper`
> >>
> >> For A:
> >> Key class: class org.apache.hadoop.io.IntWritable
> >> Value Class: class org.apache.mahout.math.VectorWritable
> >> Key: 0: Value: {0:1.0,1:2.0,2:3.0}
> >> Key: 1: Value: {0:2.0,1:4.0,2:5.0}
> >> Key: 2: Value: {0:3.0,1:5.0,2:6.0}
> >> Count: 3
> >>
> >>
> >> For B:
> >> Key class: class org.apache.hadoop.io.IntWritable
> >> Value Class: class org.apache.mahout.math.VectorWritable
> >> Key: 0: Value: {0:5.0,1:2.0,2:4.0}
> >> Key: 1: Value: {0:-3.0,1:6.0,2:2.0}
> >> Key: 2: Value: {0:3.0,1:-3.0,2:1.0}
> >> Count: 3
> >>
> >>
> >> And finally, the correct eigenvalues should be:
> >>
> >> For A: λ1 = 11.3448, λ2 = -0.515729, λ3 = 0.170915
> >>
> >> For B: λ1 = 7, λ2 = 3, λ3 = 2
> >>
> >> So, are there any known bugs in Mahout's SVD implementation? Am I
> >> doing something wrong? Is this algorithm known to produce wrong
> >> results?
> >>
> >> Thanks in advance.
> >>
> >> Markus
>
>


-- 
Lance Norskog
goksron@gmail.com
